Introduction

Early applications of random forests (RF) focused on regression and classification problems. Random survival forests (RSF) [1] were introduced to extend RF to the setting of right-censored survival data. Implementation of RSF follows the same general principles as RF: (a) Survival trees are grown using bootstrapped data; (b) Random feature selection is used when splitting tree nodes; (c) Trees are generally grown deeply; and (d) The survival forest ensemble is calculated by averaging terminal node statistics (TNS).

The presence of censoring is a unique feature of survival data that complicates certain aspects of implementing RSF compared to RF for regression and classification. In right-censored survival data, the observed data is $(T,\delta)$ where $T$ is time and $\delta$ is the censoring indicator. The observed time $T$ is defined as the minimum of the true (potentially unobserved) survival event time $T^o$ and the true (potentially unobserved) censoring time $C^o$; thus $T=\min(T^o,C^o)$ and the actual event time might not be observed. The censoring indicator is defined as $\delta=I\{T^o\le C^o\}$. When $\delta=1$, an event has occurred (i.e., death has occurred) and we observe the true event time, $T=T^o$. Otherwise, when $\delta=0$, the observation is censored and we only observe the censoring time $T=C^o$: thus we know that the subject has survived to time $C^o$, but not when the subject actually dies.

Hereafter we denote the data by $(T_1,\mathbf{X}_1,\delta_1),\ldots,(T_n,\mathbf{X}_n,\delta_n)$, where $\mathbf{X}_i$ is the feature vector (covariate) for individual $i$ and $T_i$ and $\delta_i$ are the observed time and censoring indicator for $i$. RSF trees, just like RF trees, are grown using resampled data (for example, the bootstrap; the default is .632 sampling without replacement). However, for notational simplicity we will sometimes ignore this distinction.

RSF splitting rules

Because the true event time is subject to censoring, the splitting rule for growing a RSF tree must specifically account for censoring. The goal is to split the tree node into left and right daughters with dissimilar event history (survival) behavior.

Log-rank splitting

The default splitting rule used by the package is the log-rank test statistic and is specified by splitrule="logrank". The log-rank test has traditionally been used for two-sample testing with survival data, but it can be used for survival splitting as a means for maximizing between-node survival differences [2–6].

To explain log-rank splitting, consider a specific tree node to be split. Without loss of generality, assume this is the root node (top of the tree). For simplicity, assume the data is not bootstrapped, so the root node data is $(T_1,\mathbf{X}_1,\delta_1),\ldots,(T_n,\mathbf{X}_n,\delta_n)$. Let $X$ denote a specific variable (i.e., one of the coordinates of the feature vector). A proposed split using $X$ is of the form $X\le c$ and $X>c$ (for simplicity we assume $X$ is nominal) and splits the node into left and right daughters, $L=\{X_i\le c\}$ and $R=\{X_i>c\}$, respectively. Let $t_1<t_2<\cdots<t_m$ be the distinct death times and let $d_{j,L},d_{j,R}$ and $Y_{j,L},Y_{j,R}$ equal the number of deaths and individuals at risk at time $t_j$ in daughter nodes $L,R$. At risk means the number of individuals in a daughter who are alive at time $t_j$, or who have an event (death) at time $t_j$:
$$Y_{j,L}=\#\{T_i\ge t_j,\;X_i\le c\},\qquad Y_{j,R}=\#\{T_i\ge t_j,\;X_i>c\}.$$
Define
$$Y_j=Y_{j,L}+Y_{j,R},\qquad d_j=d_{j,L}+d_{j,R}.$$
The log-rank split-statistic value for the split is
$$L(X,c)=\frac{\displaystyle\sum_{j=1}^m\left(d_{j,L}-Y_{j,L}\,\frac{d_j}{Y_j}\right)}{\sqrt{\displaystyle\sum_{j=1}^m\frac{Y_{j,L}}{Y_j}\left(1-\frac{Y_{j,L}}{Y_j}\right)\left(\frac{Y_j-d_j}{Y_j-1}\right)d_j}}.$$
The value $|L(X,c)|$ is a measure of node separation. The larger the value, the greater the survival difference between $L$ and $R$, and the better the split is. The best split is determined by finding the feature $X^*$ and split-value $c^*$ such that $|L(X^*,c^*)|\ge |L(X,c)|$ for all $X$ and $c$.
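The log-rank split statistic can be computed directly from these counts. The following is a minimal, language-agnostic Python sketch of $L(X,c)$ for one candidate split (an illustration of the formula, not the package's C implementation):

```python
import math

def logrank_split(time, event, x, c):
    """Log-rank split statistic L(X, c) for the proposed split x <= c vs x > c.
    time, event, x are parallel lists for the node's cases; event is 1 for a
    death, 0 for censoring.  |L| is the measure of node separation."""
    death_times = sorted({t for t, d in zip(time, event) if d == 1})
    num = var = 0.0
    for tj in death_times:
        # at-risk and death counts in the left/right daughters at time tj
        YL = sum(1 for t, xi in zip(time, x) if t >= tj and xi <= c)
        YR = sum(1 for t, xi in zip(time, x) if t >= tj and xi > c)
        dL = sum(1 for t, d, xi in zip(time, event, x)
                 if t == tj and d == 1 and xi <= c)
        dR = sum(1 for t, d, xi in zip(time, event, x)
                 if t == tj and d == 1 and xi > c)
        Y, d = YL + YR, dL + dR
        num += dL - YL * d / Y
        if Y > 1:  # variance term vanishes when only one case remains at risk
            var += (YL / Y) * (1 - YL / Y) * ((Y - d) / (Y - 1)) * d
    return num / math.sqrt(var)
```

The best split then maximizes `abs(logrank_split(...))` over all candidate variables and cut points.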

Log-rank score splitting

The package also implements splitting using the log-rank score test [7], specified by the option splitrule="logrankscore". To describe this rule, assume the variable $X$ has been ordered so that $X_1\le X_2\le\cdots\le X_n$, where for simplicity we assume there are $n$ unique values for $X$ (no ties). Now compute the "ranks" for each survival time $T_j$,
$$a_j=\delta_j-\sum_{k=1}^{\Gamma_j}\frac{\delta_k}{n-\Gamma_k+1},$$
where $\Gamma_k=\#\{t:T_t\le T_k\}$. The log-rank score test statistic is
$$S(X,c)=\frac{\sum_{X_j\le c}a_j-n_L\bar{a}}{\sqrt{\displaystyle n_L\left(1-\frac{n_L}{n}\right)s_a^2}},$$
where $\bar{a}$ and $s_a^2$ are the sample mean and sample variance of $\{a_j:j=1,\ldots,n\}$, and $n=n_L+n_R$, where $n_L$ is the sample size of the left daughter node. Log-rank score splitting defines the measure of node separation by $|S(X,c)|$. Maximizing this value over $X$ and $c$ yields the best split.
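In the same spirit, here is a small Python sketch of the scores $a_j$ and the resulting statistic. It assumes no ties in $X$ or $T$, and uses the $n-1$ denominator for the sample variance (an assumption; the text does not specify which convention is used):

```python
import math

def logrank_scores(time, event):
    """Ranks a_j = delta_j - sum_{k=1}^{Gamma_j} delta_k / (n - Gamma_k + 1),
    where Gamma_k counts observations with T <= T_k (no ties assumed)."""
    n = len(time)
    order = sorted(range(n), key=lambda i: time[i])   # indices by ascending time
    gamma = [sum(1 for t in time if t <= time[j]) for j in range(n)]
    return [event[j] - sum(event[k] / (n - gamma[k] + 1)
                           for k in order[:gamma[j]]) for j in range(n)]

def logrank_score_stat(time, event, x, c):
    """Log-rank score split statistic S(X, c) for the split x <= c vs x > c."""
    a = logrank_scores(time, event)
    n, nL = len(a), sum(1 for xi in x if xi <= c)
    abar = sum(a) / n
    s2 = sum((ai - abar) ** 2 for ai in a) / (n - 1)  # sample variance
    num = sum(ai for ai, xi in zip(a, x) if xi <= c) - nL * abar
    return num / math.sqrt(nL * (1 - nL / n) * s2)
```

A quick sanity check: with no censoring, the scores always sum to zero.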

Randomized splitting

All models in the package, including RSF, allow the use of randomized splitting, specified by the option nsplit. Rather than splitting the node by considering all possible split-values for a variable, a fixed number of randomly selected split-points $c_1,\ldots,c_{\texttt{nsplit}}$ are chosen [1, 8, 9]. For example, the best randomized split using log-rank splitting is the maximal value of
$$|L(X,c_1)|,\ldots,|L(X,c_{\texttt{nsplit}})|.$$
For each variable $X$, this reduces $n$ split-statistic evaluations (worst case scenario) to $\texttt{nsplit}$ evaluations. Not only does randomized splitting greatly reduce computations, it also mitigates the well-known tree bias of favoring splits on variables with a large number of split-points, such as continuous variables or factors with a large number of categorical labels [10]. Related work includes [11], who investigated extremely randomized trees, in which a single random split-point is chosen for each variable (i.e., $\texttt{nsplit}=1$). Traditional deterministic splitting (all split values considered) is specified by nsplit = 0.
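The idea can be sketched with the split statistic abstracted as a function argument (a hypothetical helper for illustration, not package code):

```python
import random

def best_random_split(x_values, nsplit, split_stat, rng=None):
    """Pick the best of nsplit randomly chosen candidate cut points rather
    than scanning every observed value of x.  split_stat(c) returns the
    split statistic at cut point c; larger |split_stat| = better separation."""
    rng = rng or random.Random(0)
    candidates = sorted(set(x_values))
    cuts = rng.sample(candidates, min(nsplit, len(candidates)))
    return max(cuts, key=lambda c: abs(split_stat(c)))
```

When `nsplit` reaches the number of unique values, this coincides with deterministic splitting.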

Terminal node statistics (TNS)

RSF estimates the survival function, $S(t|\mathbf{X})=\mathbb{P}\{T^o>t|\mathbf{X}\}$, and the cumulative hazard function (CHF),
$$H(t|\mathbf{X})=\int_{(0,t]}\frac{F(du|\mathbf{X})}{S(u|\mathbf{X})},\qquad F(u|\mathbf{X})=\mathbb{P}\{T^o\le u|\mathbf{X}\}.$$
Below we describe how these two quantities are estimated using a survival tree.

In-bag (IB) estimator

Once the survival tree is grown, the ends of the tree are called the terminal nodes. The survival tree predictor is defined in terms of the predictor within each terminal node. Let $h$ be a terminal node of the tree, let
$$t_{1,h}<t_{2,h}<\cdots<t_{m(h),h}$$
be the unique death times in $h$, and let $d_{j,h}$ and $Y_{j,h}$ equal the number of deaths and individuals at risk at time $t_{j,h}$. The CHF and survival functions for $h$ are estimated using the bootstrapped Nelson-Aalen and Kaplan-Meier estimators
$$H_h(t)=\sum_{t_{j,h}\le t}\frac{d_{j,h}}{Y_{j,h}},\qquad S_h(t)=\prod_{t_{j,h}\le t}\left(1-\frac{d_{j,h}}{Y_{j,h}}\right).$$
The survival tree predictor assigns all cases within $h$ the same CHF and survival estimate, because the purpose of the survival tree is to partition the data into homogeneous groups (i.e., terminal nodes) of individuals with similar survival behavior. To estimate $H(t|\mathbf{X})$ and $S(t|\mathbf{X})$ for a given feature $\mathbf{X}$, drop $\mathbf{X}$ down the tree. Because of the binary nature of a tree, $\mathbf{X}$ falls into a unique terminal node $h$. The CHF and survival estimators for $\mathbf{X}$ equal the Nelson-Aalen and Kaplan-Meier estimators for $\mathbf{X}$'s terminal node:
$$H^{\text{IB}}(t|\mathbf{X})=H_h(t),\qquad S^{\text{IB}}(t|\mathbf{X})=S_h(t),\quad \text{if }\mathbf{X}\in h.$$
We use the notation IB because these estimators are based on the training data, which is the in-bag (IB) data.
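The terminal-node estimators are simple to compute. A minimal Python sketch of the Nelson-Aalen and Kaplan-Meier estimators from the cases landing in one node:

```python
def terminal_node_estimators(time, event):
    """Nelson-Aalen CHF H_h and Kaplan-Meier survival S_h for one terminal
    node, evaluated at the node's unique death times.  Returns two lists of
    (t, value) pairs; both are step functions, constant between death times."""
    death_times = sorted({t for t, d in zip(time, event) if d == 1})
    H, S = 0.0, 1.0
    chf, surv = [], []
    for tj in death_times:
        Y = sum(1 for t in time if t >= tj)                       # at risk
        d = sum(1 for t, e in zip(time, event) if t == tj and e)  # deaths
        H += d / Y
        S *= 1 - d / Y
        chf.append((tj, H))
        surv.append((tj, S))
    return chf, surv
```

Every case that falls into the node is assigned these same two step functions.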

Out-of-bag (OOB) estimators

To define the OOB estimator, let $I_i\in\{0,1\}$ indicate whether case $i$ is IB or OOB, with $I_i=1$ if and only if $i$ is OOB. Drop $i$ down the tree and let $h$ denote $i$'s terminal node. The OOB tree estimators for $i$ are
$$H^{\text{OOB}}(t|\mathbf{X}_i)=H_h(t),\qquad S^{\text{OOB}}(t|\mathbf{X}_i)=S_h(t),\quad \text{if }\mathbf{X}_i\in h \text{ and } I_i=1.$$
Observe these are NULL unless $i$ is OOB for the tree.

Ensemble CHF and Survival Function

The ensemble CHF and survival function are determined by averaging the tree estimators. Let $H^{\text{IB}}_b(t|\mathbf{X})$ and $S^{\text{IB}}_b(t|\mathbf{X})$ be the IB CHF and survival estimators for the $b$th survival tree. The IB ensemble estimators are
$$\bar{H}^{\text{IB}}(t|\mathbf{X})=\frac{1}{\texttt{ntree}}\sum_{b=1}^{\texttt{ntree}} H_b^{\text{IB}}(t|\mathbf{X}),\qquad \bar{S}^{\text{IB}}(t|\mathbf{X})=\frac{1}{\texttt{ntree}}\sum_{b=1}^{\texttt{ntree}} S_b^{\text{IB}}(t|\mathbf{X}).$$
Let $O_i$ record the trees where case $i$ is OOB. The OOB ensemble estimators for individual $i$ are
$$\bar{H}_i^{\text{OOB}}(t)=\frac{1}{|O_i|}\sum_{b\in O_i} H_b^{\text{IB}}(t|\mathbf{X}_i),\qquad \bar{S}_i^{\text{OOB}}(t)=\frac{1}{|O_i|}\sum_{b\in O_i} S_b^{\text{IB}}(t|\mathbf{X}_i),\qquad i=1,\ldots,n.$$
An important distinction between the two sets of estimators is that OOB estimators are used for inference on the training data and for estimating prediction error, whereas IB estimators are used for prediction and can be applied to any feature $\mathbf{X}$.

Why are two estimates provided?

Why does RSF estimate both the CHF and the survival function? Classically, the two are related by
$$H(t)=-\log(S(t)).$$
The problem is that even if $H_b=-\log(S_b)$ held for every tree $b=1,\ldots,\texttt{ntree}$, the identity would not hold for the ensemble. Let $\bar{S}$ and $\bar{H}$ denote the ensemble survival and CHF and let $\mathbb{E}_b$ denote ensemble expectation (i.e., $\bar{S}=\mathbb{E}_b[S_b]$ and $\bar{H}=\mathbb{E}_b[H_b]$). Then by Jensen's inequality for convex functions,
$$-\log(\bar{S})=-\log(\mathbb{E}_b[S_b])\le \mathbb{E}_b[-\log(S_b)]=\mathbb{E}_b[H_b]=\bar{H}.$$
In other words, $-\log$ of the survival ensemble does not necessarily equal the ensemble of $-\log$ of the tree survival functions. The inequality also shows that taking $-\log$ of the ensemble survival will generally be smaller than the ensemble CHF. This is why RSF provides the two values separately.
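The gap is easy to see numerically. A tiny Python illustration with two hypothetical trees (values made up), each satisfying $H_b=-\log(S_b)$ exactly:

```python
import math

# Hypothetical survival estimates from two trees at a fixed time t.
S = [0.9, 0.5]
H = [-math.log(s) for s in S]   # each tree satisfies H_b = -log(S_b)

ens_S = sum(S) / len(S)         # ensemble survival = 0.7
ens_H = sum(H) / len(H)         # ensemble CHF     ~ 0.399

# Jensen's inequality: -log of the ensemble survival understates the
# ensemble CHF, so the two quantities must be reported separately.
print(-math.log(ens_S), ens_H)  # ~0.357 < ~0.399
```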

Prediction error

Prediction error for survival models is measured by $1-C$, where $C$ is Harrell's concordance index [12]. Prediction error is between 0 and 1, and measures how well the predictor correctly ranks two random individuals in terms of survival. Unlike other measures of survival performance, Harrell's C-index does not depend on choosing a fixed time for evaluation of the model and specifically takes into account censoring of individuals. The method is a popular means for assessing prediction performance in survival settings since it is easy to understand and interpret.

Mortality (the predicted value used by RSF)

To compute the concordance index, we must define what constitutes a worse predicted outcome. For survival models this is defined by the concept of mortality, which is the predicted value used by RSF. Let $t_1<\cdots<t_m$ denote the entire set of unique event times for the learning data. The IB ensemble mortality for a feature $\mathbf{X}$ is defined as
$$\bar{M}^{\text{IB}}(\mathbf{X})=\sum_{j=1}^{m}\bar{H}^{\text{IB}}(t_j|\mathbf{X}).$$
This estimates the number of deaths expected if all cases were similar to $\mathbf{X}$. The OOB ensemble mortality, which is used for the C-index calculation, is defined by
$$\bar{M}_i^{\text{OOB}}=\sum_{j=1}^{m}\bar{H}_i^{\text{OOB}}(t_j),\qquad i=1,\ldots,n.$$
Individual $i$ is said to have a worse outcome than individual $j$ if
$$\bar{M}_i^{\text{OOB}}>\bar{M}_j^{\text{OOB}}.$$

The mortality value [1] represents estimated risk for each individual calibrated to the scale of the number of events. Thus, as a specific example, if $i$ has a mortality value of 100, then if all individuals had the same covariate as $i$ (that is, $\mathbf{X}=\mathbf{x}_i$), we would expect an average of 100 events.
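Mortality is just the ensemble CHF summed over the event-time grid. A small Python sketch, using a hypothetical constant hazard so the answer can be checked by hand:

```python
def ensemble_mortality(H_ens, event_times):
    """Mortality for one case: the ensemble CHF summed over the unique
    event times t_1 < ... < t_m of the learning data.  H_ens maps t -> CHF."""
    return sum(H_ens(t) for t in event_times)

# Hypothetical case with constant hazard 0.25 per year, so H(t) = 0.25 * t;
# over the event-time grid 1,...,4 the mortality is 0.25 * (1+2+3+4) = 2.5.
M = ensemble_mortality(lambda t: 0.25 * t, [1.0, 2.0, 3.0, 4.0])
```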

C-index calculation

The C-index is calculated using the following steps:

  1. Form all possible pairs of observations over all the data.

  2. Omit those pairs where the shorter event time is censored. Also, omit pairs $(i,j)$ with $T_i=T_j$ unless $(\delta_i=1,\delta_j=0)$, $(\delta_i=0,\delta_j=1)$, or $(\delta_i=1,\delta_j=1)$. The last restriction only allows ties in event times if at least one of the observations is a death. Let the resulting pairs be denoted by $\mathbb{S}$, and let permissible $=|\mathbb{S}|$.

  3. If $T_i \ne T_j$, count 1 for each $s \in \mathbb{S}$ in which the shorter time had the worse predicted outcome.

  4. If $T_i \ne T_j$, count 0.5 for each $s \in \mathbb{S}$ in which $\bar{M}_i^{\text{OOB}} = \bar{M}_j^{\text{OOB}}$.

  5. If $T_i = T_j$, count 1 for each $s \in \mathbb{S}$ in which $\bar{M}_i^{\text{OOB}} = \bar{M}_j^{\text{OOB}}$.

  6. If $T_i = T_j$, count 0.5 for each $s \in \mathbb{S}$ in which $\bar{M}_i^{\text{OOB}} \ne \bar{M}_j^{\text{OOB}}$.

  7. Let concordance denote the resulting count over all permissible pairs. Define the concordance index C as

$$C=\frac{\textit{concordance}}{\textit{permissible}}.$$

  8. The error rate is $\texttt{PE}=1-C$. Note that $0\le\texttt{PE}\le 1$ and that $\texttt{PE}=0.5$ corresponds to a procedure doing no better than random guessing, whereas $\texttt{PE}=0$ indicates perfect prediction.
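The steps above can be sketched directly in Python (a from-scratch illustration of the stated rules, not the package's implementation; `pred` holds OOB mortality, so larger means a worse predicted outcome):

```python
def harrell_cindex(time, event, pred):
    """C-index following the steps above: form all pairs, keep the
    permissible ones, score concordance, return concordance / permissible."""
    concordance, permissible = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(i + 1, n):
            if time[i] != time[j]:
                s, o = (i, j) if time[i] < time[j] else (j, i)
                if event[s] == 0:          # shorter time censored: omit pair
                    continue
                permissible += 1
                if pred[s] > pred[o]:      # shorter time has worse outcome
                    concordance += 1.0
                elif pred[s] == pred[o]:
                    concordance += 0.5
            else:
                if event[i] == 0 and event[j] == 0:  # tied, no death: omit
                    continue
                permissible += 1
                concordance += 1.0 if pred[i] == pred[j] else 0.5
    return concordance / permissible
```

The error rate $1-C$ is then the quantity reported by the package as the performance error.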

Brier score

The Brier score, $\text{BS}(t)$, is another popular measure used to assess prediction performance. Let $\hat{S}(t|\mathbf{X})$ be some estimator of the survival function. To estimate the prediction performance of $\hat{S}$, let $\hat{G}(t|\mathbf{X})$ be a prechosen estimator of the censoring survival function, $G(t|\mathbf{X})=\mathbb{P}\{C^o\ge t|\mathbf{X}\}$. The Brier score for $\hat{S}(t|\mathbf{X})$ can be estimated using inverse probability of censoring weights (IPCW) via the method of [13]:
$$\hat{\text{BS}}(t)=\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\hat{S}^2(t|\mathbf{X}_i)I\{T_i\le t\}\delta_i}{\hat{G}(T_i-|\mathbf{X}_i)}+\frac{\left(1-\hat{S}(t|\mathbf{X}_i)\right)^2 I\{T_i>t\}}{\hat{G}(t|\mathbf{X}_i)}\right\}.$$
The integrated Brier score at time $\tau$ is defined as
$$\text{IBS}(\tau)=\frac{1}{\tau}\int_0^\tau \text{BS}(t)\,dt,$$
which can be estimated by substituting $\hat{\text{BS}}(t)$ for $\text{BS}(t)$. Lower values of the Brier score indicate better prediction performance. Using the Brier score we can calculate the continuous rank probability score (CRPS), defined as the integrated Brier score divided by time.
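Here is a Python sketch of the IPCW estimator above, assuming for simplicity a covariate-independent censoring estimator $\hat{G}$:

```python
def brier_ipcw(t, time, event, S_t, G):
    """IPCW Brier score at time t.  S_t[i] is a survival estimate S(t|X_i);
    G(u) estimates the censoring survival function P(C >= u) (taken
    covariate-independent here for simplicity).  Cases with T_i <= t that
    are censored get weight 0, as in the formula."""
    total = 0.0
    for Ti, di, Si in zip(time, event, S_t):
        if Ti <= t and di == 1:
            total += Si ** 2 / G(Ti)          # stands in for G(Ti-)
        elif Ti > t:
            total += (1.0 - Si) ** 2 / G(t)
    return total / len(time)
```

With no censoring ($\hat{G}\equiv 1$) this reduces to the usual squared-error Brier score against the alive/dead indicator at time $t$.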

Illustration

To illustrate, we use the survival data from [14] consisting of 2231 adult patients with systolic heart failure. All patients underwent cardiopulmonary stress testing. During a mean follow-up of 5 years (maximum for survivors, 11 years), 742 patients died. The outcome is all-cause mortality, and a total of $p=39$ covariates were measured for each patient, including demographic, cardiac and noncardiac comorbidity, and stress testing information.

library(randomForestSRC)
data(peakVO2, package = "randomForestSRC")
dta <- peakVO2
obj <- rfsrc(Surv(ttodead,died)~., dta,
             ntree = 1000, nodesize = 5, nsplit = 50, importance = TRUE)
print(obj)

>  1                          Sample size: 2231
>  2                     Number of deaths: 726
>  3                      Number of trees: 1000
>  4            Forest terminal node size: 5
>  5        Average no. of terminal nodes: 259.368
>  6 No. of variables tried at each split: 7
>  7               Total no. of variables: 39
>  8        Resampling used to grow trees: swor
>  9     Resample size used to grow trees: 1410
> 10                             Analysis: RSF
> 11                               Family: surv
> 12                       Splitting rule: logrank *random*
> 13        Number of random split points: 50
> 14                           (OOB) CRPS: 1.52830075
> 15              (OOB) standardized CRPS: 0.15497275
> 16    (OOB) Requested performance error: 0.29830498

From the above code, obj$survival, obj$survival.oob, obj$chf, and obj$chf.oob contain the IB and OOB survival functions and CHFs, respectively. All four objects have dimension $n\times c$, where $n$ is the number of patients and $c$ is the number of unique time points (controlled by the input argument ntime); the unique death times are ordered and stored in obj$time.interest. The one-dimensional predicted values, the IB and OOB mortality, are stored in obj$predicted and obj$predicted.oob. The continuous rank probability score (CRPS), shown in line 14, is defined as the integrated Brier score divided by time; the Brier score is calculated with inverse probability of censoring weights (IPCW) using the method of Gerds et al. (2006) [13]. The error rate, stored in obj$err.rate and calculated using the C-index calculation described above, is shown in line 16. For the printout: line 2 displays the number of deaths; lines 3 and 4 display the number of trees and the terminal node size, specified by the input arguments ntree and nodesize; line 5 displays the number of terminal nodes per tree averaged across the forest; line 6 reflects the input argument mtry, the number of variables randomly selected as candidates for splitting a node; line 8 displays the type of bootstrap, where swor refers to sampling without replacement and swr refers to sampling with replacement; line 9 displays the resample size for line 8, which for swor is about 63.2% of the observations, matching the expected fraction of unique observations under sampling with replacement; lines 10 and 11 show the type of forest, where RSF and surv refer to the survival family; line 12 displays the splitting rule, matching the input argument splitrule; and line 13 shows the number of random splits considered for each candidate splitting variable, matching the input argument nsplit.

The following illustrates how to get the C-index directly from a RSF analysis using the built-in function get.cindex.

get.cindex(obj$yvar[,1], obj$yvar[,2], obj$predicted.oob)
>  [1] 0.299378

Plot the estimated survival functions

Figure 1 displays the (in-bag) predicted survival functions for two hypothetical individuals, where all $p$ features of the two individuals are set to the median level except for the variable peak VO$_2$. These two individuals are defined in newdata in the code below, and their predictions are stored in y.pred and plotted in the subsequent block of code. For one individual, peak VO$_2$ is set at the 25th quantile (peak VO$_2=12.8$ mL/kg per min) and shown using a solid black line. For the other, it is set at the 75th quantile (peak VO$_2=19.3$ mL/kg per min) and shown using a dashed red line.

newdata <- data.frame(lapply(1:ncol(obj$xvar),function(i){median(obj$xvar[,i])}))
colnames(newdata) <- obj$xvar.names
newdata1 <- newdata2 <- newdata
newdata1[,which(obj$xvar.names == "peak_vo2")] <- quantile(obj$xvar$peak_vo2, 0.25)
newdata2[,which(obj$xvar.names == "peak_vo2")] <- quantile(obj$xvar$peak_vo2, 0.75)
newdata <- rbind(newdata1,newdata2)
## rbind with the training data first so column classes and factor levels conform,
## then keep only the two new rows
y.pred <- predict(obj, newdata = rbind(newdata, obj$xvar)[1:2, ])
pdf("survival.pdf", width = 10, height = 8)
par(cex.axis = 2.0, cex.lab = 2.0, cex.main = 2.0, mar = c(6.0,6,1,1), mgp = c(4, 1, 0))
plot(round(y.pred$time.interest,2),y.pred$survival[1,], type="l", xlab="Time (Year)",   
     ylab="Survival", col=1, lty=1, lwd=2)
lines(round(y.pred$time.interest,2), y.pred$survival[2,], col=2, lty=2, lwd=2)
legend("topright", legend=c("Peak VO2=12.8","Peak VO2=19.3"), col=c(1:2), lty=1:2, cex=2, lwd=2)
dev.off() 

Figure 1

Predicted survival functions for two hypothetical individuals from RSF analysis of systolic heart failure data. Solid black line represents the individual with peak VO$_2=12.8$ mL/kg per min. Red dashed line represents the individual with peak VO$_2=19.3$ mL/kg per min. All other variables for both individuals are set to the median value.

Plot the Brier Score and Calculate the CRPS

Figure 2 displays the Brier score $\hat{\text{BS}}(t)$ for the RSF analysis of the heart failure data given above. These values are obtained using the function get.brier.survival, which provides two methods for estimating the censoring distribution used for the IPCW: the first uses the Kaplan-Meier censoring distribution estimator and the second uses a RSF censoring distribution estimator. The results are given below:

## obtain Brier score using KM and RSF censoring distribution estimators
bs.km <- get.brier.survival(obj, cens.mode = "km")$brier.score
bs.rsf <- get.brier.survival(obj, cens.mode = "rfsrc")$brier.score

## plot the brier score
plot(bs.km, type = "s", col = 2)
lines(bs.rsf, type ="s", col = 4)
legend("bottomright", legend = c("cens.model = km", "cens.model = rfsrc"), fill = c(2,4))

Figure 2

The Brier score can be used to obtain other information, such as the CRPS. The get.brier.survival function actually returns this value, but below we show how to extract the CRPS for every time point directly from the previously estimated Brier score.

## here's how to calculate the CRPS for every time point
trapz <- randomForestSRC:::trapz
time <- obj$time.interest
crps.km <- sapply(1:length(time), function(j) {
  trapz(time[1:j], bs.km[1:j, 2] / diff(range(time[1:j])))
})
crps.rsf <- sapply(1:length(time), function(j) {
  trapz(time[1:j], bs.rsf[1:j, 2] / diff(range(time[1:j])))
})
 
## plot CRPS as function of time
plot(time, crps.km, ylab = "CRPS", type = "s", col = 2)
lines(time, crps.rsf, type ="s", col = 4)
legend("bottomright", legend=c("cens.model = km", "cens.model = rfsrc"), fill=c(2,4))

Variable Importance

RSF provides a fully nonparametric measure of variable importance (VIMP). The most common measure is Breiman-Cutler VIMP [15], called permutation importance. VIMP calculated using permutation importance adopts a prediction-based approach by measuring the prediction error attributable to the variable. A clever feature is that rather than using cross-validation, which can be computationally expensive, permutation importance makes use of OOB estimation. Specifically, to calculate the VIMP for a variable $X$, we randomly permute the OOB values of $X$ in a tree (the remaining coordinates of $\mathbf{X}$ are not altered). The perturbed OOB data is dropped down the tree and the OOB error for the resulting tree predictor is determined. The amount by which this new error exceeds the original OOB error for the tree equals the tree importance for $X$. Averaging over trees yields the permutation importance for $X$.
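A schematic of one tree's permutation importance, with the prediction-error function abstracted (this sketches the idea, not the package internals):

```python
import random

def permutation_vimp(oob_error, X_oob, var_index, rng=None):
    """Breiman-Cutler permutation importance for one tree and one variable:
    permute the OOB values of that variable, leaving the other coordinates
    alone, and return the increase in OOB prediction error."""
    rng = rng or random.Random(0)
    baseline = oob_error(X_oob)
    permuted = [row[:] for row in X_oob]   # copy the OOB cases
    col = [row[var_index] for row in permuted]
    rng.shuffle(col)                       # noise up the one variable
    for row, v in zip(permuted, col):
        row[var_index] = v
    return oob_error(permuted) - baseline
```

Averaging this quantity over all trees gives the forest-level VIMP for the variable; permuting a variable the predictor never uses leaves the error unchanged, giving VIMP near zero.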

Large positive VIMP indicates high predictive ability, while zero or negative values identify noise variables. Subsampling [16] can be used to estimate the standard error and to approximate confidence intervals for VIMP. Figure 3 displays delete-$d$ jackknife 99% asymptotic normal confidence intervals for the $p=39$ variables from the systolic heart failure RSF analysis. Prediction error was calculated using the C-index.

jk.obj <- subsample(obj)
pdf("VIMPsur.pdf", width = 15, height = 20)
par(oma = c(0.5, 10, 0.5, 0.5))
par(cex.axis = 2.0, cex.lab = 2.0, cex.main = 2.0, mar = c(6.0,17,1,1), mgp = c(4, 1, 0))
plot(jk.obj, xlab = "Variable Importance (x 100)", cex = 1.2)
dev.off()

Figure 3

Delete-d jackknife 99% asymptotic normal confidence intervals of VIMP from RSF analysis of systolic heart failure data. Prediction error is defined using Harrell’s concordance index.



Cite this vignette as
H. Ishwaran, M. S. Lauer, E. H. Blackstone, M. Lu, and U. B. Kogalur. 2021. “randomForestSRC: random survival forests vignette.” http://randomforestsrc.org/articles/survival.html.

@misc{HemantRandomS,
    author = "Hemant Ishwaran and Michael S. Lauer and Eugene H. Blackstone and Min Lu 
               and Udaya B. Kogalur",
    title = {{randomForestSRC}: random survival forests vignette},
    year = {2021},
    url = {http://randomforestsrc.org/articles/survival.html},
    howpublished = "\url{http://randomforestsrc.org/articles/survival.html}",
    note = "[accessed date]"
}

References

1. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. The Annals of Applied Statistics. 2008;2:841–60.
2. Ciampi A, Hogg SA, McKinney S, Thiffault J. RECPAM: a computer program for recursive partition and amalgamation for censored survival data. Comp Methods Programs Biomed. 1988;26:239–56.
3. Segal MR. Regression trees for censored data. Biometrics. 1988;35–47.
4. Segal MR. Extending the elements of tree-structured regression. Statistical Methods in Medical Research. 1995;4:219–36.
5. LeBlanc M, Crowley J. Relative risk trees for censored survival data. Biometrics. 1992;411–25.
6. LeBlanc M, Crowley J. Survival trees by goodness of split. Journal of the American Statistical Association. 1993;88:457–67.
7. Hothorn T, Lausen B. On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis. 2003;43:121–37.
8. Ishwaran H, Kogalur UB, Gorodeski EZ, Minn AJ, Lauer MS. High-dimensional variable selection for survival data. Journal of the American Statistical Association. 2010;105:205–17.
9. Ishwaran H. The effect of splitting on random forests. Machine Learning. 2015;99:75–118.
10. Loh W-Y, Shih Y-S. Split selection methods for classification trees. Statistica Sinica. 1997;815–40.
11. Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Machine Learning. 2006;63:3–42.
12. Harrell Jr FE, Califf RM, Pryor DB, Lee KL, Rosati RA, et al. Evaluating the yield of medical tests. JAMA. 1982;247:2543–6.
13. Gerds TA, Schumacher M. Consistent estimation of the expected brier score in general survival models with right-censored event times. Biometrical Journal. 2006;48:1029–40.
14. Hsich E, Gorodeski EZ, Blackstone EH, Ishwaran H, Lauer MS. Identifying important risk factors for survival in patient with systolic heart failure using random survival forests. Circulation: Cardiovascular Quality and Outcomes. 2011;4:39–45.
15. Breiman L. Manual on setting up, using, and understanding random forests v3. 1. Statistics Department University of California Berkeley, CA, USA. 2002;1.
16. Ishwaran H, Lu M. Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival. Statistics in medicine. 2019;38:558–82. https://ishwaran.org/papers/IL.StatMed.2019.pdf.