Extract maximal subtree information from an RF-SRC object. Used for variable selection and identifying interactions between variables.
# S3 method for class 'rfsrc'
max.subtree(object,
  max.order = 2, sub.order = FALSE, conservative = FALSE, ...)
object: An object of class (rfsrc, grow) or (rfsrc, forest).
max.order: Non-negative integer specifying the maximum interaction order for which minimal depth is calculated. Defaults to 2. Set max.order = 0 to return first-order depths only. When max.order = 0, conservative is automatically set to FALSE.
sub.order: Logical. If TRUE, returns the minimal depth of each variable conditional on every other variable. Useful for investigating variable interdependence. See Details.
conservative: Logical. If TRUE, uses a conservative threshold for selecting variables based on the marginal minimal depth distribution (Ishwaran et al., 2010). If FALSE, uses the tree-averaged distribution, which is less conservative and typically identifies more variables in high-dimensional settings.
...: Additional arguments passed to or from other methods.
The maximal subtree for a variable x is the largest subtree whose root node splits on x. The largest possible maximal subtree is the full tree (when the root node of the tree itself splits on x). A variable may have several maximal subtrees within one tree, or none at all if it is never used for splitting. See Ishwaran et al. (2010, 2011) for further discussion.
The minimal depth of a maximal subtree, called the first-order depth, quantifies the predictive strength of a variable. It is defined as the distance from the root node of the tree to the root of the closest maximal subtree for x, so zero is the smallest possible value. Smaller values indicate stronger predictive impact. A variable is flagged as strong if its minimal depth is below the mean of the minimal depth distribution.
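The definition above can be illustrated on a hand-built toy tree using base R only. This is a minimal sketch, not the package's implementation: the data frame layout and the helper names (node_depth, ancestors, maximal_subtree_roots, first_order_depth) are hypothetical, and depth is assumed to be counted from 0 at the tree root.

```r
# Toy binary tree: one row per node; split_var is NA for terminal nodes.
# (Hypothetical layout for illustration only.)
tree <- data.frame(
  node      = 1:11,
  parent    = c(NA, 1, 1, 2, 2, 3, 3, 5, 5, 6, 6),
  split_var = c("karno", "age", "celltype", NA, "karno", "age",
                NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of edges between a node and the tree root.
node_depth <- function(tree, id) {
  d <- 0
  while (!is.na(tree$parent[tree$node == id])) {
    id <- tree$parent[tree$node == id]
    d <- d + 1
  }
  d
}

# All ancestors of a node, from its parent up to the tree root.
ancestors <- function(tree, id) {
  out <- integer(0)
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) {
    out <- c(out, p)
    p <- tree$parent[tree$node == p]
  }
  out
}

# Roots of the maximal subtrees for x: nodes that split on x and have
# no ancestor that also splits on x.
maximal_subtree_roots <- function(tree, x) {
  cand <- tree$node[!is.na(tree$split_var) & tree$split_var == x]
  keep <- vapply(cand, function(id) {
    anc <- ancestors(tree, id)
    !any(tree$split_var[tree$node %in% anc] == x, na.rm = TRUE)
  }, logical(1))
  cand[keep]
}

# First-order (minimal) depth: depth of the closest maximal subtree root,
# or NA when x is never used for splitting.
first_order_depth <- function(tree, x) {
  roots <- maximal_subtree_roots(tree, x)
  if (length(roots) == 0) return(NA_real_)
  min(vapply(roots, function(id) node_depth(tree, id), numeric(1)))
}

depths <- sapply(c("karno", "age", "celltype", "trt"),
                 function(x) first_order_depth(tree, x))
print(depths)
```

In this toy tree, age has two maximal subtrees (rooted at nodes 2 and 6), while the deeper karno split at node 5 does not start a maximal subtree because the root node already splits on karno.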
The second-order depth is the distance from the root to the second-closest maximal subtree of x. To request depths beyond first order, use the max.order option (e.g., max.order = 2 returns both first and second-order depths). Set max.order = 0 to retrieve first-order depths for each variable in each tree.
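As a rough illustration of how max.order changes the shape of the returned depths, the sketch below fits a small regression forest and requests both forms. The data set, the object names, and ntree = 50 are arbitrary choices made here to keep the example quick; the dimensions in the comments follow the description of the order component in the Value section.

```r
library(randomForestSRC)

# Small regression forest (ntree = 50 is an arbitrary, illustrative choice).
obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 50)

# Tree-averaged first- and second-order depths: one row per variable.
avg <- max.subtree(obj, max.order = 2)$order
print(round(avg, 3))

# Per-tree first-order depths: one row per variable, one column per tree.
by.tree <- max.subtree(obj, max.order = 0)$order
print(dim(by.tree))
```

The per-tree form is convenient when the spread of minimal depth across trees is of interest, not just its average.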
Set sub.order = TRUE to obtain the relative minimal depth of each variable j within the maximal subtree of another variable i. This returns a p x p matrix (with p the number of variables) whose entry (i,j) is the normalized relative depth of j in i's subtree. Entry (i,i) gives the depth of i relative to the root. Read the matrix across rows to assess inter-variable relationships: small (i,j) entries suggest interactions between variables i and j.
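A short sketch of requesting and inspecting the relative depth matrix is given below. The data set, object names, and ntree = 50 are illustrative assumptions; only the sub.order argument and the sub.order component of the returned list come from this page.

```r
library(randomForestSRC)

# Small regression forest (ntree = 50 is an arbitrary, illustrative choice).
obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 50)

# p x p matrix of relative minimal depths.
rel <- max.subtree(obj, sub.order = TRUE)$sub.order
print(round(rel, 2))

# Diagonal entries: depth of each variable relative to the root.
print(round(diag(rel), 2))
```

Reading a row i of the printed matrix, unusually small off-diagonal entries point to variables j that tend to split close to the root of i's maximal subtree.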
For competing risks, all analyses are unconditional (non-event specific).
Invisibly returns a list with the following components:
order: Matrix of order depths for each variable up to max.order, averaged over trees. The matrix has p rows and max.order columns, where p is the number of variables. If max.order = 0, returns a matrix of dimension p x ntree containing first-order depths for each variable by tree.
count: Average number of maximal subtrees per variable, normalized by tree size.
nodes.at.depth: List of vectors recording the number of non-terminal nodes at each depth level for each tree.
sub.order: Matrix of average minimal depths of each variable relative to others (i.e., conditional minimal depth matrix). NULL if sub.order = FALSE.
threshold: Threshold value for selecting strong variables based on the mean of the minimal depth distribution.
threshold.1se: Conservative threshold equal to the mean minimal depth plus one standard error.
topvars: Character vector of selected variable names using the threshold criterion.
topvars.1se: Character vector of selected variable names using the threshold.1se criterion.
percentile: Percentile value of minimal depth for each variable.
density: Estimated density of the minimal depth distribution.
second.order.threshold: Threshold used for selecting strong second-order depth variables.
Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.
Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random survival forests for high-dimensional data. Statist. Anal. Data Mining, 4:115-132.
# \donttest{
## ------------------------------------------------------------
## survival analysis
## first and second order depths for all variables
## ------------------------------------------------------------
# load the package so that rfsrc() and max.subtree() are available
library(randomForestSRC)
data(veteran, package = "randomForestSRC")
v.obj <- rfsrc(Surv(time, status) ~ . , data = veteran)
v.max <- max.subtree(v.obj)
# first and second order depths
print(round(v.max$order, 3))
# the minimal depth is the first order depth
print(round(v.max$order[, 1], 3))
# strong variables have minimal depth less than or equal
# to the following threshold
print(v.max$threshold)
# this corresponds to the set of variables
print(v.max$topvars)
## ------------------------------------------------------------
## regression analysis
## try different levels of conservativeness
## ------------------------------------------------------------
mtcars.obj <- rfsrc(mpg ~ ., data = mtcars)
max.subtree(mtcars.obj)$topvars
max.subtree(mtcars.obj, conservative = TRUE)$topvars
# }