Acquire Maximal Subtree Information

Extract maximal subtree information from an RF-SRC object. Used for variable selection and for identifying interactions between variables.

# S3 method for class 'rfsrc'
max.subtree(object,
  max.order = 2, sub.order = FALSE, conservative = FALSE, ...)

Arguments

object: An object of class (rfsrc, grow) or (rfsrc, forest).
max.order: Non-negative integer specifying the maximum interaction order for which minimal depth is calculated. Defaults to 2, which returns first- and second-order depths. Set max.order = 0 to return the first-order depth of each variable in each tree; in that case conservative is automatically set to FALSE.
sub.order: Logical. If TRUE, returns the minimal depth of each variable conditional on every other variable. Useful for investigating variable interdependence. See Details.
conservative: Logical. If TRUE, uses a conservative threshold for selecting variables based on the marginal minimal depth distribution (Ishwaran et al., 2010). If FALSE, uses the tree-averaged distribution, which is less conservative and typically identifies more variables in high-dimensional settings.
...: Additional arguments passed to or from other methods.

Details

The maximal subtree for a variable x is the largest subtree whose root node splits on x, so that no node above it splits on x. The largest possible maximal subtree is the full tree, which occurs when the root node of the tree splits on x. A variable may have more than one maximal subtree in a tree, or none at all if it is never used for splitting. See Ishwaran et al. (2010, 2011) for further discussion.

The minimal depth of a maximal subtree (the first-order depth) quantifies the predictive strength of a variable x. It is defined as the distance from the root node of the tree to the root of the closest maximal subtree for x; zero is the smallest possible value. Smaller values indicate stronger predictive impact. A variable is flagged as strong if its minimal depth does not exceed the mean of the minimal depth distribution, which serves as the threshold.
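
The definitions above can be illustrated with a small base R sketch. The toy tree, its column names, and the helper functions below are hypothetical and are not part of randomForestSRC; depth is counted here as the number of edges from the root of the tree to the node on which a maximal subtree is rooted, which is an assumption of this sketch rather than a description of the package internals.

```r
# Hypothetical toy tree (not a randomForestSRC object): one row per node.
# 'split' names the splitting variable and is NA for terminal nodes.
tree <- data.frame(
  node   = 1:11,
  parent = c(NA, 1, 1, 2, 2, 3, 3, 4, 4, 6, 6),
  split  = c("a", "b", "x", "x", NA, "x", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# All ancestors of a node, from its parent up to the root of the tree.
ancestors <- function(tree, id) {
  out <- integer(0)
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) {
    out <- c(out, p)
    p <- tree$parent[tree$node == p]
  }
  out
}

# Nodes that split on v and have no ancestor splitting on v:
# these are the roots of the maximal subtrees for v.
maximal_subtree_roots <- function(tree, v) {
  cand <- tree$node[!is.na(tree$split) & tree$split == v]
  is_maximal <- vapply(cand, function(id) {
    anc <- ancestors(tree, id)
    !any(tree$split[tree$node %in% anc] == v)
  }, logical(1))
  cand[is_maximal]
}

# Minimal depth: edges from the tree root to the closest maximal-subtree root.
# Returns NA when the variable is never used for splitting.
minimal_depth <- function(tree, v) {
  roots <- maximal_subtree_roots(tree, v)
  if (length(roots) == 0) return(NA_integer_)
  min(vapply(roots, function(id) length(ancestors(tree, id)), integer(1)))
}

maximal_subtree_roots(tree, "x")  # nodes 3 and 4; node 6 lies inside node 3's subtree
minimal_depth(tree, "x")          # 1
minimal_depth(tree, "a")          # 0, because the root of the tree splits on "a"
minimal_depth(tree, "z")          # NA, "z" has no maximal subtree
```

In this toy tree, x has two maximal subtrees (rooted at nodes 3 and 4), which shows how a variable can have more than one.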

The second-order depth is the distance from the root to the second-closest maximal subtree of x. To request depths beyond first order, use the max.order option (e.g., max.order = 2 returns both first and second-order depths). Set max.order = 0 to retrieve first-order depths for each variable in each tree.
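
As a rough sketch of how max.order changes the shape of the returned depths, the following compares the default with max.order = 0. The choice of mtcars and ntree = 50 is only for illustration, and the sketch assumes the fitted object exposes xvar.names and ntree components.

```r
library(randomForestSRC)

# Small illustrative forest; ntree is kept low so the example runs quickly.
obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 50)
p <- length(obj$xvar.names)

# max.order = 2: tree-averaged first- and second-order depths, one row per variable.
d2 <- max.subtree(obj, max.order = 2)$order
dim(d2)

# max.order = 0: first-order depth of each variable in each tree.
d0 <- max.subtree(obj, max.order = 0)$order
dim(d0)
```

The per-tree matrix can be useful when the spread of a variable's minimal depth across trees is of interest, not only its average.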

Set sub.order = TRUE to obtain the relative minimal depth of each variable j within the maximal subtree of another variable i. This returns a p x p matrix (with p the number of variables) whose entry (i,j) is the normalized relative depth of j in i's subtree. Entry (i,i) gives the depth of i relative to the root. Read the matrix across rows to assess inter-variable relationships: small (i,j) entries suggest interactions between variables i and j.
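
A minimal sketch of reading the sub.order matrix across a row is shown below. The data set, the focus variable "wt", and ntree = 50 are illustrative choices; rows and columns are indexed by position and labeled from obj$xvar.names, on the assumption that the matrix follows that variable order.

```r
library(randomForestSRC)

obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 50)
p <- length(obj$xvar.names)

# p x p matrix of normalized relative minimal depths.
so <- max.subtree(obj, sub.order = TRUE)$sub.order
dim(so)

# Read across the row for one variable: relative depth of every other
# variable within that variable's maximal subtrees.
i <- match("wt", obj$xvar.names)
rel <- so[i, -i]
names(rel) <- obj$xvar.names[-i]

# Smaller values suggest a possible interaction with "wt".
sort(rel)
```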

For competing risks, all analyses are unconditional (non-event specific).

Value

Invisibly returns a list with the following components:

order: Matrix of order depths for each variable up to max.order, averaged over trees. The matrix has p rows and max.order columns, where p is the number of variables. If max.order = 0, returns a matrix of dimension p x ntree containing first-order depths for each variable by tree.
count: Average number of maximal subtrees per variable, normalized by tree size.
nodes.at.depth: List of vectors recording the number of non-terminal nodes at each depth level for each tree.
sub.order: Matrix of average minimal depths of each variable relative to others (i.e., conditional minimal depth matrix). NULL if sub.order = FALSE.
threshold: Threshold value for selecting strong variables based on the mean of the minimal depth distribution.
threshold.1se: Threshold equal to the mean minimal depth plus one standard error.
topvars: Character vector of selected variable names using the threshold criterion.
topvars.1se: Character vector of selected variable names using the threshold.1se criterion.
percentile: Percentile value of minimal depth for each variable.
density: Estimated density of the minimal depth distribution.
second.order.threshold: Threshold used for selecting strong second-order depth variables.
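
The components above can be inspected directly from the returned list. The sketch below is illustrative only; it assumes the rows of order follow the order of obj$xvar.names and rebuilds a selection by comparing first-order depths with threshold, so that it can be set beside topvars.

```r
library(randomForestSRC)

obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 50)
ms <- max.subtree(obj)

# First-order (minimal) depth of each variable, averaged over trees.
first_order <- ms$order[, 1]

# Variables whose minimal depth does not exceed the threshold.
selected <- obj$xvar.names[first_order <= ms$threshold]
selected

# Selection and thresholds as reported by max.subtree.
ms$topvars
ms$threshold
ms$threshold.1se
```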

Author

Hemant Ishwaran and Udaya B. Kogalur

References

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.

Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random survival forests for high-dimensional data. Statist. Anal. Data Mining, 4:115-132.

Examples

# \donttest{
## ------------------------------------------------------------
## survival analysis
## first and second order depths for all variables
## ------------------------------------------------------------

library(randomForestSRC)

data(veteran, package = "randomForestSRC")
v.obj <- rfsrc(Surv(time, status) ~ ., data = veteran)
v.max <- max.subtree(v.obj)

# first and second order depths
print(round(v.max$order, 3))

# the minimal depth is the first order depth
print(round(v.max$order[, 1], 3))

# strong variables have minimal depth less than or equal
# to the following threshold
print(v.max$threshold)

# this corresponds to the set of variables
print(v.max$topvars)

## ------------------------------------------------------------
## regression analysis
## try different levels of conservativeness
## ------------------------------------------------------------

mtcars.obj <- rfsrc(mpg ~ ., data = mtcars)
max.subtree(mtcars.obj)$topvars
max.subtree(mtcars.obj, conservative = TRUE)$topvars
# }