Plot Subsampled VIMP Confidence Intervals

Plots VIMP (variable importance) confidence regions obtained from subsampling a forest.

plot.subsample(x, alpha = .01, xvar.names,
 standardize = TRUE, normal = TRUE, jknife = FALSE, target, m.target = NULL,
 pmax = 75, main = "", sorted = TRUE, show.plots = TRUE, ...)

Arguments

x: An object obtained from calling subsample.
alpha: Desired level of significance.
xvar.names: Names of the x-variables to be used. If not specified, all variables are used.
standardize: Standardize VIMP? For regression families, VIMP is standardized by dividing by the variance. For all other families, VIMP is unaltered.
normal: Use parametric normal confidence regions or nonparametric regions? Generally, parametric regions perform better.
jknife: Use the delete-d jackknife variance estimator?
target: For classification families, an integer or character value specifying the class that VIMP will be conditioned on (default is to use unconditional VIMP). For competing risk families, an integer value between 1 and J indicating the event for which VIMP is requested, where J is the number of event types. The default is to use the first event.
m.target: Character value for multivariate families specifying the target outcome to be used. If left unspecified, the algorithm will choose a default target.
pmax: Trims the data to this number of variables (sorted by VIMP).
main: Title used for plot.
sorted: Should variables be sorted by importance values?
show.plots: Should plots be displayed? Setting this to FALSE suppresses the plot so that users can produce their own custom plots from the returned data.
...: Further arguments that can be passed to bxp.
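
As a rough illustration of how alpha relates to a parametric normal region, the base R sketch below builds a two-sided interval with nominal coverage 1 - alpha. The values vimp_hat and se_hat are hypothetical numbers chosen for illustration; in practice they would come from the subsampled forest, and this sketch is not the package's internal code.

```r
# Hypothetical inputs (assumptions, not package output)
alpha    <- 0.01   # significance level, matching the default in the usage
vimp_hat <- 2.5    # made-up VIMP point estimate
se_hat   <- 0.4    # made-up standard error of the VIMP estimate

# Two-sided normal critical value (about 2.576 for alpha = 0.01)
z <- qnorm(1 - alpha / 2)

# Normal-theory interval: estimate plus or minus z times the standard error
ci <- c(lower = vimp_hat - z * se_hat,
        upper = vimp_hat + z * se_hat)
print(ci)
```

A smaller alpha gives a larger critical value and therefore a wider interval.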

Details

Most of the options used by the R function bxp will work here and can be used for customization of plots. Currently the following parameters will work:

"xaxt", "yaxt", "las", "cex.axis", "col.axis", "cex.main", "col.main", "sub", "cex.sub", "col.sub", "ylab", "cex.lab", "col.lab"
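
A minimal sketch of this kind of customization follows, assuming the randomForestSRC package is installed and reusing the airquality forest from the Examples section. The title and subtitle strings are made up for illustration; every graphical parameter used is taken from the list above.

```r
library(randomForestSRC)

# Grow a forest and subsample it, as in the Examples section
o <- rfsrc(Ozone ~ ., airquality)
oo <- subsample(o)

# main is a formal argument of plot.subsample; the remaining
# arguments are graphical parameters named in the list above
plot.subsample(oo,
               main = "Subsampled VIMP, airquality",  # illustrative title
               las = 1,
               cex.axis = 0.7, col.axis = "gray30",
               cex.main = 0.9,
               sub = "alpha = .01", cex.sub = 0.8)    # illustrative subtitle
```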

Value

Invisibly returns the boxplot data that is plotted.
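
Because the value is returned invisibly, it has to be assigned to be used. The sketch below, which assumes randomForestSRC is installed, suppresses the plot with show.plots = FALSE and inspects the returned object; the name bp is arbitrary, and the exact components of the object are not documented here, so it is worth inspecting them before building a custom plot.

```r
library(randomForestSRC)

# Grow a forest and subsample it, as in the Examples section
o <- rfsrc(Ozone ~ ., airquality)
oo <- subsample(o)

# Skip drawing and keep the boxplot data instead
bp <- plot.subsample(oo, show.plots = FALSE)

# Look at the top-level structure before writing custom plotting code
str(bp, max.level = 1)
```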

Author

Hemant Ishwaran and Udaya B. Kogalur

References

Ishwaran H. and Lu M. (2017). Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

Politis, D.N. and Romano, J.P. (1994). Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics, 22(4):2031-2050.

Shao, J. and Wu, C.J. (1989). A general theory for jackknife variance estimation. The Annals of Statistics, 17(3):1176-1197.

Examples

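# \donttest{
library(randomForestSRC)
# grow a regression forest on airquality, then subsample it
o <- rfsrc(Ozone ~ ., airquality)
oo <- subsample(o)
# default plot of the VIMP confidence regions
plot.subsample(oo)
# restrict the plot to the first three x-variables
plot.subsample(oo, xvar.names = o$xvar.names[1:3])
# these two calls spell out the default values shown in the usage above
plot.subsample(oo, jknife = FALSE)
plot.subsample(oo, alpha = .01)
# shrink the axis text; cex.axis is passed on to bxp
plot(oo, cex.axis = .5)
# }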