mtry and nodesize
tune.rfsrc.Rd
Finds the optimal mtry and nodesize for a random forest using out-of-bag (OOB) error or, when the tuning subsample is small enough, error on a disjoint holdout set. Two search strategies are supported: a grid-based search and a golden-section search with noise control. Works for all response families supported by rfsrc.fast.
tune(formula, data,
     mtry.start = ncol(data) / 2,
     nodesize.try = c(1:9, seq(10, 100, by = 5)), ntree.try = 100,
     sampsize = function(x) { min(x * .632, max(150, x^(3/4))) },
     nsplit = 1, step.factor = 1.25, improve = 1e-3, strikeout = 3, max.iter = 25,
     method = c("grid", "golden"),
     final.window = 5, reps.initial = 2, reps.final = 3,
     trace = FALSE, do.best = TRUE, seed = NULL, ...)
tune.nodesize(formula, data,
     nodesize.try = c(1:9, seq(10, 150, by = 5)), ntree.try = 100,
     sampsize = function(x) { min(x * .632, max(150, x^(4/5))) },
     nsplit = 1, method = c("grid", "golden"),
     final.window = 5, reps.initial = 2, reps.final = 3, max.iter = 50,
     trace = TRUE, seed = NULL, ...)
formula: A model formula.
data: A data frame with response and predictors.
mtry.start: Initial mtry for tune.
nodesize.try: Candidate nodesize values. Only values \(\le\) floor(sampsize(n)/2) are used.
ntree.try: Number of trees grown at each tuning evaluation.
sampsize: Function or numeric giving the per-tree subsample size. During tuning a single numeric size ssize is computed and passed to rfsrc.fast. If a vector is supplied (e.g., class specific), its total is used for ssize.
nsplit: Number of random split points to consider at each node.
step.factor: Multiplicative step-out factor over mtry for grid search in tune.
improve: Minimum relative improvement required to continue a search step in tune.
strikeout: Maximum number of consecutive non-improving steps allowed in tune.
max.iter: Maximum number of iterations for the step-out search in tune or the coordinate loop when method = "golden".
method: Search strategy: "grid" (default) or "golden".
final.window: For golden search, the terminal bracket width for the one-dimensional line search.
reps.initial: Replicates averaged at interior evaluations during golden iterations.
reps.final: Replicates averaged for each candidate during the final local sweep in golden search.
trace: If TRUE, prints progress.
do.best: If TRUE, tune fits and returns a forest at the optimal pair.
seed: Optional integer for reproducible tuning. The holdout split (when used) and all tuning fits become deterministic for a given seed.
...: Additional arguments passed to rfsrc.fast. Arguments that control tuning itself (perf.type, forest, save.memory, ntree, mtry, nodesize, sampsize, nsplit) are managed internally.
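The restriction on candidate nodesize values can be illustrated in a few lines of base R. This is only a sketch of the documented rule (values no larger than floor(sampsize(n)/2)), using the default sampsize function from the tune signature; the row count n is a made-up value, and the package may apply the filter differently internally.

```r
## Default subsample-size rule copied from the tune() signature
sampsize <- function(x) min(x * .632, max(150, x^(3/4)))

n <- 500                                      # hypothetical number of rows
nodesize.try <- c(1:9, seq(10, 100, by = 5))  # default candidates for tune()

## Documented rule: keep only candidates <= floor(sampsize(n) / 2)
upper <- floor(sampsize(n) / 2)
nodesize.ok <- nodesize.try[nodesize.try <= upper]

print(upper)
print(range(nodesize.ok))
```

With these assumed inputs the per-tree subsample is 150, so candidates above 75 are dropped before any forest is grown.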
Error estimate. If 2 * ssize < n, a disjoint holdout of size ssize is used for evaluation; otherwise OOB error is used.
Subsample used during tuning. Both functions derive a single integer ssize from sampsize and pass it to rfsrc.fast for all tuning fits. This improves stability and comparability across candidates. When do.best = TRUE in tune, the final forest is fit with the user-supplied sampsize exactly as provided.
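The two rules above (one integer ssize derived from sampsize, then holdout versus OOB depending on whether 2 * ssize < n) can be sketched as follows. The helper eval_mode is hypothetical and not part of the package, and the use of round() to obtain an integer is an assumption; the package may truncate or round differently.

```r
## Default subsample-size rule copied from the tune() signature
sampsize <- function(x) min(x * .632, max(150, x^(3/4)))

## Hypothetical helper: which error estimate would the documented rule select?
eval_mode <- function(n, sampsize) {
  ## A function is evaluated at n; a numeric vector is totalled
  ssize <- if (is.function(sampsize)) sampsize(n) else sum(sampsize)
  ssize <- as.integer(round(ssize))   # ASSUMPTION: rounding convention
  if (2 * ssize < n) "holdout" else "oob"
}

print(eval_mode(200, sampsize))      # small data set
print(eval_mode(5000, sampsize))     # large data set
print(eval_mode(1000, c(50, 50)))    # class-specific sizes, total 100
```

Under the default rule small data sets fall back to OOB error, while larger ones leave enough rows outside the subsample for a disjoint holdout.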
Grid search. tune performs a step-out search over mtry for each nodesize in nodesize.try, using step.factor, improve, strikeout, and max.iter. tune.nodesize evaluates the supplied nodesize.try grid directly.
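To make the roles of step.factor, improve, strikeout, and max.iter concrete, here is one plausible reading of a step-out search over mtry for a single nodesize. It is a sketch on a noise-free toy error curve, not the package's implementation: the function step_out, the toy curve toy_err, and the choice to search upward first and then downward from the current best are all assumptions made for illustration.

```r
## Toy error curve with its minimum at mtry = 7 (purely illustrative)
toy_err <- function(mtry) (log(mtry) - log(7))^2 + 0.1

## Hypothetical step-out search: multiply or divide mtry by step.factor,
## accept a step when the relative improvement exceeds `improve`, and stop
## after `strikeout` consecutive non-improving steps or `max.iter` steps.
step_out <- function(err_fn, mtry.start, p, step.factor = 1.25,
                     improve = 1e-3, strikeout = 3, max.iter = 25) {
  clamp <- function(m) max(1, min(p, m))     # keep mtry within 1..p
  best.m <- clamp(round(mtry.start))
  best.e <- err_fn(best.m)
  visited <- data.frame(mtry = best.m, err = best.e)
  for (direction in c("up", "down")) {
    m <- best.m
    strikes <- 0
    for (iter in seq_len(max.iter)) {
      cand <- if (direction == "up") clamp(ceiling(m * step.factor))
              else clamp(floor(m / step.factor))
      if (cand == m) break                   # reached the boundary
      m <- cand
      e <- err_fn(m)
      visited <- rbind(visited, data.frame(mtry = m, err = e))
      if ((best.e - e) / best.e > improve) {
        best.m <- m; best.e <- e; strikes <- 0
      } else {
        strikes <- strikes + 1
        if (strikes >= strikeout) break
      }
    }
  }
  list(mtry = best.m, err = best.e, visited = visited)
}

res <- step_out(toy_err, mtry.start = 2, p = 12)
print(res$mtry)
print(res$visited)
```

Because the steps are multiplicative, small mtry values are explored densely and large ones sparsely; in real tuning each call to the error function would be a forest fit with ntree.try trees.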
Golden search. Uses a guarded golden-section line search with noise control. For each one-dimensional search (over nodesize or mtry), the routine probes a small left-anchor grid 1:9, iterates golden shrinkage until the bracket width is at most final.window, then runs a short local sweep with reps.final replicates. In tune the searches over nodesize and mtry alternate in a simple coordinate loop, with improve and strikeout as stopping controls.
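The three stages of the one-dimensional golden search (left-anchor probe, golden shrinkage, final local sweep) can be sketched as below. This is a simplified illustration under stated assumptions, not the package's routine: golden_nodesize and toy_err are hypothetical names, the way the anchor probe is combined with the final sweep is a guess, and the toy curve is noise-free, so averaging replicates has no effect here even though it matters for real forest errors.

```r
## Hypothetical golden-section search over integer nodesize values
golden_nodesize <- function(err_fn, lo = 1, hi = 100, final.window = 5,
                            reps.initial = 2, reps.final = 3, max.iter = 50) {
  ## Average several evaluations to dampen noise in the error estimate
  avg_err <- function(x, reps) mean(replicate(reps, err_fn(x)))
  phi <- (sqrt(5) - 1) / 2                   # golden ratio conjugate, ~0.618

  ## Stage 1: probe the small left-anchor grid 1:9
  anchor <- lo:min(hi, 9)
  anchor.err <- sapply(anchor, avg_err, reps = reps.initial)

  ## Stage 2: golden shrinkage until the bracket is at most final.window wide
  a <- lo; b <- hi
  for (iter in seq_len(max.iter)) {
    if (b - a <= final.window) break
    c1 <- round(b - phi * (b - a))           # left interior point
    c2 <- round(a + phi * (b - a))           # right interior point
    if (avg_err(c1, reps.initial) < avg_err(c2, reps.initial)) b <- c2 else a <- c1
  }

  ## Stage 3: short local sweep over the final bracket with more replicates
  sweep <- a:b
  sweep.err <- sapply(sweep, avg_err, reps = reps.final)

  ## ASSUMPTION: the winner is the best of all anchor and sweep candidates
  cand <- c(anchor, sweep)
  err <- c(anchor.err, sweep.err)
  best <- which.min(err)
  list(opt = cand[best], err = err[best], bracket = c(a, b))
}

## Toy unimodal error curve with its minimum at nodesize = 37
toy_err <- function(nodesize) (nodesize - 37)^2 / 1000 + 0.2

res <- golden_nodesize(toy_err)
print(res$opt)
print(res$bracket)
```

The appeal of this approach over a full grid is the number of forest fits: the bracket shrinks by a constant factor per iteration, so a range of 100 candidates needs only a handful of interior evaluations before the final sweep.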
For tune:
results: matrix with columns nodesize, mtry, err.
optimal: named numeric vector c(nodesize = ..., mtry = ...).
rf: fitted forest at the optimum if do.best = TRUE.
For tune.nodesize:
nsize.opt: optimal nodesize.
err: data frame with columns nodesize and err.
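As a small illustration of working with the returned values, the snippet below locates the lowest-error row of a results matrix shaped like the one described above. The matrix is a hand-made stand-in with invented numbers, not output from tune, and whether the package's optimal element always coincides with this row-wise minimum is an assumption.

```r
## Hypothetical stand-in for o$results (columns nodesize, mtry, err)
results <- cbind(nodesize = c(1, 1, 5, 5, 10, 10),
                 mtry     = c(2, 4, 2, 4, 2, 4),
                 err      = c(0.31, 0.28, 0.30, 0.26, 0.33, 0.29))

## Row with the smallest error and the corresponding parameter pair
best <- results[which.min(results[, "err"]), ]
optimal <- best[c("nodesize", "mtry")]
print(optimal)

## Candidates ordered from best to worst, handy for inspecting near-ties
print(results[order(results[, "err"]), ])
```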
# \donttest{
## ------------------------------------------------------------
## White wine classification example
## ------------------------------------------------------------
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)
## Fixed seed makes tuning reproducible
set.seed(1)
## Full tuner over nodesize and mtry (grid)
o1 <- tune(quality ~ ., wine, sampsize = 100, method = "grid")
print(o1$optimal)
## Golden search alternative
o2 <- tune(quality ~ ., wine, sampsize = 100, method = "golden",
           reps.initial = 2, reps.final = 3, seed = 1)
print(o2$optimal)
## visualize the nodesize/mtry surface
if (library("interp", logical.return = TRUE)) {
  plot.tune <- function(o, linear = TRUE) {
    ## columns of o$results: nodesize, mtry, err
    x <- o$results[, 1]
    y <- o$results[, 2]
    z <- o$results[, 3]
    so <- interp(x = x, y = y, z = z, linear = linear)
    ## mark the evaluated pair with the lowest error
    idx <- which.min(z)
    x0 <- x[idx]; y0 <- y[idx]
    filled.contour(x = so$x, y = so$y, z = so$z,
                   xlim = range(so$x, finite = TRUE) + c(-2, 2),
                   ylim = range(so$y, finite = TRUE) + c(-2, 2),
                   color.palette = colorRampPalette(c("yellow", "red")),
                   xlab = "nodesize", ylab = "mtry",
                   main = "error rate for nodesize and mtry",
                   key.title = title(main = "OOB error", cex.main = 1),
                   plot.axes = {
                     axis(1); axis(2)
                     points(x0, y0, pch = "x", cex = 1, font = 2)
                     points(x, y, pch = 16, cex = .25)
                   })
  }
  plot.tune(o1)
  plot.tune(o2)
}
## ------------------------------------------------------------
## nodesize only: grid vs golden
## ------------------------------------------------------------
o3 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "grid",
                    trace = TRUE, seed = 1)
o4 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "golden",
                    reps.initial = 2, reps.final = 3, trace = TRUE, seed = 1)
plot(o3$err, type = "s", xlab = "nodesize", ylab = "error")
## ------------------------------------------------------------
## Tuning for class imbalance (rfq with geometric mean performance)
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
o5 <- tune(status ~ ., data = breast, rfq = TRUE, perf.type = "gmean",
           method = "golden", seed = 1)
print(o5$optimal)
## ------------------------------------------------------------
## Competing risks example (nodesize only)
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
plot(tune.nodesize(Surv(time, status) ~ ., wihs, trace = TRUE)$err, type = "s")
# }