`tune.rfsrc.Rd`

Finds the optimal mtry and nodesize tuning parameters for a random forest using out-of-sample error. Applies to all families.

```
tune(formula, data,
  mtryStart = ncol(data) / 2,
  nodesizeTry = c(1:9, seq(10, 100, by = 5)), ntreeTry = 100,
  sampsize = function(x){min(x * .632, max(150, x ^ (3/4)))},
  nsplit = 1, stepFactor = 1.25, improve = 1e-3, strikeout = 3, maxIter = 25,
  trace = FALSE, doBest = FALSE, ...)

tune.nodesize(formula, data,
  nodesizeTry = c(1:9, seq(10, 150, by = 5)), ntreeTry = 100,
  sampsize = function(x){min(x * .632, max(150, x ^ (4/5)))},
  nsplit = 1, trace = TRUE, ...)
```
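The default `sampsize` in the usage above is a function of the sample size rather than a fixed number. As a rough illustration, the sketch below copies the two default rules and evaluates them in base R. The names `sampsize_tune` and `sampsize_nodesize` and the sample sizes in `n` are hypothetical, chosen only for this example.

```r
## Default sampsize rules copied from the usage above.
## The function names are hypothetical; only the formulas come from the usage.
sampsize_tune     <- function(x) min(x * .632, max(150, x^(3/4)))
sampsize_nodesize <- function(x) min(x * .632, max(150, x^(4/5)))

## Evaluate both rules over a few made-up sample sizes.
n <- c(100, 1000, 10000, 100000)
sizes <- data.frame(
  n             = n,
  tune          = sapply(n, sampsize_tune),
  tune.nodesize = sapply(n, sampsize_nodesize)
)
print(sizes)
```

Under both rules the subsample never exceeds 63.2% of the data. For small `n` that cap is the binding term, for moderate `n` the floor of 150 applies, and for large `n` the power term takes over, with `tune.nodesize` requesting larger subsamples because of its 4/5 exponent.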

- formula
A symbolic description of the model to be fit.

- data
Data frame containing the y-outcome and x-variables.

- mtryStart
Starting value of mtry.

- nodesizeTry
Values of nodesize optimized over.

- ntreeTry
Number of trees used for the tuning step.

- sampsize
Function specifying requested size of subsampled data. Can also be passed in as a number.

- nsplit
Number of random splits used for splitting.

- stepFactor
At each iteration, mtry is inflated (or deflated) by this value.

- improve
The relative improvement in out-of-sample error must be at least this much for the search to continue.

- strikeout
The search is discontinued when the relative improvement in OOB error is negative. However, `strikeout` allows for some tolerance in this. If a negative improvement is noted a total of `strikeout` times, the search is stopped. Increase this value only if you want an exhaustive search.

- maxIter
The maximum number of iterations allowed for each mtry bisection search.

- trace
Print the progress of the search?

- doBest
Return a forest fit with the optimal mtry and nodesize parameters?

- ...
Further options to be passed to `rfsrc.fast`.
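To make the interplay of `stepFactor`, `improve`, `strikeout`, and `maxIter` concrete, here is a minimal base R sketch of a step-wise mtry search. It is a simplified illustration under stated assumptions, not the package's actual implementation: `toy_error` is a made-up error curve standing in for the OOB error of a fitted forest, and `search_direction`, its argument `p` (the number of x-variables), and the rounding rules are hypothetical.

```r
## Hypothetical stand-in for the OOB error at a given mtry.
## A real search would fit a forest here; this curve has its minimum at mtry = 6.
toy_error <- function(mtry) 0.20 + 0.002 * (mtry - 6)^2

## Search in one direction from mtry_start, inflating ("right") or
## deflating ("left") mtry by stepFactor at each iteration.
search_direction <- function(mtry_start, direction = c("right", "left"), p,
                             stepFactor = 1.25, improve = 1e-3,
                             strikeout = 3, maxIter = 25) {
  direction <- match.arg(direction)
  mtry_best <- mtry_cur <- mtry_start
  err_best <- toy_error(mtry_start)
  strikes <- 0
  for (iter in seq_len(maxIter)) {
    mtry_old <- mtry_cur
    ## Move by at least one unit and stay inside [1, p] (assumed rounding rule).
    mtry_cur <- if (direction == "right") {
      min(p, max(floor(mtry_cur * stepFactor), mtry_cur + 1))
    } else {
      max(1, min(ceiling(mtry_cur / stepFactor), mtry_cur - 1))
    }
    if (mtry_cur == mtry_old) break            # reached a boundary
    rel_gain <- 1 - toy_error(mtry_cur) / err_best
    if (rel_gain >= improve) {
      ## Large enough relative improvement: accept and keep searching.
      mtry_best <- mtry_cur
      err_best <- toy_error(mtry_cur)
    } else if (rel_gain < 0) {
      ## Negative improvement: tolerated until strikeout is reached.
      strikes <- strikes + 1
      if (strikes >= strikeout) break
    } else {
      break                                    # gain too small to continue
    }
  }
  c(mtry = mtry_best, err = err_best)
}

## Search both directions from a starting value and keep the better result.
right <- search_direction(2, "right", p = 12)
left  <- search_direction(2, "left",  p = 12)
best  <- if (right[["err"]] <= left[["err"]]) right else left
print(best)
```

In this toy setting a larger `strikeout` lets the search step past a few worse values before giving up, which is why raising it makes the search more exhaustive and slower.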

`tune` returns a matrix whose first and second columns contain the nodesize and mtry values searched and whose third column is the corresponding out-of-sample error. The error is standardized: for multivariate forests it is the standardized error averaged over the outcomes, and for competing risks it is the standardized error averaged over the event types.

If `doBest=TRUE`, also returns a forest object fit using the optimal `mtry` and `nodesize` values.
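Given a results matrix laid out as described (nodesize, mtry, error), the best pair is the row with the smallest error. The sketch below shows this in base R on a small made-up matrix. The values and the column names `nodesize`, `mtry`, and `err` are assumptions for illustration, not output from `tune`.

```r
## Hypothetical results matrix: the values are made up for illustration,
## and the column names are assumed rather than taken from the package.
results <- cbind(
  nodesize = c(1, 1, 5, 5, 10, 10),
  mtry     = c(2, 4, 2, 4, 2, 4),
  err      = c(0.41, 0.38, 0.39, 0.35, 0.40, 0.37)
)

## The third column holds the out-of-sample error; pick the row minimizing it.
best <- results[which.min(results[, 3]), ]
print(best)
```

Indexing the error by column position rather than by name keeps the lookup valid even if the column names differ from those assumed here.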

All calculations (including the final optimized forest) are based on the fast forest interface `rfsrc.fast`, which utilizes subsampling. However, while this yields a fast optimization strategy, such a solution can only be considered approximate. Users may wish to tweak various options to improve accuracy. Increasing the default `sampsize` will definitely help. Increasing `ntreeTry` (which is set to 100 for speed) may also help. It is also useful to look at contour plots of the out-of-sample error as a function of `mtry` and `nodesize` (see example below) to identify regions of the parameter space where the error rate is small.

`tune.nodesize` returns the optimal nodesize where optimization is over `nodesize` only.

```
# \donttest{
## ------------------------------------------------------------
## White wine classification example
## ------------------------------------------------------------
## load the data
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)
## set the sample size manually
## doBest = TRUE is needed so that the optimized forest is returned
o <- tune(quality ~ ., wine, sampsize = 100, doBest = TRUE)
## here is the optimized forest
print(o$rf)
## visualize the nodesize/mtry OOB surface
if (library("interp", logical.return = TRUE)) {
  ## nice little wrapper for plotting results
  plot.tune <- function(o, linear = TRUE) {
    x <- o$results[,1]
    y <- o$results[,2]
    z <- o$results[,3]
    so <- interp(x=x, y=y, z=z, linear = linear)
    idx <- which.min(z)
    x0 <- x[idx]
    y0 <- y[idx]
    filled.contour(x = so$x,
                   y = so$y,
                   z = so$z,
                   xlim = range(so$x, finite = TRUE) + c(-2, 2),
                   ylim = range(so$y, finite = TRUE) + c(-2, 2),
                   color.palette =
                     colorRampPalette(c("yellow", "red")),
                   xlab = "nodesize",
                   ylab = "mtry",
                   main = "error rate for nodesize and mtry",
                   key.title = title(main = "OOB error", cex.main = 1),
                   plot.axes = {axis(1);axis(2);points(x0,y0,pch="x",cex=1,font=2);
                     points(x,y,pch=16,cex=.25)})
  }
  ## plot the surface
  plot.tune(o)
}
## ------------------------------------------------------------
## tuning for class imbalanced data problem
## - see imbalanced function for details
## - use rfq and perf.type = "gmean"
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
o <- tune(status ~ ., data = breast, rfq = TRUE, perf.type = "gmean")
print(o)
## ------------------------------------------------------------
## tune nodesize for competing risk - wihs data
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
plot(tune.nodesize(Surv(time, status) ~ ., wihs, trace = TRUE)$err)
# }
```