# [R] Problems with randomForest for regression

jmoreira@fe.up.pt jmoreira at fe.up.pt
Wed Oct 13 19:21:22 CEST 2004

Dear list,

I am trying to do a benchmark study for my case study. It is a regression
problem. Among other models, I use randomForest.

Using the following code, the result is around 0.628, and this makes sense
compared with the other methods. The theil() function implements Theil's U
statistic. I do not give the definitions of some variables because they are
not important for understanding my problem. I use a sliding-window strategy.

```r
library("randomForest")

rf.theil <- vector()
learner <- 'randomForest'

for (i in 1:6)    # mtry = 1, ..., 6
{
  eval.sum <- 0
  test.pos <- test.pos.ini

  # sliding window: refit on all rows before test.pos, predict the next block
  while (test.pos <= n)
  {
    naive.pred <- c(orig.data[test.pos-1, 7])
    model <- randomForest(Duracao ~ ., data=orig.data[1:(test.pos-1),],
                          na.action=na.omit, ntree=5000, mtry=i)
    preds <- predict(model,
                     orig.data[test.pos:min(n, test.pos+relearn.step-1),])
    # test.pos is advanced here, before theil() is called below
    test.pos <- test.pos+relearn.step

    a <- theil(preds, naive.pred,
               orig.data[test.pos:min(n, test.pos+relearn.step-1), 7])
    if (!is.na(a)) {eval.sum <- eval.sum + a}
  }
  # summed Theil's U divided by the number of windows, for this mtry
  rf.theil <- c(rf.theil, eval.sum/(trunc((n-test.pos.ini)/relearn.step)+1))
}

rf.min <- min(rf.theil, na.rm=TRUE)
rf.indices <- seq_along(rf.theil)[rf.theil == rf.min]

```

But when I run randomForest five times for each value of i and keep the best
result according to the U statistic, I get a value around 0.178, and this
value does not make sense. I use the same strategy with nnet and it gives
good results. The code is:

```r
library("randomForest")

rf.theil <- vector()

for (i in 1:6)    # mtry = 1, ..., 6
{
  eval <- 100000    # set once per value of i, outside the while loop
  eval.sum <- 0
  test.pos <- test.pos.ini

  while (test.pos <= n)
  {
    naive.pred <- c(orig.data[test.pos-1, 7])
    for (j in 1:5)    # fit five forests and keep the lowest Theil's U
    {
      model <- randomForest(Duracao ~ ., data=orig.data[1:(test.pos-1),],
                            na.action=na.omit, ntree=5000, mtry=i)
      preds <- predict(model,
                       orig.data[test.pos:min(n, test.pos+relearn.step-1),])
      eval.temp <- theil(preds, naive.pred,
                         orig.data[test.pos:min(n, test.pos+relearn.step-1), 7])
      if (eval.temp < eval)
        eval <- eval.temp
    }
    if (!is.na(eval))
      eval.sum <- eval.sum + eval
    test.pos <- test.pos+relearn.step
  }
  rf.theil <- c(rf.theil, eval.sum/(trunc((n-test.pos.ini)/relearn.step)+1))
}

rf.min <- min(rf.theil, na.rm=TRUE)

```

Thanks for any help,

Joao Moreira
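
The message does not define theil(), so what follows is only a minimal sketch of the sliding-window evaluation it describes, under stated assumptions. theil_u() uses one common form of Theil's U (the model's squared error relative to that of a naive forecast), with the last observed target value as the naive forecast, as orig.data[test.pos-1,7] suggests; the poster's own theil() may be defined differently. fit_predict() is a hypothetical placeholder that predicts the training mean instead of fitting randomForest, so the example runs without the original data. In the first loop of the message, test.pos is advanced before theil() is called, so theil() receives the rows of the following window; the sketch instead computes each window's rows once and uses them both for predicting and for scoring.

```r
# Sliding-window evaluation scored with Theil's U (illustrative sketch).
# Assumption: U2 form, i.e. the model's squared error relative to that of
# a naive forecast. The poster's theil() may differ.
theil_u <- function(preds, naive.pred, actual) {
  sqrt(sum((actual - preds)^2) / sum((actual - naive.pred)^2))
}

# Hypothetical placeholder for the real model: predicts the training mean.
fit_predict <- function(train, test, target) {
  rep(mean(train[[target]]), nrow(test))
}

# Assumes test.pos.ini >= 2, so there is always at least one training row.
sliding_theil <- function(data, target, test.pos.ini, relearn.step) {
  n <- nrow(data)
  scores <- numeric(0)
  test.pos <- test.pos.ini
  while (test.pos <= n) {
    # Rows of the current window, computed once and reused below
    rows  <- test.pos:min(n, test.pos + relearn.step - 1)
    train <- data[1:(test.pos - 1), , drop = FALSE]
    test  <- data[rows, , drop = FALSE]
    naive.pred <- data[[target]][test.pos - 1]  # last observed value
    preds <- fit_predict(train, test, target)
    scores <- c(scores, theil_u(preds, naive.pred, test[[target]]))
    test.pos <- test.pos + relearn.step         # advance only after scoring
  }
  # Averages over windows with a defined score (NA/NaN windows are dropped)
  mean(scores, na.rm = TRUE)
}

set.seed(42)
toy <- data.frame(x = 1:40, y = cumsum(rnorm(40)))
sliding_theil(toy, target = "y", test.pos.ini = 21, relearn.step = 5)
```

With the real data, fit_predict() would be replaced by something like predict(randomForest(Duracao ~ ., data = train, ntree = 5000, mtry = i), test). Note that the sketch averages only over windows with a defined score, whereas the loops in the message divide by the total number of windows.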

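In the second loop of the message, eval <- 100000 runs once per value of i, outside the while loop, so eval is never reset between windows: each window adds the lowest score seen so far for that i, not the best of its own five runs. Separately, keeping the minimum of several test-set scores is optimistic in itself, because the selection uses the test data. The sketch below contrasts a per-window reset with a carried-over minimum. score_run() is a hypothetical stand-in that returns a random score instead of fitting randomForest and calling theil(), so the numbers are illustrative only.

```r
# Sketch of a "best of k runs per window" loop (illustrative only).
# score_run() is hypothetical: it returns a random score instead of
# fitting randomForest and computing theil() on one test window.
set.seed(1)
score_run <- function(window) runif(1, min = 0.5, max = 0.8)

n.windows <- 10
k <- 5

best.reset   <- numeric(n.windows)  # best score, reset for every window
best.carried <- numeric(n.windows)  # best score carried across windows
eval.carried <- Inf                 # initialised once, like eval <- 100000

for (w in seq_len(n.windows)) {
  eval.window <- Inf                # reset at the start of each window
  for (j in seq_len(k)) {
    s <- score_run(w)
    # !is.na(s) guards the comparison: if (NA < x) stops with an error
    if (!is.na(s) && s < eval.window) eval.window <- s
    if (!is.na(s) && s < eval.carried) eval.carried <- s
  }
  best.reset[w]   <- eval.window
  best.carried[w] <- eval.carried
}

mean(best.reset)    # average of each window's own best run
mean(best.carried)  # average of a running minimum
```

Because eval.carried never increases, best.carried is a non-increasing sequence that is never above best.reset, so its mean can be no larger than mean(best.reset).
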
More information about the R-help mailing list