[R] Concern with randomForest
Ryan Harrigan
iluvsa at ucla.edu
Tue Apr 7 05:35:55 CEST 2009
Hi all,
When I fit a randomForest model using the following commands:
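library(randomForest)
# importance = TRUE is included to match the Call echoed in the output below
forestplas <- randomForest(Prev ~ ., data = plas, ntree = 200000, importance = TRUE)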
print(forestplas)
I get the following result:
Call:
 randomForest(formula = Prev ~ ., data = plas, ntree = 2e+05, importance = TRUE)
               Type of random forest: regression
                     Number of trees: 2e+05
No. of variables tried at each split: 5

          Mean of squared residuals: 0.0431127
                    % Var explained: -22.1
Here's my concern: what explains a negative percent variance explained? My
understanding is that this value is calculated using the formula
1 - MSE(OOB)/var(y) (from Liaw & Wiener's description),
where MSE(OOB) is the out-of-bag mean squared error and var(y) is the
variance of the response.
Is this analogous to an r-squared that has not been run through a stepwise
procedure? Should I remove variables that do not contribute to the model
before running randomForest? This negative value seems to contradict my
standard multiple regression results, which indicate that up to 58% of the
variation is explained.
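
To make the calculation concrete, here is a minimal sketch in base R of how a
percent-variance-explained figure of this kind could be computed. It assumes
the value is 1 minus the out-of-bag MSE divided by the variance of the
response, with the variance taken with divisor n (an assumption on my part).
The helper name pct_var_explained and the data are made up for illustration;
they are not part of randomForest and not my actual data.

```r
# Hypothetical helper (not part of the randomForest package):
# percent variance explained from a response and its out-of-bag predictions.
pct_var_explained <- function(y, oob_pred) {
  mse_oob <- mean((y - oob_pred)^2)   # mean of squared OOB residuals
  var_y   <- mean((y - mean(y))^2)    # variance of y, divisor n (assumed)
  100 * (1 - mse_oob / var_y)
}

# Made-up response values, for illustration only.
y <- c(0.1, 0.4, 0.2, 0.7, 0.5)

# Predicting the mean of y for every case gives zero percent explained.
at_mean <- pct_var_explained(y, rep(mean(y), length(y)))

# Predictions that are worse than the mean give a negative value.
worse_than_mean <- pct_var_explained(y, rev(y))

print(at_mean)
print(worse_than_mean)
```

Under that reading, a negative value would mean the out-of-bag predictions
have a larger squared error than simply predicting the mean of Prev for
every observation.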
Thanks for your help on this; any comments are welcome!
--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
iluvsa at ucla.edu