[R] Random Forest % Variation vs Pseudo-R^2?
Ryan Harrigan
iluvsa at ucla.edu
Mon Jun 8 03:38:21 CEST 2009
Hi all (and Andy!),
When I run randomForest in R with do.trace=T, the last part of the output
looks like this:
1993 | 0.04606 130.43 |
1994 | 0.04605 130.40 |
1995 | 0.04605 130.43 |
1996 | 0.04605 130.43 |
1997 | 0.04606 130.44 |
1998 | 0.04607 130.47 |
1999 | 0.04606 130.46 |
2000 | 0.04605 130.42 |
The first column is the iteration, the second column is the OOB MSE, and
the last column is the %Var(y). If I calculate a "Pseudo-R^2" from these
numbers, I get:
1-(.04605/1.3042) = 0.965
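
The same arithmetic can be reproduced in R. This is only a sketch of the
calculation as I did it: the variable names are made up for illustration,
and the values are copied from the last trace line (iteration 2000).

```r
# Values copied from the final do.trace line (iteration 2000).
oob_mse   <- 0.04605   # second column: OOB MSE
pct_var_y <- 130.42    # third column: %Var(y)

# The calculation from this post: divide the MSE by %Var(y) / 100.
# Whether %Var(y) / 100 is the right denominator is exactly what is in question.
pseudo_r2 <- 1 - oob_mse / (pct_var_y / 100)

print(round(pseudo_r2, 3))
```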
Here's the question: if I look at the summary of forest.rf (this same run),
I get the following:
randomForest(formula = Prev ~ ., data = plas, ntree = 2000, importance =
TRUE, do.trace = T)
Type of random forest: regression
Number of trees: 2000
No. of variables tried at each split: 5
Mean of squared residuals: 0.04605177
% Var explained: -30.42
What does the -30.42 for % Var explained relate to? I find it interesting
that the %Var(y) is 130.42 and the % Var explained is a very similar number,
but I have no idea how they are related. From my calculations, it seems like
I have a good predictor set (Pseudo-R^2 over 95%), but am I missing something?
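
One way to compare the two figures numerically is sketched below. It rests
on two assumptions that are not confirmed anywhere in this post: that the
%Var(y) column is 100 * MSE / Var(y), and that % Var explained is
100 * (1 - MSE / Var(y)). The variable names are made up for illustration.

```r
# Values copied from the summary and from the final do.trace line.
oob_mse   <- 0.04605177  # "Mean of squared residuals"
pct_var_y <- 130.42      # last %Var(y) value in the trace

# ASSUMPTION: %Var(y) = 100 * MSE / Var(y), so Var(y) can be backed out.
implied_var_y <- oob_mse / (pct_var_y / 100)

# ASSUMPTION: % Var explained = 100 * (1 - MSE / Var(y)).
pct_explained <- 100 * (1 - oob_mse / implied_var_y)

print(round(implied_var_y, 4))
print(round(pct_explained, 2))
```

If those assumptions hold, the second value comes out at -30.42, the same
figure as in the summary, which would mean the OOB MSE is larger than the
variance of Prev rather than a small fraction of it.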
Cheers,
Ryan
--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
iluvsa at ucla.edu