[R] Random Forest % Variation vs Pseudo-R^2?

Liaw, Andy andy_liaw at merck.com
Mon Jun 8 15:45:58 CEST 2009


It actually means that the MSE (0.04605) is 130.42% of var(y), so the
model predicts worse than simply using mean(y).  The pseudo R^2 is just
100% - 130.42% = -30.42%.  Your calculation divided the MSE by that
ratio (1.3042) rather than by var(y), which is why it came out at 0.965.
Remember that this is not the resubstitution estimate, because it is
computed from the OOB estimate of MSE.
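
For what it's worth, the arithmetic can be checked directly in R.  This
is only a sketch using the numbers quoted in this thread: var(y) itself
is not printed anywhere in the output, so it is backed out here from the
MSE and the %Var(y) column, and the variable names are made up for
illustration.

```r
# Numbers taken from the last do.trace line and the printed summary.
mse     <- 0.04605   # OOB mean of squared residuals
pct_var <- 130.42    # the "%Var(y)" column, i.e. 100 * MSE / var(y)

# var(y) is not shown in the thread; recover it from the two values above.
var_y <- mse / (pct_var / 100)

# Pseudo R^2 as a fraction, and as the "% Var explained" percentage.
pseudo_r2         <- 1 - mse / var_y
pct_var_explained <- 100 * pseudo_r2

# The calculation from the question divides the MSE by the ratio 1.3042
# instead of by var(y), so it looks far better than it should.
mistaken <- 1 - mse / 1.3042

print(round(var_y, 5))
print(round(pct_var_explained, 2))
print(round(mistaken, 3))
```

On the fitted object itself, the same quantities should be available as
forest.rf$mse and forest.rf$rsq (one value per tree), so
tail(forest.rf$rsq, 1) would be expected to come out near -0.3042.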

HTH,
Andy 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ryan Harrigan
> Sent: Sunday, June 07, 2009 9:38 PM
> To: r-help at r-project.org
> Subject: [R] Random Forest % Variation vs Pseudo-R^2?
> 
> Hi all (and Andy!),
>     When I run randomForest in R with do.trace=T, the last part of the
> output looks like this:
> 
> 1993 |  0.04606   130.43 |
> 1994 |  0.04605   130.40 |
> 1995 |  0.04605   130.43 |
> 1996 |  0.04605   130.43 |
> 1997 |  0.04606   130.44 |
> 1998 |  0.04607   130.47 |
> 1999 |  0.04606   130.46 |
> 2000 |  0.04605   130.42 |
> 
> The first column is the iteration, the second column is the OOB MSE,
> and the last column is the %Var(y). If I calculate a "Pseudo-R^2" from
> these numbers, I get:
> 
> 1-(.04605/1.3042) = 0.965
> 
> Here's the question: if I look at the summary of forest.rf (this same
> run), I get the following:
> 
> randomForest(formula = Prev ~ ., data = plas, ntree = 2000,
>     importance = TRUE, do.trace = T)
>                Type of random forest: regression
>                      Number of trees: 2000
> No. of variables tried at each split: 5
> 
>           Mean of squared residuals: 0.04605177
>                     % Var explained: -30.42
> 
> What does that -30.42 % Var explained relate to? I find it interesting
> that the %Var(y) is 130.42 and the % Var explained is a very similar
> number, but I have no idea how they are related. From my calculations,
> it seems like I have a good predictor set (Pseudo-R^2 over 95%), but am
> I missing something?
> 
> Cheers,
> 
> Ryan
> 
> 
> --
> Ryan Harrigan, Ph.D.
> Center for Tropical Research
> Institute of the Environment
> University of California, Los Angeles
> La Kretz Hall, Suite 300
> Box 951496
> Los Angeles, CA 90095-1496
> 203-804-9505
> iluvsa at ucla.edu