[R] Concern with randomForest

Liaw, Andy andy_liaw at merck.com
Tue Apr 7 14:06:19 CEST 2009


It's not nodesize in the formula, but var(y) (computed with divisor n,
not n-1); that is, the statistic is 1 - MSE(OOB)/var(y).  It's somewhat
like the adjusted R-squared (because it uses mean squares instead of
sums of squares), but it uses the OOB estimate of MSE.  If there's very
little or no explanatory power in the predictor variables, this
statistic is estimating a very small number (or zero), and so it can
come out negative.  I would interpret any negative pseudo-R^2 as an
indication of a very poor model.
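
As a rough illustration of the calculation described above, here is a
minimal base-R sketch.  The helper name pseudo_r2 and the simulated
data are assumptions made for the example; they are not part of the
randomForest package or of the data in the question.

```r
# Pseudo R-squared: 1 - MSE(OOB) / var(y), where var(y) is computed
# with divisor n rather than n - 1.
# NOTE: pseudo_r2 is a hypothetical helper written for this example.
pseudo_r2 <- function(y, oob_pred) {
  n <- length(y)
  mse_oob <- mean((y - oob_pred)^2)   # mean squared OOB residual
  var_n <- var(y) * (n - 1) / n       # rescale var() to divisor n
  1 - mse_oob / var_n
}

set.seed(1)
y <- rnorm(50)                        # simulated response (assumption)

# Predicting the overall mean of y gives 0 (up to rounding).
r2_mean <- pseudo_r2(y, rep(mean(y), length(y)))

# Any prediction that does worse than the mean gives a negative value;
# here a constant prediction shifted away from the mean by 0.5.
r2_worse <- pseudo_r2(y, rep(mean(y) + 0.5, length(y)))

print(c(mean_only = r2_mean, worse_than_mean = r2_worse))
```

For a fitted regression forest rf, the documented component
rf$predicted holds the OOB predictions, so pseudo_r2(y, rf$predicted)
should agree with the last element of rf$rsq, which is what print()
reports as "% Var explained" after multiplying by 100.  Applied to the
output quoted below, -22.1% together with an MSE of 0.0431127 implies
var(y) of roughly 0.0431127/1.221, or about 0.0353.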

Andy

From: Ryan Harrigan
> Hi all,
>     When I run randomForest with the following command:
> 
> forestplas=randomForest(Prev~.,data=plas,ntree=200000,importance=TRUE)
> print(forestplas)
> 
> I get the following result:
> 
> Call:
>  randomForest(formula = Prev ~ ., data = plas, ntree = 2e+05,
> importance = TRUE) 
>                Type of random forest: regression
>                      Number of trees: 2e+05
> No. of variables tried at each split: 5
> 
>           Mean of squared residuals: 0.0431127
>                     % Var explained: -22.1
> 
> 
> 
> Here's my concern: what is the explanation for a negative percent
> variation explained? My understanding is that this value is calculated
> using the formula
> 
> 1-MSE(OOB)/nodesize (from Liaw & Wiener's description)
> 
> Is this analogous to an r-squared that has not been run through a
> stepwise procedure? Should I be removing variables that do not
> contribute to the model before running randomForest? This negative
> value seems to contradict my standard multiple regression results,
> which indicate up to 58% of the variation explained.
> 
> Thanks for your help on this; any comments are welcome!
> 
> 
> --
> Ryan Harrigan, Ph.D.
> Center for Tropical Research
> Institute of the Environment
> University of California, Los Angeles
> La Kretz Hall, Suite 300
> Box 951496
> Los Angeles, CA 90095-1496
> 203-804-9505
> iluvsa at ucla.edu
> 