[R] Concern with randomForest

Ryan Harrigan iluvsa at ucla.edu
Tue Apr 7 05:35:55 CEST 2009

Hi all,
    When I run randomForest using the following command:


I get the following result:

 randomForest(formula = Prev ~ ., data = plas, ntree = 2e+05,
importance = TRUE) 
               Type of random forest: regression
                     Number of trees: 2e+05
No. of variables tried at each split: 5

          Mean of squared residuals: 0.0431127
                    % Var explained: -22.1

Here's my concern: what is the explanation for a negative percent
variation explained? My understanding is that this value is calculated using
the formula:

1 - MSE(OOB)/Var(y) (from Liaw & Wiener's description)

Is this analogous to an R-squared that has not been run through a stepwise
procedure? Should I be removing variables that do not contribute to the model
before running randomForest? This negative value seems to contradict my standard
multiple regression results, which indicate up to 58% of the variation explained.
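
To make the quantity concrete, here is a minimal sketch in base R of a pseudo R-squared of the form 1 - MSE/Var(y). It assumes Var(y) is the mean squared deviation of the response from its mean, which is my reading of the randomForest documentation rather than something confirmed here. The function name pseudo_rsq and the data are hypothetical, made up purely for illustration. Because out-of-bag predictions are made for cases not used to grow a given tree, the MSE can exceed Var(y), and the value then drops below zero, which an in-sample R-squared from least squares with an intercept cannot do.

```r
# Pseudo R-squared of the form 1 - MSE / Var(y).
# Assumption: Var(y) is the mean squared deviation from mean(y).
# pseudo_rsq is a hypothetical helper, not a randomForest function.
pseudo_rsq <- function(y, pred) {
  mse   <- mean((y - pred)^2)      # mean of squared residuals
  var_y <- mean((y - mean(y))^2)   # spread of the response around its mean
  1 - mse / var_y
}

# Made-up response values, for illustration only.
y <- c(0.1, 0.4, 0.2, 0.5, 0.3)

# Predicting the overall mean for every case gives a value of zero.
rsq_mean <- pseudo_rsq(y, rep(mean(y), length(y)))

# Predictions that do worse than the mean give a negative value.
rsq_bad <- pseudo_rsq(y, c(0.5, 0.1, 0.4, 0.2, 0.3))

print(c(mean_only = rsq_mean, poor_fit = rsq_bad))
```

Read this way, a negative value would mean the out-of-bag predictions are less accurate than simply predicting the mean of Prev for every case.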

Thanks for your help on this; any comments are welcome!

Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
iluvsa at ucla.edu
