[R] Concern with randomForest

Ryan Harrigan iluvsa at ucla.edu
Tue Apr 7 05:35:55 CEST 2009


Hi all,
    When I fit a randomForest model using the following command:

library(randomForest)
forestplas <- randomForest(Prev ~ ., data = plas, ntree = 200000, importance = TRUE)
print(forestplas)

I get the following result:

Call:
 randomForest(formula = Prev ~ ., data = plas, ntree = 2e+05,
importance = TRUE) 
               Type of random forest: regression
                     Number of trees: 2e+05
No. of variables tried at each split: 5

          Mean of squared residuals: 0.0431127
                    % Var explained: -22.1



Here's my concern: what explains a negative percent variance explained? My
understanding is that this value is calculated using the formula:

1 - MSE(OOB)/Var(y) (from Liaw & Wiener's description)
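
To make the arithmetic concrete, here is a minimal base-R sketch of how that
figure can come out negative. It assumes the reported value is
100 * (1 - MSE(OOB) / Var(y)), with Var(y) taken as the variance of the
response using divisor n. The function name pct_var_explained and the toy
vectors are hypothetical and are not my data; only the two numbers in the
last step come from the output above.

```r
# Pseudo R-squared in percent (assumed form): 100 * (1 - MSE / Var(y)),
# where Var(y) uses divisor n rather than n - 1.
pct_var_explained <- function(y, pred) {
  mse   <- mean((y - pred)^2)
  var_y <- mean((y - mean(y))^2)
  100 * (1 - mse / var_y)
}

# Toy response and two sets of hypothetical out-of-bag predictions.
y         <- c(0.1, 0.4, 0.2, 0.5, 0.3)
mean_only <- rep(mean(y), length(y))     # always predict the mean of y
worse     <- c(0.5, 0.1, 0.4, 0.2, 0.3)  # predictions worse than the mean

pve_mean  <- pct_var_explained(y, mean_only)  # zero: no better than the mean
pve_worse <- pct_var_explained(y, worse)      # negative: MSE exceeds Var(y)

# Under the same assumption, back out the response variance implied by the
# printed output (MSE = 0.0431127, % Var explained = -22.1).
implied_var <- 0.0431127 / (1 - (-22.1 / 100))
```

If that assumption holds, a negative value only says that the out-of-bag
predictions have a larger squared error than simply predicting the mean of
Prev for every observation.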

Is this analogous to an R-squared that has not been run through a stepwise
procedure? Should I be removing variables that do not contribute to the model
before running randomForest? This negative value seems to contradict my
standard multiple regression results, which indicate up to 58% of the
variation explained.

Thanks for your help on this; any comments are welcome!


--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
iluvsa at ucla.edu



