[R] Questions on RandomForest

Liaw, Andy andy_liaw at merck.com
Wed Jan 7 14:11:30 CET 2004


Fucang,

Questions like these that are specific to one package are best addressed
directly to the package maintainer(s) first (me in this case), as the
discussion is unlikely to be of general interest to the whole list.

1.  The constituent classifier in randomForest uses the CART algorithm
(suitably modified for randomForest), based on Leo Breiman's Fortran code
rather than on rpart.  I believe the core of rpart is written in C by
Terry Therneau.

2.  There's no built-in functionality for randomForest (or most other
algorithms, for that matter) to detect "outliers".

3.  The predict() function will need to have the entire forest in memory, in
addition to the test set data.  There's nothing wrong with predicting the
test set in pieces.  I routinely do predictions on test sets with > 800,000
cases, but in pieces of sizes 10,000-50,000.
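A minimal sketch of predicting in pieces follows. It is written in Python
rather than R and is not the randomForest API: `predict` stands for any
fitted model's prediction routine (a hypothetical callable), and the
50,000-row default chunk size is simply the upper end of the range
mentioned above.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

Row = TypeVar("Row")
Label = TypeVar("Label")


def chunk_ranges(n_rows: int, chunk_size: int) -> Iterator[range]:
    """Yield consecutive index ranges of at most chunk_size rows covering 0..n_rows-1."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield range(start, min(start + chunk_size, n_rows))


def predict_in_pieces(
    predict: Callable[[Sequence[Row]], List[Label]],
    rows: Sequence[Row],
    chunk_size: int = 50_000,
) -> List[Label]:
    """Call a (hypothetical) model's predict on one slice of rows at a time.

    Each call receives only one slice of the test data, so any per-case
    working storage the model allocates scales with chunk_size rather than
    with len(rows). The labels are collected and returned in input order.
    """
    labels: List[Label] = []
    for r in chunk_ranges(len(rows), chunk_size):
        labels.extend(predict(rows[r.start:r.stop]))
    return labels


if __name__ == "__main__":
    # Toy stand-in for a fitted forest: label a value "hi" if it exceeds 2.
    toy_predict = lambda xs: ["hi" if x > 2 else "lo" for x in xs]
    print(predict_in_pieces(toy_predict, [1, 2, 3, 4, 5], chunk_size=2))

    # How many pieces the 256 x 256 x 141 volume from the question needs.
    pieces = list(chunk_ranges(256 * 256 * 141, 50_000))
    print(len(pieces), len(pieces[-1]))
```

For the 256*256*141 = 9,240,576-case volume in the question quoted below,
pieces of 50,000 cases give 185 calls, the last one holding 40,576 cases.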

HTH,
Andy

> From: Fucang Jia
> 
> Hi, everyone,
> 
> Many thanks to Andy and Matthew for answering my earlier questions. By
> sampling only a small segment of an image, I can now successfully
> segment the whole image into several classes with randomForest. However,
> I am still confused about a few things:
> 
> 1.  What is the internal component classifier in randomForest? Is it
> the CART implemented in the rpart package?
> 
> 2. I use training samples to predict new samples. If, instead of
> sampling all the components in the population, I sample only the few
> components I am interested in, can randomForest refrain from assigning
> the dissimilar components in the test samples to those classes, that
> is, label them as outliers?
> 
> 3. When a random forest is used for prediction, the test samples make
> no contribution to the classifiers (which is as it should be), so I
> expected memory usage not to increase much. But when I use an RF
> trained on 1329 samples (3 variables) to predict 256*256*141 samples
> on an SGI Octane2 with 2 GB of RAM, it runs out of memory. I then have
> to split the big dataset into two parts, one of 256*256*70 samples and
> the other of 256*256*71. Why does RF consume so much memory during
> prediction? Does it produce anything other than the class labels?
> 
> 
> Thank you very much!
> 
> Fucang
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 

