[R-sig-Geo] randomForests for mapping vegetation with big data

Alberto Ruiz Moreno aruiz at eeza.csic.es
Mon Jul 31 12:37:17 CEST 2006


Hi,
 
I'm trying to run randomForest in R to produce vegetation suitability maps.
 
I'm working with 1000x1000 pixel maps and 30 environmental variables.
 
My software is R v2.3.1 and randomForest 4.5-16.
The machine has 1 GB of RAM and a 3 GB swap partition, running Linux or Windows
(the problem is the same in both configurations).
 
R aborts with a memory limit error when I try to train randomForest on
big data frames (300000 rows x 30 columns) with 500 trees.
 
I have spent a week tuning R's memory usage (I read dozens of messages about it), but
I think the cause is not an R misconfiguration but the large memory footprint of the randomForest package implementation.
 
My conclusion is: I need to divide...
 
OK, to work around this memory error, I ran randomForest several times,
each time with fewer rows of training data, and used the combine() function to join the forests.
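
A minimal sketch of that workaround, in case it helps to see it written out. The data here are simulated stand-ins (the names x, y, chunk, forests and rf_all are made up for illustration), and the chunk count and trees per chunk are assumptions chosen so that the total comes to 500 trees; it is not meant as the recommended way to do this.

```r
library(randomForest)

set.seed(42)

# Hypothetical stand-in for the real training table: n rows, 30 predictors.
n <- 3000
x <- as.data.frame(matrix(rnorm(n * 30), ncol = 30))
y <- factor(ifelse(x$V1 + rnorm(n) > 0, "present", "absent"))

# Assign every row to one of k chunks at random.
k <- 5
chunk <- sample(rep(seq_len(k), length.out = n))

# Grow a small forest on each chunk; 5 chunks x 100 trees = 500 trees in total.
forests <- lapply(seq_len(k), function(i) {
  rows <- which(chunk == i)
  randomForest(x = x[rows, ], y = y[rows], ntree = 100)
})

# Merge the per-chunk forests into a single randomForest object.
rf_all <- do.call(combine, forests)

# Predict with the merged forest as with any other randomForest object.
pred <- predict(rf_all, x[1:10, ])
```

Each tree in the merged forest has only seen the rows of its own chunk, and the combined object no longer carries the out-of-bag error estimates (err.rate and confusion are NULL after combine()), so accuracy would have to be checked on a separate held-out set.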
 
The question is...
 
Is this the right way to train randomForest with big data?
Is there another way?
How do you do it?
 

Thanks...



