[R] randomForest memory footprint
John Foreman
john.4man at gmail.com
Wed Sep 7 20:45:59 CEST 2011
Hello, I am attempting to train a random forest model using the
randomForest package on roughly 575,000 rows and 8 columns (7 predictors,
1 response). The data set is the first block of data from the UCI
Machine Learning Repository dataset "Record Linkage Comparison Patterns",
with the slight modification that I dropped two columns with many
NAs and used kNN imputation to fill in the remaining gaps.
When I load my dataset, R uses no more than 100 MB of RAM. I'm
running 64-bit R with ~4 GB of RAM available. When I execute the
randomForest() function, however, I get memory errors. Example:
> summary(mydata1.clean[,3:10])
  cmp_fname_c1     cmp_lname_c1       cmp_sex           cmp_bd
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.2857   1st Qu.:0.1000   1st Qu.:1.0000   1st Qu.:0.0000
 Median :1.0000   Median :0.1818   Median :1.0000   Median :0.0000
 Mean   :0.7127   Mean   :0.3156   Mean   :0.9551   Mean   :0.2247
 3rd Qu.:1.0000   3rd Qu.:0.4286   3rd Qu.:1.0000   3rd Qu.:0.0000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

     cmp_bm           cmp_by          cmp_plz          is_match
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   FALSE:572820
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   TRUE :  2093
 Median :0.0000   Median :0.0000   Median :0.00000
 Mean   :0.4886   Mean   :0.2226   Mean   :0.00549
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000
> mydata1.rf.model2 <- randomForest(x = mydata1.clean[,3:9],y=mydata1.clean[,10],ntree=100)
Error: cannot allocate vector of size 877.2 Mb
In addition: Warning messages:
1: In dim(data) <- dim :
Reached total allocation of 3992Mb: see help(memory.size)
2: In dim(data) <- dim :
Reached total allocation of 3992Mb: see help(memory.size)
3: In dim(data) <- dim :
Reached total allocation of 3992Mb: see help(memory.size)
4: In dim(data) <- dim :
Reached total allocation of 3992Mb: see help(memory.size)
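For scale, here is a rough back-of-the-envelope check of the raw data size against the failed allocation. It is only a sketch: the row count is the sum of the is_match counts in the summary output, and it assumes every column is stored as an 8-byte double, which slightly overstates the footprint of is_match. The variable names are made up for illustration.

```r
# Row count taken from the is_match counts in the summary output.
n_rows <- 572820 + 2093
n_cols <- 8              # 7 predictors + 1 response
bytes_per_double <- 8    # assumption: every column is a double

# Approximate size of the raw data in Mb (1 Mb = 1024^2 bytes).
data_mb <- n_rows * n_cols * bytes_per_double / 1024^2

failed_alloc_mb <- 877.2  # single allocation that failed (error message)
total_limit_mb  <- 3992   # total allocation limit (warning messages)

cat(sprintf("rows: %d\n", n_rows))
cat(sprintf("approx. raw data size: %.1f Mb\n", data_mb))
cat(sprintf("failed allocation / raw data size: %.1f\n",
            failed_alloc_mb / data_mb))
cat(sprintf("total limit / raw data size: %.1f\n",
            total_limit_mb / data_mb))
```

So the single vector that could not be allocated is many times larger than the whole data set, which is why I suspect an intermediate structure inside randomForest() rather than the data itself.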
Other techniques such as boosted trees handle the data size just fine.
Are there any parameters I can adjust such that I can use a value of
100 or more for ntree?
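To make the question concrete, this is the kind of adjustment I have in mind. It is a sketch, not a verified fix: it assumes memory use grows with the per-tree sample and with tree size, so it limits both through the sampsize, strata, nodesize and maxnodes arguments of randomForest(). The data is a small synthetic stand-in for mydata1.clean, the response is assumed to be a factor, and the specific values are untuned guesses.

```r
# Synthetic stand-in for the real data: 7 numeric predictors in [0, 1]
# and a rare TRUE class, loosely mimicking the summary output.
set.seed(1)
n <- 5000
x <- as.data.frame(matrix(runif(n * 7), ncol = 7))
y <- factor(rep(c(FALSE, TRUE), times = c(4900, 100)))

# Fit only if the package is installed, so the sketch runs either way.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(
    x = x, y = y,
    ntree    = 100,
    strata   = y,            # draw the per-tree sample within each class
    sampsize = c(500, 100),  # rows drawn per class, in factor-level order
    nodesize = 50,           # larger terminal nodes, hence smaller trees
    maxnodes = 64            # cap on terminal nodes per tree
  )
  print(fit)
} else {
  message("randomForest is not installed; nothing was fitted")
}
```

With the real data the same call would take x = mydata1.clean[,3:9] and y = mydata1.clean[,10]. I do not know whether these arguments actually address the allocation that fails above.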
Thanks,
John