[R] RandomForest, Party and Memory Management

Lorenzo Isella lorenzo.isella at gmail.com
Mon Feb 4 15:21:58 CET 2013


Dear Dennis and dear All,
That was probably not my best post.
I am running R on a Debian box (amd64 architecture), which is why I
was surprised to see memory issues when dealing with a vector larger
than 1 Gb. The memory is there, but it is probably not contiguous.
I will investigate the matter and post again (generating an
artificial data frame if needed).
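Something along these lines should do as a self-contained stand-in for my
data set (the sizes and factor levels below are invented; only the column
names match the real data):

## artificial stand-in for trainRF; sizes and levels are made up
set.seed(1)
n <- 1e5
trainRF <- data.frame(
  SalePrice        = rlnorm(n, meanlog = 10),
  ModelID          = factor(sample(40, n, replace = TRUE)),
  ProductGroup     = factor(sample(LETTERS[1:6], n, replace = TRUE)),
  ProductGroupDesc = factor(sample(letters[1:6], n, replace = TRUE)),
  MfgYear          = sample(1990:2012, n, replace = TRUE),
  saledate3        = sample(1:1000, n, replace = TRUE),
  saleday          = sample(1:31, n, replace = TRUE),
  salemonth        = factor(sample(month.abb, n, replace = TRUE))
)
## the cforest() and randomForest() calls quoted below run unchanged on this
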
Many thanks

Lorenzo

On 4 February 2013 00:50, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi Lorenzo:
>
> On Sun, Feb 3, 2013 at 11:47 AM, Lorenzo Isella
> <lorenzo.isella at gmail.com> wrote:
>> Dear All,
>> For a data mining project, I am relying heavily on the RandomForest and
>> Party packages.
>> Due to the large size of the data set, I often have memory problems (in
>> particular with the Party package; RandomForest seems to use less memory). I
>> really have two questions at this point:
>> 1) Please see how I am using the Party and RandomForest packages. Any
>> comment is welcome and useful.
>
> As noted elsewhere, the example is not reproducible so I can't help you there.
>>
>>
>>
>> myparty <- cforest(SalePrice ~ ModelID + ProductGroup + ProductGroupDesc +
>>                      MfgYear + saledate3 + saleday + salemonth,
>>                    data = trainRF,
>>                    control = cforest_unbiased(mtry = 3, ntree = 300, trace = TRUE))
>>
>>
>>
>>
>> rf_model <- randomForest(SalePrice ~ ModelID + ProductGroup + ProductGroupDesc +
>>                            MfgYear + saledate3 + saleday + salemonth,
>>                          data = trainRF, na.action = na.omit,
>>                          importance = TRUE, do.trace = 100, mtry = 3, ntree = 300)
>>
>> 2) I have another question: sometimes R crashes after telling me that it is
>> unable to allocate e.g. an array of 1.5 Gb.
>> However, I have 4 Gb of RAM on my box, so technically the memory is there,
>> but is there a way to let R use more of it?
>
> 4 Gb is not a lot of RAM for data mining projects. I have twice that
> and run into memory limits on some fairly simple tasks (e.g., 2D
> tables) in large simulations with 1M or 10M runs. Part of the problem
> is that data is often copied, sometimes more than once: with a
> 1 Gb input data frame, three copies and you're out of space. Moreover,
> copied objects need contiguous memory, which becomes very difficult
> to achieve with large objects and limited RAM. With 4 Gb of RAM, you
> need to be more clever:
>
> * eliminate as many other processes that access RAM as possible (e.g.,
> no active browser)
> * think of ways to process your data in chunks (which is harder to do
> when the objective is model fitting); a rough sketch follows this list
> * type ?"Memory-limits"  (including the quotes) at the console for
> explanations about memory limits and a few places to look for
> potential solutions
> * look into 'big data' packages like ff or bigmemory, among others
> (also sketched below)
> * if you're in an (American ?) academic institution, you can get a
> free license for Revolution R, which is supposed to be better for big
> data problems than vanilla R
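>
> A rough sketch of the chunked approach (the file name, chunk size, and
> processing step are placeholders, not your actual data):
>
> con   <- file("train.csv", open = "r")
> first <- read.csv(con, nrows = 1e5)      # first chunk, includes the header
> nms   <- names(first)
> ## ... process/aggregate the first chunk here ...
> repeat {
>   chunk <- tryCatch(read.csv(con, nrows = 1e5, header = FALSE, col.names = nms),
>                     error = function(e) NULL)  # read.csv errors at end of file
>   if (is.null(chunk) || nrow(chunk) == 0) break
>   ## ... process/aggregate this chunk here ...
> }
> close(con)
>
> The ff and bigmemory readers keep the data on disk rather than in RAM;
> again with made-up file names (and note that a big.matrix holds a single
> numeric type, so it suits all-numeric data):
>
> library(ff)
> trainff <- read.csv.ffdf(file = "train.csv", header = TRUE)
>
> library(bigmemory)
> trainbm <- read.big.matrix("train_numeric.csv", header = TRUE, type = "double",
>                            backingfile = "train.bin", descriptorfile = "train.desc")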
>
> It's hard to be specific about potential solutions, but the above
> should broaden your perspective on the big data problem and possible
> avenues for solving it.
>
> Dennis
>>
>> Many thanks
>>
>> Lorenzo
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


