[R] Can't seem to finish a randomForest.... Just goes and goes!
David L. Van Brunt, Ph.D.
dvanbrunt at well-wired.com
Mon Apr 5 02:14:42 CEST 2004
Thanks for the pointer!! Can't believe you got back to me so quickly on a
Sunday evening. I'll give that a shot and let you know how it goes.
On 4/4/04 19:07, "Liaw, Andy" <andy_liaw at merck.com> wrote:
> When you have fairly large data, _do not use the formula interface_, as a
> couple of copies of the data would be made. Try simply:
>
> Myforest.rf <- randomForest(Mydata[, -46], Mydata[, 46],
>                             ntree = 100, mtry = 7)
>
> [Note that you don't need to set proximity (not proximities) or importance
> to FALSE, as that's the default already.]
>
> You might also want to use do.trace=1 to see if trees are actually being
> grown (assuming there's no output buffering as in Rgui on Windows, otherwise
> you'll probably want to turn that off).
>
> I have run randomForest on data sets much larger than that without
> problems, so I don't imagine your data would be `difficult'. (I have not
> used the Mac, though.)
>
> Andy
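A minimal sketch of the call Andy suggests, on simulated stand-in data shaped like Mydata (46 columns, 1,855 rows, factor DV in column 46); note the argument name is `ntree`, and `do.trace = 1` prints progress per tree:

```r
library(randomForest)

## Simulated stand-in for Mydata: 45 numeric IVs plus a two-level factor DV.
set.seed(1)
Mydata <- as.data.frame(matrix(rnorm(1855 * 45), ncol = 45))
Mydata$V46 <- factor(sample(0:1, 1855, replace = TRUE))

## Matrix/data-frame interface (x, y) avoids the extra data copies the
## formula interface makes; do.trace = 1 reports each tree as it is grown.
Myforest.rf <- randomForest(Mydata[, -46], Mydata[, 46],
                            ntree = 100, mtry = 7, do.trace = 1)
```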
>
>> From: David L. Van Brunt, Ph.D.
>>
>> Playing with randomForest, samples run fine. But on real data, no go.
>>
>> Here's the setup: OS X; same behavior whether I'm using R-Aqua 1.8.1 or
>> my own Fink compile with X11, R version 1.8.1.
>>
>> This is on OS X 10.3 (aka "Panther"), G4 800 MHz with 512 MB physical RAM.
>>
>> I have not altered the Startup options of R.
>>
>> Data set is read in from a text file with "read.table", and has 46
>> variables and 1,855 cases.
>>
>> The DV is categorical, 0 or 1. Most of the IVs are either continuous or
>> correctly read in as factors. The largest factor has 30 levels.... Only
>> the DV seems to need identifying as a factor to force classification
>> trees over regression. Trying the following:
>>
>>> Mydata$V46 <- as.factor(Mydata$V46)
>>> Myforest.rf <- randomForest(V46 ~ ., data = Mydata, ntrees = 100,
>> +   mtry = 7, proximities = FALSE, importance = FALSE)
>>
>> 5 hours later, R.bin was still taking up 75% of my processor. When I've
>> tried this with larger data, I get errors referring to the buffer
>> (sorry, not in front of me right now).
>>
>> Any ideas on this? The data don't seem horrifically large. Seems like
>> there are a few options for setting memory size, but I'm not sure which
>> of them to try tweaking, or if that's even the issue.
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>>
>
>
--
David L. Van Brunt, Ph.D.
Outlier Consulting & Development
mailto: <ocd at well-wired.com>