[R] gbm
Liaw, Andy
andy_liaw at merck.com
Thu Jan 13 01:53:52 CET 2005
> From: Weiwei Shi
>
> Hi, there:
> Thanks a lot for everyone's prompt replies.
>
> In detail, I am facing a huge amount of data: over
> 10,000 rows and 400 variables. This project is very challenging
> and interesting to me. I tried rpart which gives me
> some promising results but not good enough. So I am
> trying randomForest and gbm now.
>
> My plan of using gbm is like this:
> rt <- rpart(...)
> gbm(formula(rt), ...)
>
> Does this work? (My first question)
Given a machine with sufficient memory and CPU speed, yes.
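A minimal sketch of that plan, assuming a binary response and a hypothetical training data frame `train` with response column `y` (names and distribution are illustrative, not from the original post):

```r
## Sketch: fit rpart first, then reuse its formula in gbm.
## `train` and the "bernoulli" distribution are assumptions for this example.
library(rpart)
library(gbm)

rt <- rpart(y ~ ., data = train)

fit <- gbm(formula(rt), data = train,
           distribution = "bernoulli",  # 0/1 response assumed
           n.trees = 1000)
```

Note that `formula(rt)` returns the formula the rpart call was given, so this simply hands gbm the same model specification rather than anything rpart learned.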
> Another concern of mine about gbm is scalability, since I
> realize R seems to load all the data into memory. (My
> second question)
We have dealt with data larger than what you described. One thing to avoid
is the use of the formula interface if you have _lots_ (like, hundreds) of
variables. gbm.fit(), I believe, was created for that reason.
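A hedged sketch of the gbm.fit() interface mentioned above: it takes the predictor matrix/data frame and response vector directly, bypassing the formula and model-frame machinery that becomes slow with hundreds of variables. The object names here (`train`, `predictor_cols`) are hypothetical:

```r
## Sketch: avoid the formula interface for wide data by calling gbm.fit()
## with x and y directly. `train` and `predictor_cols` are assumptions.
library(gbm)

x <- train[, predictor_cols]   # data frame of predictors only
y <- train$response            # numeric 0/1 response assumed

fit <- gbm.fit(x, y,
               distribution = "bernoulli",
               n.trees = 1000)
```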
> But I believe the idea above will run very slowly. (I
> think I might try TreeNet, though I don't like it
> since it is commercial.) BTW, sampling might be a
> good idea, but it does not seem a good idea for my
> project from previous experiments.
To me being commercial is not a crime. I judge software on quality, ease of
use, access to source (if I need it), etc. To me, TreeNet failed on several
of those criteria, but it works just fine for some people.
> I read some reference mentioned earlier by helpers
> before I sent my first email. But I still appreciate
> any help. You guys are so nice!
That's no excuse for not following the posting guide, right?
> BTW, gbm means gradient boosting modeling :)
No. I believe Greg calls it `generalized boosted models'.
Andy
> Ed
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>