[R-sig-hpc] Decision trees in R with big data

Rich Calaway richcalaway at revolutionanalytics.com
Mon Apr 14 19:30:24 CEST 2014

Revolution R Enterprise, a commercial distribution of R, includes
external-memory implementations of both decision trees and decision
forests. These are geared for "tall" data--the two million rows
wouldn't be a problem, nor would two billion, but the 20,000
attributes would probably challenge them.
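
For a rough sense of scale, here is a back-of-the-envelope sketch (in
Python, purely for the arithmetic). It assumes every value is stored as
a dense 8-byte double and that data is read in row-wise chunks; the
10,000-row chunk size and the 200-attribute comparison are hypothetical
figures chosen for illustration, not settings of any particular
package.

```python
BYTES_PER_DOUBLE = 8  # assumption: every value is a dense 64-bit float


def dense_size_gb(rows: int, cols: int) -> float:
    """Size in decimal gigabytes of a dense rows x cols table of doubles."""
    return rows * cols * BYTES_PER_DOUBLE / 1e9


# The full table from the question: ~2 million rows x ~20,000 attributes.
full_table_gb = dense_size_gb(2_000_000, 20_000)

# A hypothetical 10,000-row chunk at the full width of 20,000 attributes.
wide_chunk_gb = dense_size_gb(10_000, 20_000)

# The same hypothetical chunk with only 200 attributes ("tall" data).
tall_chunk_gb = dense_size_gb(10_000, 200)

print(f"full table: {full_table_gb:.0f} GB")
print(f"wide chunk: {wide_chunk_gb:.1f} GB")
print(f"tall chunk: {tall_chunk_gb:.3f} GB")
```

Under these assumptions the full table is far too large to hold in
memory at once, and reading by rows does nothing to shrink the width:
each chunk still carries all 20,000 attributes.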

It's probably worth a look (and is available for free for academic use):


Hope this helps!

--Rich Calaway

On Mon, Apr 14, 2014 at 9:53 AM, Supriya Jain <sjsjsj2009 at gmail.com> wrote:
> Hi,
> I have successfully used rpart, but with only a few thousand rows and a few
> hundred input attributes. With ~2 million rows (instances) and ~20,000
> input attributes (typical data sizes in my application), I get memory
> problems with rpart.
> Does anyone know of a decision tree algorithm that works in R with big
> data?
> Thanks!

Rich Calaway
Documentation Manager
Revolution Analytics, Inc.
1505 Westlake Ave North Suite 520
Seattle, WA 98109
richcalaway at revolutionanalytics.com
ph: 206-456-6086 (direct line)