[R] data mining & R

Prof Brian D Ripley ripley at stats.ox.ac.uk
Fri Sep 8 08:13:44 CEST 2000


On Fri, 8 Sep 2000, Ross Ihaka wrote:

> On Fri, Sep 08, 2000 at 10:23:34AM +0800, Mohd Zamri Murah wrote:
> > I am new to R and am currently reading a few interesting articles about
> > data mining. Data mining, if I understand correctly, is a method for
> > analyzing large data sets.

Not really.  It's about searching for structure in largish datasets,
usually with many observations per subject.  Think of it as really multi-
multivariate analysis.

> > The R-FAQ states that:
> > 
> >    R (currently) uses a _static_ memory model.  This means that when it
> >    starts up, it asks the operating system to reserve a fixed amount of memory
> >    for it.  The size of this chunk cannot be changed subsequently.  Hence, it
> >    can happen that not enough memory was allocated, e.g., when trying to read
> >    large data sets into R.
> >    
> > 
> > Out of curiosity, what is the upper limit on the size of data that R can
> > process, in terms of number of rows/columns or in MBytes? Or, if this limit
> > exists, is it hardware related? (e.g. a computer with 256MB can process more
> > data than one with 64MB)
> 
> This is about to change in 1.2.  Luke Tierney has rewritten the memory
> management in R so that this restriction no longer applies.  On the other
> hand, the computational model used within R is really only suitable for
> data sets consisting of at most a few 10s of megabytes.  The problem is
> that data sets are memory resident and some computations will copy the
> entire data set.

10Mb datasets are certainly challenging enough for the current state of
data mining methodology. We routinely use R 1.1.1 (with a few judicious
operations coded in C called from R) to analyse fMRI experiments in the
10-50Mb region, on machines with 128-512Mb of RAM.
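
As a rough illustration of why RAM several times the size of the data is
useful when data sets are memory resident and some computations copy them,
here is a back-of-envelope sketch. It is written in Python purely as
arithmetic and is not a description of R's internals. The 8 bytes per value
matches double-precision storage; the data set dimensions and the number of
simultaneous copies are hypothetical, chosen only for illustration.

```python
# Back-of-envelope memory estimate for a fully memory-resident numeric data set.
# Assumptions (not from the thread): every value is an 8-byte double, and a
# computation may hold a hypothetical number of full copies at the same time.

BYTES_PER_DOUBLE = 8
MB = 1024 * 1024


def dataset_mb(rows: int, cols: int) -> float:
    """Size in megabytes of a rows x cols table of 8-byte doubles."""
    return rows * cols * BYTES_PER_DOUBLE / MB


def peak_mb(rows: int, cols: int, copies: int = 3) -> float:
    """Rough peak usage if `copies` full copies exist simultaneously."""
    return dataset_mb(rows, cols) * copies


if __name__ == "__main__":
    rows, cols = 100_000, 50  # hypothetical data set
    print(f"data set: {dataset_mb(rows, cols):.1f} MB")
    print(f"peak with 3 copies: {peak_mb(rows, cols):.1f} MB")
```

With these made-up dimensions the data set alone is about 38 MB, and three
simultaneous copies would need roughly 114 MB. The real number of copies
depends on the computation, so treat the multiplier as a guess.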

There is another part of data mining about managing the databases, and
sometimes that's needed to extract a suitable <10Mb subset from a data
warehouse.

Finally, `data mining' is a buzz phrase and not well-defined: the above
reflects what people who talk to me (e.g. as a consultant) mean by it!

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


