[R] data mining & R
Prof Brian D Ripley
ripley at stats.ox.ac.uk
Fri Sep 8 08:13:44 CEST 2000
On Fri, 8 Sep 2000, Ross Ihaka wrote:
> On Fri, Sep 08, 2000 at 10:23:34AM +0800, Mohd Zamri Murah wrote:
> > I am new to R and am currently reading a few interesting articles about data mining.
> > Data mining, if I understand correctly, is a method to analyze large data sets.
Not really. It's about searching for structure in largish datasets,
usually with many observations per subject. Think of it as really multi-
multivariate analysis.
> > From the R-FAQ, it states that;
> >
> > R (currently) uses a _static_ memory model. This means that when it
> > starts up, it asks the operating system to reserve a fixed amount of memory
> > for it. The size of this chunk cannot be changed subsequently. Hence, it
> > can happen that not enough memory was allocated, e.g., when trying to read
> > large data sets into R.
> >
> >
> > Out of curiosity, what is the upper limit of data size that R can process
> > in terms of number of rows/columns or in MBytes? Or, if this limit exists, is
> > it hardware related? (e.g. can a computer with 256MB process more data than one
> > with 64MB?)
>
> This is about to change in 1.2. Luke Tierney has rewritten the memory
> management in R so that this restriction no longer applies. On the other
> hand, the computational model used within R is really only suitable for
> data sets consisting of at most a few tens of megabytes. The problem is
> that data sets are memory resident and some computations will copy the
> entire data set.
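To put rough numbers on the rows/columns question and on the cost of copying, here is a back-of-envelope sketch (not from the original thread, and written in Python rather than R). It assumes every value is stored as an 8-byte double, as R does for numeric vectors, and ignores per-object overhead and non-numeric columns. The helper names and the `copies` parameter (how many full duplicates a computation holds at once) are hypothetical.

```python
# Back-of-envelope memory arithmetic for a purely numeric data set.
# Assumption: each value is an 8-byte double; object overhead is ignored.

BYTES_PER_DOUBLE = 8
MB = 2**20  # one megabyte, taken as 2**20 bytes


def dataset_megabytes(n_rows: int, n_cols: int) -> float:
    """Approximate size in MB of an n_rows x n_cols numeric table."""
    return n_rows * n_cols * BYTES_PER_DOUBLE / MB


def peak_megabytes(n_rows: int, n_cols: int, copies: int = 1) -> float:
    """Approximate peak usage if a computation holds `copies` extra full copies."""
    return dataset_megabytes(n_rows, n_cols) * (1 + copies)


def max_rows(budget_mb: float, n_cols: int, copies: int = 1) -> int:
    """Largest row count whose data plus `copies` duplicates fits in budget_mb."""
    bytes_per_row = n_cols * BYTES_PER_DOUBLE * (1 + copies)
    return int(budget_mb * MB // bytes_per_row)


if __name__ == "__main__":
    # One million rows by ten numeric columns.
    print(round(dataset_megabytes(1_000_000, 10), 1))          # 76.3
    print(round(peak_megabytes(1_000_000, 10, copies=2), 1))   # 228.9
    # Rows of 10 columns that fit in 128MB if two full copies are made.
    print(max_rows(128, 10, copies=2))                         # 559240
```

The point of the `copies` factor is the one made above: the practical limit is set less by the raw data size than by the data size multiplied by however many copies a computation makes, so the usable fraction of RAM can be a third or less.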
10Mb datasets are certainly challenging enough for the current state of
data mining methodology. We routinely use R 1.1.1 (with a few judicious
operations coded in C called from R) to analyse fMRI experiments in the
10-50Mb region, on machines with 128-512Mb of RAM.
There is another part of data mining about managing the databases, and
sometimes that's needed to extract a suitable <10Mb subset from a data
warehouse.
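As an illustration of that extraction step, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real warehouse. It assumes the columns of interest are numeric (8 bytes per value) and that a uniform random sample of rows is acceptable; the table name `measurements`, its columns and the `extract_subset` helper are hypothetical, and real warehouses will have their own sampling facilities.

```python
import sqlite3

BYTES_PER_DOUBLE = 8
MB = 2**20


def extract_subset(conn, table, columns, budget_mb=10.0):
    """Return a random sample of rows whose numeric size fits in budget_mb.

    Assumes every selected column holds 8-byte numeric values. Table and
    column names are interpolated directly, so they must be trusted
    identifiers, never user input.
    """
    limit = int(budget_mb * MB // (len(columns) * BYTES_PER_DOUBLE))
    col_list = ", ".join(columns)
    cur = conn.execute(
        f"SELECT {col_list} FROM {table} ORDER BY RANDOM() LIMIT ?", (limit,)
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Build a small in-memory stand-in for the warehouse table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [(float(i), float(i) ** 0.5) for i in range(1000)],
    )
    # A tiny budget so the effect is visible: 0.001MB / 16 bytes per row.
    rows = extract_subset(conn, "measurements", ["x", "y"], budget_mb=0.001)
    print(len(rows))  # 65
```

ORDER BY RANDOM() sorts the whole table, which is fine for a sketch but would be slow on a genuinely large warehouse table.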
Finally, `data mining' is a buzz phrase and not well-defined: the above
reflects what people who talk to me (e.g. as a consultant) mean by it!
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._