[R] RE: Off topic -- large data sets. Experiences using R on clusters/Grids

Thomas Colson tom_colson at ncsu.edu
Wed Feb 16 14:19:53 CET 2005


Thanks for all the input. Now to go further off topic...

Does anyone have any comments regarding running 64-bit R on cluster/grid
systems? Given an (almost) unlimited amount of memory, can R hypothetically
handle very large datasets?

I'm finding that even small subsets of this data come in at 1 GB (1-5
million rows), which no 32-bit R workstation (at least in this lab) can
handle.


This type of thing is done effortlessly in genomic research, DNA mapping,
etc.


Tom Colson
Center for Earth Observation
North Carolina State University 
Raleigh, NC 27695
(919) 515 3434
(919) 673 8023
tom_colson at ncsu.edu

Online Calendar:
http://www4.ncsu.edu/~tpcolson



-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Graham Jones
Sent: Wednesday, February 16, 2005 5:08 AM
To: Prof Brian Ripley
Cc: r-help at stat.math.ethz.ch
Subject: Re: Off topic -- large data sets. Was RE: [R] 64 Bit R Background Question

In message <Pine.LNX.4.61.0502151735100.31845 at gannet.stats>, Prof Brian
Ripley <ripley at stats.ox.ac.uk> writes

>But Bert's caveats apply: you have 200 problems of size 20,000 since in 
>QDA each class's distribution is estimated separately, and a single 
>pass will give you the sufficient statistics however large the dataset is.
>
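
To illustrate the single-pass point in the quoted text, here is a minimal sketch in Python (not R, and not anyone's actual code from this thread) of accumulating per-class sufficient statistics from a stream of rows. The function names `accumulate` and `mean_and_cov` and the toy data are made up for illustration. Only a count, a sum vector and a sum-of-products matrix are kept per class, so memory use depends on the number of classes and features, not on the number of cases.

```python
from collections import defaultdict


def accumulate(rows, dim):
    """One pass over (label, features) pairs; keeps only per-class sums."""
    stats = defaultdict(lambda: {
        "n": 0,                                         # number of cases seen
        "sum": [0.0] * dim,                             # sum of x
        "outer": [[0.0] * dim for _ in range(dim)],     # sum of x * x^T
    })
    for label, x in rows:
        s = stats[label]
        s["n"] += 1
        for i in range(dim):
            s["sum"][i] += x[i]
            for j in range(dim):
                s["outer"][i][j] += x[i] * x[j]
    return stats


def mean_and_cov(s):
    """Turn one class's sums into its sample mean and covariance matrix."""
    n, dim = s["n"], len(s["sum"])
    mean = [v / n for v in s["sum"]]
    cov = [[(s["outer"][i][j] - n * mean[i] * mean[j]) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    return mean, cov


# Toy stream standing in for a file or database cursor read row by row.
rows = [("a", [1.0, 2.0]), ("b", [0.0, 0.0]), ("a", [3.0, 6.0]),
        ("b", [2.0, 2.0]), ("a", [5.0, 4.0])]
stats = accumulate(iter(rows), dim=2)
mean_a, cov_a = mean_and_cov(stats["a"])
mean_b, cov_b = mean_and_cov(stats["b"])
```

The per-class means and covariance matrices recovered this way are the quantities QDA estimates for each class. Plain running sums can lose precision when the means are large relative to the spread, so an updating scheme such as Welford's would probably be a safer choice for real data.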

I think we've interpreted Bert's question differently. I am not saying I
need to have vast amounts of data in RAM, or in a single data structure, or
anything like that, and I am not saying I need a 64-bit version of R.
What I am saying is that if I had 40 million cases for a problem like the
one I described, I'd want to use all of them when designing a classifier.

Patrick Burns, if you're reading: OCR = optical character recognition.

--
Graham Jones, author of SharpEye Music Reader http://www.visiv.co.uk
21e Balnakeil, Durness, Lairg, Sutherland, IV27 4PT, Scotland, UK


