Off topic -- large data sets. Was RE: [R] 64 Bit R Background Question

Graham Jones maillists at visiv.co.uk
Wed Feb 16 11:07:51 CET 2005


In message <Pine.LNX.4.61.0502151735100.31845 at gannet.stats>, Prof Brian
Ripley <ripley at stats.ox.ac.uk> writes

>But Bert's caveats apply: you have 200 problems of size 20,000 since in 
>QDA each class's distribution is estimated separately, and a single pass 
>will give you the sufficient statistics however large the dataset is.
>

I think we've interpreted Bert's question differently. I am not saying I
need to have vast amounts of data in RAM, or in a single data structure,
or anything like that, and I am not saying I need a 64-bit version of R.
What I am saying is that if I had 40 million cases for a problem like
the one I described, I'd want to use all of them when designing a
classifier.
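
For what it's worth, the single-pass point in the quoted text can be sketched in a few lines. This is only an illustration in Python, not R, and not anyone's actual implementation: the function names `accumulate` and `mean_and_cov`, the record layout of (label, feature vector) pairs, and the toy data are all assumptions made for the example. It keeps one count, one sum vector and one sum-of-products matrix per class, so memory use depends on the number of classes and features, not on the number of cases.

```python
def accumulate(stream, dim):
    """One pass over (label, x) pairs: per-class count, sum of x, sum of x x^T."""
    stats = {}
    for label, x in stream:
        if label not in stats:
            stats[label] = {
                "n": 0,
                "s": [0.0] * dim,
                "ss": [[0.0] * dim for _ in range(dim)],
            }
        st = stats[label]
        st["n"] += 1
        for i in range(dim):
            st["s"][i] += x[i]
            for j in range(dim):
                st["ss"][i][j] += x[i] * x[j]
    return stats


def mean_and_cov(st):
    """Turn the accumulated sums for one class into a mean and sample covariance."""
    n = st["n"]
    dim = len(st["s"])
    mean = [v / n for v in st["s"]]
    cov = [
        [(st["ss"][i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    return mean, cov


def toy_cases():
    """Hypothetical stand-in for reading cases one at a time from disk."""
    yield "a", (1.0, 2.0)
    yield "b", (0.0, 0.0)
    yield "a", (3.0, 4.0)
    yield "b", (2.0, 2.0)
    yield "a", (5.0, 9.0)


stats = accumulate(toy_cases(), dim=2)
mean_a, cov_a = mean_and_cov(stats["a"])
mean_b, cov_b = mean_and_cov(stats["b"])
```

The raw sum-of-products form is the simplest to show, but it can lose precision when the means are large relative to the spread; an incremental update of the mean and centred cross-products (Welford-style) would probably be the safer choice on real data.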

Patrick Burns, if you're reading: OCR = optical character recognition.

-- 
Graham Jones, author of SharpEye Music Reader
http://www.visiv.co.uk
21e Balnakeil, Durness, Lairg, Sutherland, IV27 4PT, Scotland, UK



