[R] Can R handle medium and large size data sets?
Martin Maechler
maechler at stat.math.ethz.ch
Wed Jan 25 10:16:36 CET 2006
>>>>> "Martin" == Martin Lam <tmlammail at yahoo.com>
>>>>> on Tue, 24 Jan 2006 12:13:07 -0800 (PST) writes:
Martin> Dear Gueorgui,
>> Is it true that R generally cannot handle medium sized
>> data sets (a couple of hundred thousand observations)
>> and therefore large data sets (a couple of million
>> observations)?
Martin> It depends on what you want to do with the data sets.
Martin> Loading the data sets shouldn't be any problem, I
Martin> think. But analyzing them with self-written R code
Martin> can get (very) slow, since R is an interpreted
Martin> language (correct me if I'm wrong).
(Since you asked for it ;-) )
Yes, you are wrong to quite some extent (you are partially
right, too): Of course one *can* write ``self-written R code''
that is very slow, and yes, we have seen such code more than
once. However, 98% of the problems {never trust a statistic
unless you made it up ... :-) :-) } can be solved very
efficiently, and with relative ease, in R.
You are right that it is easier to write slow code in an
interpreted language than in a compiled one.
E.g., failing to use vectorized operations in R is one famous
recipe for producing slow code ...
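To make that concrete, here is a small sketch (sum of squares
of a million numbers; timings are illustrative only):

    x <- rnorm(1e6)

    ## slow: interpreted loop, one R-level operation per element
    s1 <- 0
    for (xi in x) s1 <- s1 + xi^2

    ## fast: a single vectorized call, looping in compiled code
    s2 <- sum(x^2)

    stopifnot(all.equal(s1, s2))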
Martin> To increase speed you will often need to experiment with
Martin> the R code. For example, what I've noticed is that
Martin> processing data sets as matrices is much faster
Martin> than working via data.frame().
yes, indeed; see also the other answers to Gueorgui's question.
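A quick illustration of that point (the sizes are arbitrary,
and timings will vary by machine):

    n  <- 1e5
    df <- data.frame(x = rnorm(n), y = rnorm(n))
    m  <- as.matrix(df)

    ## repeated element access: the data frame pays for method
    ## dispatch on every "[", the matrix is plain C-level indexing
    system.time(for (i in 1:1000) df[i, 1])
    system.time(for (i in 1:1000) m[i, 1])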
Martin> Writing your code in C(++), compiling it and including
Martin> it in your R code is often the best way.
Martin> HTH,
Martin> Martin
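For completeness, a minimal sketch of the .C() interface; the
file name 'mysum.c' and the C function 'mysum' are made up for
the example:

    ## Suppose mysum.c contains
    ##
    ##     #include <R.h>
    ##     void mysum(double *x, int *n, double *res)
    ##     {
    ##         int i;
    ##         double s = 0.0;
    ##         for (i = 0; i < *n; i++) s += x[i];
    ##         *res = s;
    ##     }
    ##
    ## and has been compiled with   R CMD SHLIB mysum.c

    dyn.load("mysum.so")   # extension is platform-dependent
    x <- rnorm(1e6)
    .C("mysum", as.double(x), as.integer(length(x)),
       res = double(1))$res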