[R] Why are big data.frames slow? What can I do to get it faster?
Marcus Jellinghaus
Marcus_Jellinghaus at gmx.de
Tue Oct 8 10:11:16 CEST 2002
I wanted to know why non-vectorized operations are slow.
Thank you for your suggestions.
I did three things:
- Besides looking at the total computation time, I analyzed the
garbage collection time (gc()).
- I told R to use more memory. I use version 1.6.0 and started it with
"Rgui --min-vsize=600M --min-nsize=10M".
- I used test$Fieldname[i] instead of test[i, 6].
My results show that it saves a lot of time when I give R enough memory and
access the columns by field name. So thanks a lot!
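
For anyone who wants to repeat the measurement, here is a minimal sketch of
how GC time can be separated from total time. It assumes a small synthetic
data frame in place of the real one (the column names "id" and "Bid" and the
row count are only placeholders), and it uses gc.time() and system.time()
from base R; it is not necessarily how the numbers below were obtained.

```r
# Hypothetical stand-in for the real data frame (the original has 500000 rows).
n <- 2000
test <- data.frame(id = seq_len(n), Bid = runif(n))

gc_before <- gc.time()   # cumulative time spent in the garbage collector so far

# gcFirst = FALSE so that system.time() does not trigger an extra collection
total <- system.time({
  s <- 0
  for (i in seq_len(n)) s <- s + test[i, 2]   # element-wise access by index
}, gcFirst = FALSE)

gc_spent <- gc.time() - gc_before

elapsed_total <- unname(total["elapsed"])
elapsed_gc    <- gc_spent[3]   # third entry of gc.time() is elapsed time
cat("total:", elapsed_total, "s, of which GC:", elapsed_gc, "s\n")
```

The difference between the two numbers corresponds to what is called "other
calculations" in the details below.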
Here are the details:
Without field names and without more memory:
GC time: 494 seconds, other calculations: 124 seconds, total: 619 seconds.
Without field names, with "Rgui --min-vsize=600M --min-nsize=10M":
GC time: 34 seconds, other calculations: 114 seconds, total: 148 seconds.
With field names, without more memory:
GC time: 0.5 seconds, other calculations: 2 seconds, total: 2.5 seconds
(but a long time for loading the matrix).
With field names, with "Rgui --min-vsize=600M --min-nsize=10M":
GC time: < 1 second, other calculations: < 1 second, total: < 1 second.
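
To put these numbers side by side, the following sketch computes the share of
time spent in garbage collection and the speedup relative to the first run.
The figures are the ones reported above; the fourth run is left out because
only upper bounds are known for it. The column names are just labels chosen
for this illustration.

```r
# Timings in seconds, copied from the three runs with exact figures above.
results <- data.frame(
  setup    = c("index, default memory",
               "index, more memory",
               "field name, default memory"),
  gc_time  = c(494, 34, 0.5),
  other    = c(124, 114, 2),
  total    = c(619, 148, 2.5)
)

# Percentage of the total run time spent in garbage collection
results$gc_share <- round(100 * results$gc_time / results$total)

# Speedup relative to the first (slowest) run
results$speedup <- round(results$total[1] / results$total, 1)

print(results)
```

In the first run, roughly four fifths of the time goes to garbage collection,
which is why giving R more memory helps so much there.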
Marcus Jellinghaus
Peter Dalgaard writes:
>You'll likely have to invoke the garbage collector a couple of times,
>and there might also be issues of memory growth kicking in. Once you
>get beyond some threshold, the machine starts swapping bits and pieces
>of the workspace in and out of physical memory,
Andy Liaw writes:
>If you are on Windows and using R version prior to 1.6.0, make sure R can
>use all 1GB of the ram, as the default is to use up to 256MB or physical
>RAM, which ever is smaller. In R-1.6.0, that limit is raised to the smaller
>of 1GB and physical RAM.
[..]
>Extracting from data frame one element at a time the way you did is
>expensive. I.e., test[i, 6] is slower than test$whatever[i].
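
A small sketch of the comparison Andy describes, for readers who want to try
it themselves. The data frame here is synthetic and much smaller than mine,
and the helper names sum_by_index and sum_by_name are made up for this
example. Both loops compute the same sum; only the way each element is
extracted differs.

```r
# Hypothetical stand-in data: two columns, the second one named "whatever".
n <- 2000
test <- data.frame(id = seq_len(n), whatever = runif(n))

# Element-wise access through the data frame's [i, j] indexing
sum_by_index <- function(df) {
  s <- 0
  for (i in seq_len(nrow(df))) s <- s + df[i, 2]
  s
}

# Element-wise access through the column name, then vector indexing
sum_by_name <- function(df) {
  s <- 0
  for (i in seq_len(nrow(df))) s <- s + df$whatever[i]
  s
}

t_index <- system.time(r_index <- sum_by_index(test))
t_name  <- system.time(r_name  <- sum_by_name(test))

# Same result either way; only the elapsed time should differ.
stopifnot(isTRUE(all.equal(r_index, r_name)))
print(c(index = t_index[["elapsed"]], name = t_name[["elapsed"]]))
```

The actual timings depend on the machine, the R version and the size of the
data frame, so no expected output is shown.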
Peter Dalgaard writes:
> It's somewhat difficult to reproduce the behaviour, since you only give
> part of the code necessary (e.g. how many *columns* do you have in
> your data frame?)
> summary(test)
    datetime                       CCY1               CCY2                Bid                Ask               CCYPair
 Min.   :2002-05-28 00:00:02   Length:500000      Length:500000      Min.   :  0.557   Min.   :  0.5574   Length:500000
 1st Qu.:2002-05-28 17:30:47   Mode  :character   Mode  :character   1st Qu.:  1.532   1st Qu.:  1.5319   Mode  :character
 Median :2002-05-29 14:43:02                                         Median :  4.047   Median :  4.0476
 Mean   :2002-05-29 14:42:36                                         Mean   : 38.664   Mean   : 38.6858
 3rd Qu.:2002-05-30 10:22:30                                         3rd Qu.: 32.888   3rd Qu.: 32.8891
 Max.   :2002-05-31 02:58:54                                         Max.   :182.150   Max.   :182.3000