[R] data frame vs. matrix
Rui Barradas
ruipbarradas at sapo.pt
Sun Mar 16 23:51:46 CET 2014
Hello,
This is to be expected. Matrices can hold only one type of data, so the
problem is solved once and for all; data frames can hold many types of
data, so the code that handles them must determine which type to handle
on every access.
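If it is useful, here is a minimal sketch of one way around the slow case, assuming the loop only ever touches a single column as in your dumkoll(). The idea is to pull the column out once as a plain numeric vector, loop over that, and write it back once at the end. The names fill_df() and fill_vec() are hypothetical, made up for this illustration, and I have not timed this on a million rows.

```r
# Sketch only: fill_df() and fill_vec() are hypothetical names for illustration.

# Same pattern as the df = TRUE branch: assign into the data frame on every pass.
fill_df <- function(n = 1000) {
  dfr <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in 2:n) {
    dfr$x[i] <- dfr$x[i - 1]  # each assignment goes through the data frame replacement method
  }
  dfr
}

# Alternative: work on a plain vector and touch the data frame only twice.
fill_vec <- function(n = 1000) {
  dfr <- data.frame(x = rnorm(n), y = rnorm(n))
  x <- dfr$x                  # extract the column once, as an atomic numeric vector
  for (i in 2:n) {
    x[i] <- x[i - 1]          # ordinary vector assignment inside the loop
  }
  dfr$x <- x                  # write the whole column back once
  dfr
}

# Both versions should produce the same column when started from the same seed.
set.seed(1); a <- fill_df()
set.seed(1); b <- fill_vec()
stopifnot(identical(a$x, b$x))

# Compare timings on your own machine; results will vary.
system.time(fill_df(10000))
system.time(fill_vec(10000))
```

This keeps the data frame for storage and avoids converting everything with as.matrix(), which matters if the other columns are not all numeric.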
Hope this helps,
Rui Barradas
On 16-03-2014 18:57, Göran Broström wrote:
> I have always known that "matrices are faster than data frames", for
> instance this function:
>
>
> dumkoll <- function(n = 1000, df = TRUE){
>     dfr <- data.frame(x = rnorm(n), y = rnorm(n))
>     if (df){
>         for (i in 2:NROW(dfr)){
>             if (!(i %% 100)) cat("i = ", i, "\n")
>             dfr$x[i] <- dfr$x[i-1]
>         }
>     }else{
>         dm <- as.matrix(dfr)
>         for (i in 2:NROW(dm)){
>             if (!(i %% 100)) cat("i = ", i, "\n")
>             dm[i, 1] <- dm[i-1, 1]
>         }
>         dfr$x <- dm[, 1]
>     }
> }
>
> --------------------
> > system.time(dumkoll())
>
>    user  system elapsed
>   0.046   0.000   0.045
>
> > system.time(dumkoll(df = FALSE))
>
>    user  system elapsed
>   0.007   0.000   0.008
> ----------------------
>
> OK, no big deal, but I stumbled over a data frame with one million
> records. Then, with df = TRUE,
> ----------------------------
>      user    system   elapsed
> 44677.141  1271.544 46016.754
> ----------------------------
> This is around 12 hours.
>
> With df = FALSE, it took only six seconds! About 7500 times faster.
>
> I was really surprised by the huge difference, and I wonder if this is
> to be expected, or if it is some peculiarity of my installation: I'm
> running Ubuntu 13.10 on a MacBook Pro with 8 GB of memory, R-3.0.3.
>
> Göran B.
>