[R] data frame vs. matrix

Rui Barradas ruipbarradas at sapo.pt
Sun Mar 16 23:51:46 CET 2014


Hello,

This is to be expected. Matrices can hold only one type of data, so the 
type question is settled once and for all. Data frames can hold many 
types of data, so the code that handles them must determine which type 
it is dealing with on every access.
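
A small sketch of the difference, in case it helps. The objects m, d and x are made up for illustration and are not from the original post. The last part shows one possible way to keep the loop off the data frame, which is an assumption on my side and not something I have timed on the one million row case:

```r
# Hypothetical example data: a numeric matrix and a mixed-type data frame.
m <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2)
d <- data.frame(x = c(1.5, 2.5), label = c("a", "b"),
                stringsAsFactors = FALSE)

typeof(m)            # a matrix has one storage type for all its cells
sapply(d, typeof)    # a data frame has one storage type per column

# Converting a mixed data frame to a matrix forces a single common type.
typeof(as.matrix(d))

# Pull the column out as a plain vector, loop over the vector,
# and write it back to the data frame once at the end.
x <- d$x
for (i in 2:length(x)) x[i] <- x[i - 1]
d$x <- x
```

This has the same shape as the df = FALSE branch in the function below: the element-by-element work happens on a simple atomic object, and the data frame is only touched once.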

Hope this helps,

Rui Barradas

On 16-03-2014 18:57, Göran Broström wrote:
> I have always known that "matrices are faster than data frames", for
> instance this function:
>
>
> dumkoll <- function(n = 1000, df = TRUE){
>      dfr <- data.frame(x = rnorm(n), y = rnorm(n))
>      if (df){
>          for (i in 2:NROW(dfr)){
>              if (!(i %% 100)) cat("i = ", i, "\n")
>              dfr$x[i] <- dfr$x[i-1]
>          }
>      }else{
>          dm <- as.matrix(dfr)
>          for (i in 2:NROW(dm)){
>              if (!(i %% 100)) cat("i = ", i, "\n")
>              dm[i, 1] <- dm[i-1, 1]
>          }
>          dfr$x <- dm[, 1]
>      }
> }
>
> --------------------
>  > system.time(dumkoll())
>
>     user  system elapsed
>    0.046   0.000   0.045
>
>  > system.time(dumkoll(df = FALSE))
>
>     user  system elapsed
>    0.007   0.000   0.008
> ----------------------
>
> OK, no big deal, but I stumbled over a data frame with one million
> records. Then, with df = TRUE,
> ----------------------------
>       user    system   elapsed
> 44677.141  1271.544 46016.754
> ----------------------------
> This is around 12 hours.
>
> With df = FALSE, it took only six seconds! About 7500 times faster.
>
> I was really surprised by the huge difference, and I wonder if this is
> to be expected, or if it is some peculiarity with my installation: I'm
> running Ubuntu 13.10 on a MacBook Pro with 8 GB of memory, R-3.0.3.
>
> Göran B.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



