[R] why is nrow() so slow?

Tue Sep 15 17:59:55 CEST 2009

On Tue, Sep 15, 2009 at 9:48 AM, ivo welch <ivowel at gmail.com> wrote:
> dear R wizards:  here is the strange question for the day.  It seems to me
> that nrow() is very slow.  Let me explain what I mean:
>
> ds= data.frame( NA, x=rnorm(10000) )   ##  a sample data set
>
>> system.time( { for (i in 1:10000) NA } )   ## doing nothing takes
> virtually no time
>   user  system elapsed
>  0.000   0.000   0.001
>
> ## this is something that should take time; we need to add 10,000 values
> 10,000 times
>> system.time( { for (i in 1:10000) mean(ds$x) } )
>   user  system elapsed
>  0.416   0.001   0.416
>
> ## alas, this should be very fast.  it is just reading off an attribute of
> ds.  it takes almost a quarter of the time of mean()!
>> system.time( { for (i in 1:10000) nrow(ds) } )
>   user  system elapsed
>  0.124   0.001   0.125

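I just ran into the same problem.  nrow() is slow because it is not a
direct attribute lookup.  Each call passes through several layers,
including method dispatch on dim():

 nrow(df)                                 # defined as dim(df)[1L]
 dim(df)[1]                               # dim() dispatches on the class of df
 dim.data.frame(df)[1]                    # the data frame method
 c(.row_names_info(df, 2L), length(df))   # body of dim.data.frame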

If you call .row_names_info(df, 2L) directly you skip those intermediate
calls, and in the timings below it is roughly 7 times faster.

> system.time( { for (i in 1:10000) nrow(ds) })
   user  system elapsed
  0.183   0.002   0.187

> system.time( { for (i in 1:10000) .row_names_info(ds, 2) })
   user  system elapsed
  0.026   0.000   0.027
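
If you want the shortcut without scattering .row_names_info() calls through
your code, something like the sketch below might do.  The name fast_nrow is
my own invention (it is not part of base R), and the column name id is only
there to make the sample data frame tidier.  It assumes the argument really
is a data frame; for matrices or anything else you would still want nrow().

```r
# Sample data, same shape as in the original post.
ds <- data.frame(id = NA, x = rnorm(10000))

# Hypothetical helper (not in base R): read the row count straight from
# the row names attribute, bypassing nrow() -> dim() -> dim.data.frame().
fast_nrow <- function(df) {
  # Assumption: df is a data frame; fail early otherwise.
  stopifnot(is.data.frame(df))
  .row_names_info(df, 2L)
}

# Sanity check: the shortcut agrees with nrow().
stopifnot(fast_nrow(ds) == nrow(ds))

# Compare the two inside the same kind of loop as above.
print(system.time(for (i in 1:10000) nrow(ds)))
print(system.time(for (i in 1:10000) fast_nrow(ds)))
```

The is.data.frame() check and the extra function call eat into the saving,
so in a really tight loop you may prefer to call .row_names_info(ds, 2L)
inline, or simply compute the row count once outside the loop.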

Hadley

-- 
http://had.co.nz/