[R] why is nrow() so slow?
hadley wickham
h.wickham at gmail.com
Tue Sep 15 17:59:55 CEST 2009
On Tue, Sep 15, 2009 at 9:48 AM, ivo welch <ivowel at gmail.com> wrote:
> dear R wizards: here is the strange question for the day. It seems to me
> that nrow() is very slow. Let me explain what I mean:
>
> ds= data.frame( NA, x=rnorm(10000) ) ## a sample data set
>
>> system.time( { for (i in 1:10000) NA } ) ## doing nothing takes
> virtually no time
> user system elapsed
> 0.000 0.000 0.001
>
> ## this is something that should take time; we need to add 10,000 values
> 10,000 times
>> system.time( { for (i in 1:10000) mean(ds$x) } )
> user system elapsed
> 0.416 0.001 0.416
>
> ## alas, this should be very fast. it is just reading off an attribute of
> ds. it takes almost a quarter of the time of mean()!
>> system.time( { for (i in 1:10000) nrow(ds) } )
> user system elapsed
> 0.124 0.001 0.125
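I just encountered this same problem. nrow() is slow because each call
expands through several layers before it reaches the actual lookup:
nrow(df)
dim(df)[1]
dim.data.frame(df)[1]
c(.row_names_info(df, 2L), length(df))[1]
If you call .row_names_info(df, 2L) directly, you skip the method dispatch
and the intermediate two-element vector, and it's about 6 times faster: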
> system.time( { for (i in 1:10000) nrow(ds) })
user system elapsed
0.183 0.002 0.187
> system.time( { for (i in 1:10000) .row_names_info(ds, 2) })
user system elapsed
0.026 0.000 0.027
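To make the comparison easier to reproduce, here is a minimal sketch that checks that every step in the chain reports the same row count and then wraps the direct call in a helper. The name fast_nrow is made up for this example and is not part of base R. The sketch assumes the input is always a data frame, and it relies on .row_names_info(), which is an internal-style function whose behaviour could differ between R versions.

```r
# Sample data frame, same shape as the one in the original question.
ds <- data.frame(NA, x = rnorm(10000))

# Each step of the call chain, evaluated separately.
n_nrow   <- nrow(ds)
n_dim    <- dim(ds)[1]
n_method <- dim.data.frame(ds)[1]
n_direct <- .row_names_info(ds, 2L)

# All four should agree on the number of rows.
stopifnot(identical(n_nrow, n_dim),
          identical(n_nrow, n_method),
          identical(n_nrow, n_direct))

# Hypothetical helper (not in base R): row count without going through dim().
# No class check is done, to keep the overhead low; only pass data frames.
fast_nrow <- function(df) .row_names_info(df, 2L)

# Rough timing comparison; absolute numbers depend on machine and R version.
t_nrow <- system.time(for (i in 1:10000) nrow(ds))
t_fast <- system.time(for (i in 1:10000) fast_nrow(ds))
print(rbind(nrow = t_nrow[1:3], fast_nrow = t_fast[1:3]))
```

The shortcut only matters when the row count is needed inside a tight loop. Otherwise it is usually simpler to call nrow() once, store the result, and reuse it.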
Hadley
--
http://had.co.nz/