[R] why is nrow() so slow?

jim holtman jholtman at gmail.com
Tue Sep 15 23:17:29 CEST 2009


'by' is designed to work with data frames.  Look at what 'by.default'
does when you pass it something that is not a data frame:

> by.default
function (data, INDICES, FUN, ..., simplify = TRUE)
{
    dd <- as.data.frame(data)
    if (length(dim(data)))
        by(dd, INDICES, FUN, ..., simplify = simplify)
    else {
        if (!is.list(INDICES)) {


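The 'as.data.frame' call converts the matrix to a data frame before any
grouping is done, so passing a matrix to 'by' adds a conversion step on
top of the data frame computation.  That would explain why your matrix
version is slower.  Matrices are a lot faster in many instances, but
only where you are working with 'matrix-like' operations on them
directly.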
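
As a rough illustration of the point above, here is a minimal sketch on
made-up data (not the poster's 'rets.subset'; the names 'm', 'g', 'small'
and the sizes are assumptions chosen for the example).  It first checks
that 'by' on a matrix still hands FUN a data frame, then computes
per-group means directly from the matrix columns with 'tapply' and
'rowsum', which avoids the data frame conversion altogether:

```r
# Made-up data standing in for the poster's; all names are illustrative.
set.seed(1)
n <- 20000
m <- cbind(PERMNO = sample(1:500, n, replace = TRUE),
           RET    = rnorm(n))
g <- factor(m[, "PERMNO"])

# 'by' on a matrix: each group arrives in FUN as a data frame,
# because the matrix was converted with as.data.frame() first.
small <- m[1:100, ]
seen <- by(small, small[, "PERMNO"], function(d) class(d)[1])
unique(as.vector(seen))

# Per-group mean of RET taken straight from the matrix column.
means_tapply <- tapply(m[, "RET"], g, mean)

# Same means computed as group sums divided by group counts.
# rowsum() returns one row per level of g, in level order,
# which matches the order of table(g).
means_rowsum <- rowsum(m[, "RET"], g)[, 1] / as.vector(table(g))

all.equal(as.vector(means_tapply), as.vector(means_rowsum))
```

Wrapping each of the last two computations in 'system.time()' on your
own data is the way to find out whether either one helps; I have not
timed them on the data from this thread.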

On Tue, Sep 15, 2009 at 5:12 PM, ivo welch <ivo_welch at brown.edu> wrote:
> interestingly, in my case, the opposite seems to be the case.  data frames
> seem faster than matrices when it comes to "by" computation (which is where
> most of my calculations are in):
>
> ### here is my data frame and some information about it
>> dim(rets.subset)
> [1] 132508      3
>> names(rets.subset)
> [1] "PERMNO" "RET"    "mdate"
>> length(unique(as.factor(rets.subset$PERMNO)))
> [1] 6832
>> length((as.factor(rets.subset$PERMNO)))
> [1] 132508
>
> ### calculation using data frame
>> system.time( { by( rets.subset, as.factor(rets.subset$PERMNO), mean) } )
>   user  system elapsed
>  3.295   2.798   6.095
>
> ### same as matrix
>> m=as.matrix(rets.subset)
>> system.time( { a=by( m, as.factor(m[,1]), mean) } )
>   user  system elapsed
>  5.371   5.557  10.928
>
> PS: Any speed suggestions are appreciated.  This is "experimenting time" for
> me.
>
>
>> One note:  if you're worried about speed, it almost always makes sense to
>> use matrices rather than dataframes.  If you've got mixed types this is
>> tedious and error-prone (each type needs to be in a separate matrix), but if
>> your data is all numeric, it's very simple, and will make things a lot
>> faster.
>>
>> Duncan Murdoch
>
>
>
> --
> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
> CV Starr Professor of Economics (Finance), Brown University
> http://welch.econ.brown.edu/



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



