[R] Matrix oriented computing

Fri Aug 26 18:15:40 CEST 2005

Prof. Ripley,

Excellent point. Neither sapply() nor outer() are the "elephant in the
room" in this situation.

On Fri, 2005-08-26 at 16:55 +0100, Prof Brian Ripley wrote:
> Try profiling.  Doing this many times to get an overview, e.g. for sapply 
> with df=1:1000:
> 
>     %       self        %       total
>   self     seconds    total    seconds    name
>   98.26      6.78     98.26      6.78     "FUN"
>    0.58      0.04      0.58      0.04     "unlist"
>    0.29      0.02      0.87      0.06     "as.vector"
>    0.29      0.02      0.58      0.04     "names<-"
>    0.29      0.02      0.29      0.02     "names<-.default"
>    0.29      0.02      0.29      0.02     "names"
> 
> so almost all the time is in qchisq.
> 
> 
> On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote:
> 
> > On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
> >> Marc Schwartz <MSchwartz at mn.rr.com> writes:
> >>
> >>> x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9,
> >>>        0.95, 0.975, 0.99, 0.995)
> >>>
> >>> df <- c(1:100)
> >>>
> >>> mat <- sapply(x, qchisq, df)
> >>>
> >>>> dim(mat)
> >>> [1] 100  11
> >>>
> >>>> str(mat)
> >>>  num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
> >>
> >> outer() is perhaps a more natural first try... It does give the
> >> transpose of the sapply approach, though.
> >>
> >> round(t(outer(x,df,qchisq)),2)
> >>
> >> should be close. You should likely add dimnames.
> >
> >
> >
> > What I find interesting, is that I would have intuitively expected
> > outer() to be faster than sapply().  However:
> >
> >
> >>  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
> > [1] 0.01 0.00 0.01 0.00 0.00
> >
> >>  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2),
> >               gcFirst = TRUE)
> > [1] 0.01 0.00 0.01 0.00 0.00
> >
> > # No round() or t() to test for overhead
> >>  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
> > [1] 0.01 0.00 0.02 0.00 0.00
> >
> >
> > # Bear in mind the round() on mat1 above
> >> all.equal(mat, mat1)
> > [1] "Mean relative  difference: 4.905485e-05"
> >
> >> all.equal(mat, t(mat2))
> > [1] TRUE
> >
> >
> > Even when increasing the size of 'df' to 1:1000:
> >
> >
> >>  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
> > [1] 0.16 0.01 0.16 0.00 0.00
> >
> >>  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst =
> > TRUE)
> > [1] 0.16 0.00 0.18 0.00 0.00
> >
> >>  # No round() or t() to test for overhead
> >>  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
> > [1] 0.16 0.01 0.17 0.00 0.00
> >
> >
> >
> > It also seems that, at least in this case, t() and round() do not add
> > much overhead.
> 
> Definitely not for such small matrices.

True and both are C functions, which of course helps as well.

Best regards,

Marc