[R] crossprod(x) vs crossprod(x,x)
Duncan Murdoch
murdoch at stats.uwo.ca
Tue Nov 21 13:27:22 CET 2006
On 11/21/2006 6:39 AM, Robin Hankin wrote:
> Professor Ripley
>
> thanks for this.
>
> I see what you say about gc(). I'll try your looped test on my system:
>
>
>
> > x <- matrix(rnorm(3000000),ncol=3)
> >
> > system.time(for(i in 1:100){crossprod(x)})
> [1] 7.931 20.420 28.619 0.000 0.000
> > system.time(for(i in 1:100){crossprod(x)})
> [1] 7.975 20.590 29.512 0.000 0.000
> > system.time(for(i in 1:100){crossprod(x)})
> [1] 8.003 20.641 30.160 0.000 0.000
> > system.time(for(i in 1:100){crossprod(x,x)})
> [1] 2.350 0.032 2.552 0.000 0.000
> > system.time(for(i in 1:100){crossprod(x,x)})
> [1] 2.330 0.015 2.333 0.000 0.000
> > system.time(for(i in 1:100){crossprod(x,x)})
> [1] 2.333 0.034 2.447 0.000 0.000
> >
>
>
>
> How come my findings are qualitatively different from yours?
Have you said what version of R you're using? I tried your code in an
Intel Mac with R 2.4.0, and a Windows machine with R-devel, and saw what
Brian saw. Maybe the Power Mac changes this, but that doesn't make a
lot of sense. On the other hand, it could be that an older R release is
making unnecessary copies in one case but not the other.
Duncan Murdoch
>
>
>
>
>
>
>
> > Sys.info()
>
> sysname
>
> "Darwin"
>
> release
>
> "8.8.0"
>
> version
> "Darwin Kernel Version 8.8.0: Fri Sep 8 17:18:57 PDT 2006;
> root:xnu-792.12.6.obj~1/RELEASE_PPC"
>
> nodename
>
> "octopus.noc.soton.ac.uk"
>
> machine
>
> "Power Macintosh"
>
> login
>
> "rksh"
>
> user
>
> "rksh"
> >
>
>
>
>
>
>
>
>
>
> On 21 Nov 2006, at 11:17, Prof Brian Ripley wrote:
>
>> On Tue, 21 Nov 2006, Robin Hankin wrote:
>>
>>> I found out the other day that crossprod() will take a single
>>> matrix argument;
>>> crossprod(x) notionally returns crossprod(x,x).
>> It actually says
>>
>> x, y: matrices: 'y = NULL' is taken to be the same matrix as 'x'.
>>
>> but not that it is the same as crossprod(x,x).
>>
>>> The two forms do not return identical matrices:
>>>
>>> x <- matrix(rnorm(3000000),ncol=3)
>>> M1 <- crossprod(x)
>>> M2 <- crossprod(x,x)
>>>
>>> R> max(abs(M1-M2))
>>> [1] 1.932494e-08
>>>
>>> But what really surprised me is that crossprod(x) is slower than
>>> crossprod(x,x):
>>>
>>> R> system.time(crossprod(x))
>>> [1] 0.079 0.206 0.292 0.000 0.000
>>> R> system.time(crossprod(x,x))
>>> [1] 0.035 0.001 0.041 0.000 0.000
>> That's not usually the case: your example is too small to be
>> reliable, and the large system time is suspicious. I get (in R-
>> devel, R's internal BLAS)
>>
>>> system.time(crossprod(x))
>> user system elapsed
>> 0.034 0.000 0.034
>>> system.time(crossprod(x,x))
>> user system elapsed
>> 0.044 0.001 0.044
>>
>> Such small times are subject to a lot of unrepeatability (they
>> depend on when gc()s happen and hence on the current state of the
>> gc tuning). I suggest you try running repeats for a for loop of
>> 100, e.g.
>>
>>> system.time(for(i in 1:100) crossprod(x))
>> user system elapsed
>> 3.602 0.004 4.895
>>> system.time(for(i in 1:100) crossprod(x))
>> user system elapsed
>> 3.612 0.015 3.984
>>> system.time(for(i in 1:100) crossprod(x))
>> user system elapsed
>> 3.514 0.009 4.727
>>> system.time(for(i in 1:100) crossprod(x,x))
>> user system elapsed
>> 5.636 0.013 8.963
>>> system.time(for(i in 1:100) crossprod(x,x))
>> user system elapsed
>> 5.255 0.011 10.961
>>
>> (on a heavily used dual-processor Opteron system).
>>
>> For a more realistic example
>>
>>> x <- matrix(rnorm(3000000),ncol=300)
>>> system.time(crossprod(x))
>> user system elapsed
>> 2.196 0.001 2.197
>>> system.time(crossprod(x,x))
>> user system elapsed
>> 4.609 0.020 8.972
>>
>> or using an optimized BLAS:
>>
>>> x <- matrix(rnorm(3000000),ncol=300)
>>> system.time(crossprod(x))
>> user system elapsed
>> 0.398 0.011 0.495
>>> system.time(crossprod(x,x))
>> user system elapsed
>> 0.454 0.004 0.479
>>
>> --
>> Brian D. Ripley, ripley at stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography Centre, Southampton
> European Way, Southampton SO14 3ZH, UK
> tel 023-8059-7743
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list