[Rd] [R] variance/mean
William Dunlap
wdunlap at tibco.com
Tue Mar 24 02:25:55 CET 2009
Oops, I was thinking backwards. This sort of
hack could avoid the Fortran aliasing rules, not
run afoul of them.
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of William Dunlap
> Sent: Monday, March 23, 2009 6:18 PM
> To: Wacek Kusnierczyk; r-devel at r-project.org
> Subject: Re: [Rd] [R] variance/mean
>
> Doesn't Fortran still require that the arguments to
> a function not alias each other (in whole or in part)?
> I could imagine that var() might call into Fortran code
> (BLAS or LAPACK). Wouldn you want to chance erroneous
> results at a high optimization level to save a bit of
> time in an unusual situation?
>
> (I could also imagine someone changing the R interpreter
> so that x and x[-length(x)] could share the same memory
> block and that could cause Fortran aliasing problems as
> well.)
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-devel-bounces at r-project.org
> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Wacek
> Kusnierczyk
> > Sent: Monday, March 23, 2009 4:40 PM
> > To: r-devel at r-project.org
> > Cc: r-help at r-project.org; rkevinburton at charter.net; Bert Gunter
> > Subject: Re: [Rd] [R] variance/mean
> >
> >
> > (this post suggests a patch to the sources, so i allow myself
> > to divert
> > it to r-devel)
> >
> > Bert Gunter wrote:
> > > x a numeric vector, matrix or data frame.
> > > y NULL (default) or a vector, matrix or data frame with compatible
> > > dimensions to x. The default is equivalent to y = x (but
> > more efficient).
> > >
> > >
> > bert points to an interesting fragment of ?var: it suggests that
> > computing var(x) is more efficient than computing var(x,x),
> for any x
> > valid as input to var. indeed:
> >
> > set.seed(0)
> > x = matrix(rnorm(10000), 100, 100)
> >
> > library(rbenchmark)
> > benchmark(replications=1000, columns=c('test', 'elapsed'),
> > var(x),
> > var(x, x))
> > # test elapsed
> > # 1 var(x) 1.091
> > # 2 var(x, x) 2.051
> >
> > that's of course, so to speak, unreasonable: for what
> var(x) does is
> > actually computing the covariance of x and x, which should
> be the same
> > as var(x,x).
> >
> > the hack is that if y is given, there's an overhead of memory
> > allocation
> > for *both* x and y when y is given, as seen in src/main/cov.c:720+.
> > incidentally, it seems that the problem can be solved with a
> > trivial fix
> > (see the attached patch), so that
> >
> > set.seed(0)
> > x = matrix(rnorm(10000), 100, 100)
> >
> > library(rbenchmark)
> > benchmark(replications=1000, columns=c('test', 'elapsed'),
> > var(x),
> > var(x, x))
> > # test elapsed
> > # 1 var(x) 1.121
> > # 2 var(x, x) 1.107
> >
> > with the quick checks
> >
> > all.equal(var(x), var(x, x))
> > # TRUE
> >
> > all(var(x) == var(x, x))
> > # TRUE
> >
> > and for cor it seems to make cor(x,x) slightly faster than
> > cor(x), while
> > originally it was twice slower:
> >
> > # original
> > benchmark(replications=1000, columns=c('test', 'elapsed'),
> > cor(x),
> > cor(x, x))
> > # test elapsed
> > # 1 cor(x) 1.196
> > # 2 cor(x, x) 2.253
> >
> > # patched
> > benchmark(replications=1000, columns=c('test', 'elapsed'),
> > cor(x),
> > cor(x, x))
> > # test elapsed
> > # 1 cor(x) 1.207
> > # 2 cor(x, x) 1.204
> >
> > (there is a visible penalty due to an additional pointer
> > test, but it's
> > 10ms on 1000 replications with 10000 data points, which i think is
> > negligible.)
> >
> > > This is as clear as I would know how to state.
> >
> > i believe bert is right.
> >
> > however, with the above fix, this can now be rewritten as:
> >
> > "
> > x: a numeric vector, matrix or data frame.
> > y: a vector, matrix or data frame with dimensions compatible
> > to those of x.
> > By default, y = x.
> > "
> >
> > which, to my simple mind, is even more clear than what bert
> would know
> > how to state, and less likely to cause the sort of confusion that
> > originated this thread.
> >
> > the attached patch suggests modifications to src/main/cov.c and
> > src/library/stats/man/cor.Rd.
> > it has been prepared and checked as follows:
> >
> > svn co https://svn.r-project.org/R/trunk trunk
> > cd trunk
> > # edited the sources
> > svn diff > cov.diff
> > svn revert -R src
> > patch -p0 < cov.diff
> >
> > tools/rsync-recommended
> > ./configure
> > make
> > make check
> > bin/R
> > # subsequent testing within R
> >
> > if you happen to consider this patch for a commit, please be sure to
> > examine and test it carefully first.
> >
> > vQ
> >
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
More information about the R-devel
mailing list