[R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

Duncan Murdoch murdoch.duncan at gmail.com
Thu May 17 18:13:01 CEST 2018

On 17/05/2018 11:53 AM, Martin Maechler wrote:
>>>>>> Kevin Coombes <kevin.r.coombes at gmail.com>
>>>>>>      on Thu, 17 May 2018 11:21:23 -0400 writes:
>      > Hi, I wrote and maintain the Thresher package. It includes
>      > code to do simulations. In the "tests" directory of the
>      > package, I do some simple simulations and run the main
>      > algorithm, then write out summaries of the results.
>      >
>      > The initial submission of the package to CRAN was delayed
>      > because the "Rout.save" files matched the "Rout" files on
>      > 64-bit R but *not* on 32-bit R on Windows. After
>      > investigating, I realized that when my simulation code
>      > called "MASS::mvrnorm", I got different results from
>      > 64-bit and 32-bit versions of R on the same machine.
>      > Pushing further, I determined that this was happening
>      > because mvrnorm used "eigen" to compute the eigenvalues
>      > and eigenvectors, and "eigen" itself gave different
>      > answers in the two R versions.
>      >
>      > The underlying issue (mathematically) is that the
>      > correlation/covariance matrix I was using had repeated
>      > eigenvalues, and so there is no unique choice of basis for
>      > the associated eigenspace. This observation suggests that
>      > the issue is potentially more general than 32-bit versus
>      > 64-bit; the results will depend on the implementation of
>      > the eigen-decomposition in whatever linear algebra module
>      > is compiled along with R, so it can change from machine to
>      > machine.
>      >
>      > I "solved" (well, worked around) the immediate problem
>      > with package submission by changing the test code to not
>      > write out anything that might differ between versions.
>      >
>      > With all of that as background, here are my main
>      > questions:
>      >
>      > [1] Is there any way to put something into the "tests"
>      > directory that would allow me to use these simulations for
>      > what computer scientists call regression testing? (That
>      > is, to make sure my changes to the code haven't changed
>      > results in an unexpected way.)
>      >
>      > [2] Should there be a flag or instruction to R CMD check
>      > that says to only run or interpret this particular test on
>      > a specific version or machine? (Or is there already such a
>      > flag that I don't know about?)
>      >
>      > [3] Should the documentation (man page) for "eigen" or
>      > "mvrnorm" include a warning that the results can change
>      > from machine to machine (or between things like 32-bit and
>      > 64-bit R on the same machine) because of differences in
>      > linear algebra modules? (Possibly including the statement
>      > that "set.seed" won't save you.)
> The problem is that most (young?) people do not read help pages
> anymore.
> help(eigen) has contained the following text for years, and in
> spite of your good analysis of the problem you seem to not have
> noticed the last semi-paragraph:
>> Value:
>>       The spectral decomposition of ‘x’ is returned as a list with
>>       components
>>    values: a vector containing the p eigenvalues of ‘x’, sorted in
>>            _decreasing_ order, according to ‘Mod(values)’ in the
>>            asymmetric case when they might be complex (even for real
>>            matrices).  For real asymmetric matrices the vector will be
>>            complex only if complex conjugate pairs of eigenvalues are
>>            detected.
>>   vectors: either a p * p matrix whose columns contain the eigenvectors
>>            of ‘x’, or ‘NULL’ if ‘only.values’ is ‘TRUE’.  The vectors
>>            are normalized to unit length.
>>            Recall that the eigenvectors are only defined up to a
>>            constant: even when the length is specified they are still
>>            only defined up to a scalar of modulus one (the sign for real
>>            matrices).
> It's not a warning but a "recall that"... maybe because the
> author already assumed that only thorough users would read that,
> and for them it would be a recall of something they'd have
> learned *and* not entirely forgotten since ;-)
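To make the "recall that" concrete, here is a small stdlib-only Python sketch (an analogue, not R, and not what eigen() runs internally) that checks A v = lambda v directly. It uses a 3 x 3 equicorrelation matrix with rho = 0.5, which has eigenvalue 2 once and eigenvalue 0.5 twice. Flipping the sign of an eigenvector, or rotating the basis of the repeated eigenspace by an arbitrary angle (theta = 0.3, chosen only for illustration), gives vectors that are all equally valid eigenvectors, so two linear algebra libraries can legitimately return different ones.

```python
import math

def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def is_eigenpair(a, lam, v, tol=1e-12):
    """Check that A v equals lam * v up to a small tolerance."""
    return all(abs(x - lam * y) <= tol for x, y in zip(matvec(a, v), v))

def scale(c, v):
    return [c * x for x in v]

def add(u, v):
    return [x + y for x, y in zip(u, v)]

# Equicorrelation matrix with rho = 0.5: eigenvalues 2 (once) and 0.5 (twice).
rho = 0.5
sigma = [[1.0 if i == j else rho for j in range(3)] for i in range(3)]

# One orthonormal basis for the repeated eigenvalue 1 - rho = 0.5 ...
b1 = scale(1 / math.sqrt(2), [1.0, -1.0, 0.0])
b2 = scale(1 / math.sqrt(6), [1.0, 1.0, -2.0])

# ... and another, obtained by rotating b1, b2 within their eigenspace.
theta = 0.3
c1 = add(scale(math.cos(theta), b1), scale(math.sin(theta), b2))
c2 = add(scale(-math.sin(theta), b1), scale(math.cos(theta), b2))

# Sign flips and rotations are all equally valid eigenvectors.
for v in (b1, b2, c1, c2, scale(-1.0, b1)):
    print(is_eigenpair(sigma, 1 - rho, v))
```

Every check prints True, so nothing in the matrix itself singles out one basis as "the" answer.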

I don't think you're really being fair here:  the text in ?eigen doesn't
make clear that the computed eigenvectors may differ even within the
same version of R (for example, between its 32-bit and 64-bit builds),
and there's nothing in ?mvrnorm to suggest that its results may not be
reproducible.
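This is also why set.seed() doesn't help. As far as I can tell from its source, MASS::mvrnorm builds each draw as mu + V diag(sqrt(values)) z, where V and values come from eigen() and z is a vector of standard normals. The Python sketch below (an illustration only; transform and implied_cov are made-up names, and mu is taken as zero) feeds the same seeded z through two equally valid eigenbases of the same equicorrelation matrix. Both reproduce Sigma exactly, so the two sets of draws have the same distribution, but the individual numbers differ.

```python
import math
import random

def transform(vectors, values, z):
    """Return V diag(sqrt(values)) z; vectors[k] is the k-th eigenvector."""
    p = len(values)
    return [sum(vectors[k][i] * math.sqrt(values[k]) * z[k] for k in range(p))
            for i in range(p)]

def implied_cov(vectors, values):
    """Return V diag(values) V^T, which should equal Sigma."""
    p = len(values)
    return [[sum(vectors[k][i] * values[k] * vectors[k][j] for k in range(p))
             for j in range(p)] for i in range(p)]

# Equicorrelation matrix (rho = 0.5) with eigenvalues 2, 0.5, 0.5.
sigma = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
values = [2.0, 0.5, 0.5]

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
v0 = [1 / s3, 1 / s3, 1 / s3]
b1 = [1 / s2, -1 / s2, 0.0]
b2 = [1 / s6, 1 / s6, -2 / s6]

# A second basis for the repeated eigenspace: b1, b2 rotated by theta.
theta = 0.3
c1 = [math.cos(theta) * x + math.sin(theta) * y for x, y in zip(b1, b2)]
c2 = [-math.sin(theta) * x + math.cos(theta) * y for x, y in zip(b1, b2)]

basis_a = [v0, b1, b2]  # what one linear algebra library might return
basis_b = [v0, c1, c2]  # an equally valid answer from another

rng = random.Random(2018)  # plays the role of set.seed(2018)
z = [rng.gauss(0.0, 1.0) for _ in range(3)]

x_a = transform(basis_a, values, z)
x_b = transform(basis_b, values, z)
print([round(x, 4) for x in x_a])
print([round(x, 4) for x in x_b])
```

The two printed vectors differ even though the seed, the matrix and the eigenvalues are identical; only the choice of basis for the repeated eigenvalue changed.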

As to the other questions:

[1] You can add extra test directories and run them only under your
own controlled conditions:  see the --test-dir argument to R CMD
check.  You could also make tests in the standard tests/ directory
conditional on environment variables.
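The environment-variable approach amounts to an opt-in switch: ordinary runs (including CRAN's) skip the comparison against saved output, and you turn it on only on a machine you control. A minimal Python sketch of the pattern follows; THRESHER_REGRESSION_TESTS is a hypothetical variable name, and in an R test file the same check would be written with Sys.getenv().

```python
import os

# Hypothetical opt-in flag; set it only on machines you control.
FLAG = "THRESHER_REGRESSION_TESTS"

def regression_enabled(env=None):
    """Return True only when the opt-in flag is explicitly set to 'true'."""
    env = os.environ if env is None else env
    return env.get(FLAG, "").lower() == "true"

def run_tests(env=None):
    """Always run platform-independent checks; gate the saved-output comparison."""
    results = {"smoke": "ran"}
    if regression_enabled(env):
        results["regression"] = "ran"  # compare against saved reference output here
    else:
        results["regression"] = "skipped"
    return results

print(run_tests({}))             # {'smoke': 'ran', 'regression': 'skipped'}
print(run_tests({FLAG: "true"})) # {'smoke': 'ran', 'regression': 'ran'}
```

Defaulting to "skipped" means a machine that knows nothing about the flag never runs the fragile comparison.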

[2] You can find out details of the current machine using the .Platform 
and version variables, and make tests conditional on particular values 
of those.  I'd recommend limiting such tests to your own personal runs 
(using [1]) or not including saved output, because CRAN will run the 
tests on multiple platforms.
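Gating on platform details works the same way. In R the relevant values are in .Platform (for example .Platform$OS.type) and version (for example version$arch); the Python sketch below uses sys.platform and the pointer width from struct.calcsize("P") as stand-ins. The REFERENCE description is an assumption: record whatever identifies the machine that produced your saved output, and compare only when the current platform matches it exactly.

```python
import struct
import sys

# Assumed description of the machine that generated the saved output.
REFERENCE = {"os": "win32", "pointer_bits": 64}

def current_platform():
    """Describe the running interpreter: OS name and pointer width in bits."""
    return {"os": sys.platform, "pointer_bits": struct.calcsize("P") * 8}

def should_compare_saved_output(reference=REFERENCE, current=None):
    """Compare against saved output only on the exact reference platform."""
    current = current_platform() if current is None else current
    return current == reference

print(current_platform())
print(should_compare_saved_output(current={"os": "win32", "pointer_bits": 32}))  # False
```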

Duncan Murdoch
