[R-pkg-devel] mvrnorm, eigen, tests, and R CMD check
Jari Oksanen
jari.oksanen at oulu.fi
Fri May 18 08:48:35 CEST 2018
I am afraid that these suggestions may not work. There are more choices than Win32 and Win64, including several flavours of BLAS/LAPACK (which are probably involved if you evaluate eigenvalues), and also differences in hardware, compilers, and the phase of the moon. If there are several equal eigenvalues, any solution of axes is arbitrary, and it can be made stable for testing only by chance. If you have M equal eigenvalues, you should try to find a test that checks that the M-dimensional (sub)space is approximately correct, irrespective of the random orientation of the axes within that subspace.
Cheers, Jari Oksanen
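A minimal sketch of the kind of subspace test Jari describes, assuming the reference eigenvectors for an eigenvalue of multiplicity M are stored as the columns of a hypothetical p x M matrix V_ref and the freshly computed ones as V_new. The orthogonal projector onto the subspace is unique even though the basis is not, so comparing projectors is invariant to the arbitrary orientation of axes within the subspace:

## Sketch only: V_ref and V_new are hypothetical p x M matrices whose
## (orthonormal) columns span the eigenspace of an eigenvalue of multiplicity M.
same_subspace <- function(V_ref, V_new, tol = 1e-8) {
  ## the projector onto the subspace is unique, even though the basis is not
  isTRUE(all.equal(tcrossprod(V_ref), tcrossprod(V_new), tolerance = tol))
}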
On 18 May 2018, at 00:06 am, Kevin Coombes <kevin.r.coombes at gmail.com> wrote:
Yes; but I have been running around all day without time to sit down and
try them. The suggestions make sense, and I'm looking forward to
implementing them.
On Thu, May 17, 2018, 3:55 PM Ben Bolker <bbolker at gmail.com> wrote:
There have been various comments in this thread (by me, and I think
Duncan Murdoch) about how you can identify the platform you're running
on (some combination of .Platform and/or R.Version()) and use it to
write conditional statements so that your tests will only be compared
with reference values that were generated on the same platform ... did
those get through? Did they make sense?
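A sketch of what such a conditional might look like, combining .Platform$OS.type and R.version$arch as Ben suggests; the platform tags below ("windows-i386", "unix-x86_64") are only illustrative assumptions:

## Sketch: build a platform tag and only compare against reference values
## that were generated on that same platform.
platform_tag <- paste(.Platform$OS.type, R.version$arch, sep = "-")
if (platform_tag == "windows-i386") {
  ## compare against reference values generated on 32-bit Windows
} else if (platform_tag == "unix-x86_64") {
  ## compare against reference values generated on 64-bit Linux
} else {
  ## no stored reference for this platform: skip the strict comparison
}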
On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
<kevin.r.coombes at gmail.com> wrote:
Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are the issue. The matrices I am using are all nonsingular, and the various algorithms have no problem computing the eigenvalues correctly (up to numerical errors that I can bound and thus account for in tests by rounding appropriately). But an eigenvalue of multiplicity M has an M-dimensional eigenspace with no preferred basis, so any M-dimensional (unitary) change of basis is permitted. That's what gives rise to the lack of reproducibility across architectures. The choice of basis appears to use different heuristics on 32-bit Windows than on 64-bit Windows or Linux machines. As a result, I can't include the tests I'd like as part of a CRAN submission.
On Thu, May 17, 2018, 2:29 PM William Dunlap <wdunlap at tibco.com> wrote:
Your explanation needs to be a bit more general in the case of identical eigenvalues: each distinct eigenvalue has an associated subspace, whose dimension is the number of repeats of that eigenvalue, and the eigenvectors for that eigenvalue are an orthonormal basis for that subspace. (With no repeated eigenvalues this gives your 'unique up to sign'.)
E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of 0:
x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 1
[2,] 0 1 0 0 1
[3,] 0 0 1 0 1
[4,] 0 0 0 0 0
[5,] 1 1 1 0 3
the following give valid but different (by more than sign) eigenvectors:
e1 <- structure(list(values = c(4, 1, 0.999999999999999, 0,
-2.22044607159862e-16
), vectors = structure(c(-0.288675134594813, -0.288675134594813,
-0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
-0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
-0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
-0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
"vectors"), class = "eigen")
e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
vectors = structure(c(0.288675134594813, 0.288675134594813,
0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
-1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
), class = "eigen")
I.e.,
all.equal(crossprod(e1$vectors), diag(5), tol=0)
[1] "Mean relative difference: 1.407255e-15"
all.equal(crossprod(e2$vectors), diag(5), tol=0)
[1] "Mean relative difference: 3.856478e-15"
all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
[1] "Mean relative difference: 1.110223e-15"
all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
[1] "Mean relative difference: 9.069735e-16"
e1$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.2886751 0.0000000 8.164966e-01 0 -0.5
[2,] -0.2886751 0.7071068 -4.082483e-01 0 -0.5
[3,] -0.2886751 -0.7071068 -4.082483e-01 0 -0.5
[4,] 0.0000000 0.0000000 0.000000e+00 -1 0.0
[5,] -0.8660254 0.0000000 -6.106227e-16 0 0.5
e2$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] 0.2886751 -7.844376e-01 2.265489e-01 0 -0.5
[2,] 0.2886751 5.884158e-01 5.660684e-01 0 -0.5
[3,] 0.2886751 1.960217e-01 -7.926173e-01 0 -0.5
[4,] 0.0000000 0.000000e+00 0.000000e+00 -1 0.0
[5,] 0.8660254 4.464109e-17 -1.112441e-16 0 0.5
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Duncan Murdoch, on Thu, 17 May 2018 12:13:01 -0400, writes:
On 17/05/2018 11:53 AM, Martin Maechler wrote:
Kevin Coombes, on Thu, 17 May 2018 11:21:23 -0400, writes:
[..................]
[3] Should the documentation (man page) for "eigen" or "mvrnorm" include a warning that the results can change from machine to machine (or between things like 32-bit and 64-bit R on the same machine) because of differences in linear algebra modules? (Possibly including the statement that "set.seed" won't save you.)
The problem is that most (young?) people do not read help
pages anymore.
help(eigen) has contained the following text for years,
and in spite of your good analysis of the problem you
seem to not have noticed the last semi-paragraph:
Value:
The spectral decomposition of ‘x’ is returned as a list
with components
values: a vector containing the p eigenvalues of ‘x’,
sorted in _decreasing_ order, according to ‘Mod(values)’
in the asymmetric case when they might be complex (even
for real matrices). For real asymmetric matrices the
vector will be complex only if complex conjugate pairs
of eigenvalues are detected.
vectors: either a p * p matrix whose columns contain the
eigenvectors of ‘x’, or ‘NULL’ if ‘only.values’ is
‘TRUE’. The vectors are normalized to unit length.
Recall that the eigenvectors are only defined up to a
constant: even when the length is specified they are
still only defined up to a scalar of modulus one (the
sign for real matrices).
It's not a warning but a "recall that" .. maybe because
the author already assumed that only thorough users would
read that and for them it would be a recall of something
they'd have learned *and* not entirely forgotten since
;-)
I don't think you're really being fair here: the text in
?eigen doesn't make clear that eigenvector values are not
reproducible even within the same version of R, and
there's nothing in ?mvrnorm to suggest it doesn't give
reproducible results.
Ok, I'm sorry ... I definitely did not want to be unfair.
I've always thought the remark in eigen was sufficient, but I'm
probably wrong and we should add text explaining that it
practically means that eigenvectors are only defined up to sign
switches (in the real case) and hence results depend on the
underlying {Lapack + BLAS} libraries and therefore are platform
dependent.
Even further, we could consider (optionally, and by default FALSE) defining a deterministic scheme for postprocessing the current output of eigen such that, at least in the good cases where all eigenspaces are 1-dimensional, the postprocessing would result in reproducible signs, e.g. by ensuring that the first non-zero entry of each eigenvector is positive.
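A rough sketch of such a postprocessing step (the function name and tolerance are made up here, and the result is only meaningful for the real case when every eigenspace is 1-dimensional):

## Sketch: impose a deterministic sign convention on eigen() output by
## flipping each eigenvector so its first non-negligible entry is positive.
fix_eigen_signs <- function(e, eps = 1e-12) {
  V <- e$vectors
  for (j in seq_len(ncol(V))) {
    i <- which(abs(V[, j]) > eps)[1]   # first non-negligible entry
    if (!is.na(i) && V[i, j] < 0) V[, j] <- -V[, j]
  }
  e$vectors <- V
  e
}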
MASS::mvrnorm() and (by default) mvtnorm::rmvnorm() both use "eigen", whereas mvtnorm::rmvnorm() *does* have method = "chol", which AFAIK does not suffer from such problems.
OTOH, the help page of MASS::mvrnorm() mentions the Cholesky alternative but prefers eigen for better stability (without saying more).
In spite of that, my personal recommendation would be to use
mvtnorm::rmvnorm(.., method = "chol")
{ or the 2-3 lines of R code to do the same thing without an extra package, just using rnorm(), chol() and simple matrix operations }
because in simulations I'd expect the var-cov matrix Sigma to be far enough away from singular for chol() to be stable.
Martin
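Those "2-3 lines" might look roughly like the following sketch (the function name is made up; it assumes Sigma is symmetric positive definite so that chol() succeeds):

## Sketch: multivariate normal sampling via the Cholesky factor,
## without an extra package.  n = number of draws, mu = mean vector,
## Sigma = positive-definite covariance matrix.
rmvnorm_chol <- function(n, mu, Sigma) {
  L <- chol(Sigma)                              # upper triangular, t(L) %*% L == Sigma
  Z <- matrix(rnorm(n * length(mu)), nrow = n)  # n x p standard normals
  sweep(Z %*% L, 2, mu, "+")                    # each row is one N(mu, Sigma) draw
}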
______________________________________________
R-package-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel