[R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

Thu May 17 23:06:19 CEST 2018

Yes; but I have been running around all day without time to sit down and
try them. The suggestions make sense, and I'm looking forward to
implementing them.

On Thu, May 17, 2018, 3:55 PM Ben Bolker <bbolker at gmail.com> wrote:

> There have been various comments in this thread (by me, and I think
> Duncan Murdoch) about how you can identify the platform you're running
> on (some combination of .Platform and/or R.Version()) and use it to
> write conditional statements so that your tests will only be compared
> with reference values that were generated on the same platform ... did
> those get through?  Did they make sense?
>
> On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
> <kevin.r.coombes at gmail.com> wrote:
> > Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
> > the issue. The matrices I am using are all nonsingular, and the various
> > algorithms have no problem computing the eigenvalues correctly (up to
> > numerical errors that I can bound and thus account for on tests by
> rounding
> > appropriately). But an eigenvalue of multiplicity M has an M-dimensional
> > eigenspace with no preferred basis. So, any M-dimensional  (unitary)
> change
> > of basis is permitted. That's what give rise to the lack of
> reproducibility
> > across architectures. The choice of basis appears to use different
> > heuristics on 32-bit windows than on 64-bit Windows or Linux machines.
> As a
> > result, I can't include the tests I'd like as part of a CRAN submission.
> >
> > On Thu, May 17, 2018, 2:29 PM William Dunlap <wdunlap at tibco.com> wrote:
> >
> >> Your explanation needs to be a bit more general in the case of identical
> >> eigenvalues - each distinct eigenvalue has an associated subspace, whose
> >> dimension is the number repeats of that eigenvalue and the eigenvectors
> for
> >> that eigenvalue are an orthonormal basis for that subspace.  (With no
> >> repeated eigenvalues this gives your 'unique up to sign'.)
> >>
> >> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of
> 0
> >>
> >>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
> >>   > x
> >>        [,1] [,2] [,3] [,4] [,5]
> >>   [1,]    1    0    0    0    1
> >>   [2,]    0    1    0    0    1
> >>   [3,]    0    0    1    0    1
> >>   [4,]    0    0    0    0    0
> >>   [5,]    1    1    1    0    3
> >> the following give valid but different (by more than sign) eigen vectors
> >>
> >> e1 <- structure(list(values = c(4, 1, 0.999999999999999, 0,
> >> -2.22044607159862e-16
> >> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
> >> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
> >> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
> >> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
> >> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
> >> "vectors"), class = "eigen")
> >> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
> >>     vectors = structure(c(0.288675134594813, 0.288675134594813,
> >>     0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
> >>     0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
> >>     0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
> >>     -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
> >>     0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
> >> ), class = "eigen")
> >>
> >> I.e.,
> >> > all.equal(crossprod(e1$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 1.407255e-15"
> >> > all.equal(crossprod(e2$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 3.856478e-15"
> >> > all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
> >> [1] "Mean relative difference: 1.110223e-15"
> >> > all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
> >> [1] "Mean relative difference: 9.069735e-16"
> >>
> >> > e1$vectors
> >>            [,1]       [,2]          [,3] [,4] [,5]
> >> [1,] -0.2886751  0.0000000  8.164966e-01    0 -0.5
> >> [2,] -0.2886751  0.7071068 -4.082483e-01    0 -0.5
> >> [3,] -0.2886751 -0.7071068 -4.082483e-01    0 -0.5
> >> [4,]  0.0000000  0.0000000  0.000000e+00   -1  0.0
> >> [5,] -0.8660254  0.0000000 -6.106227e-16    0  0.5
> >> > e2$vectors
> >>           [,1]          [,2]          [,3] [,4] [,5]
> >> [1,] 0.2886751 -7.844376e-01  2.265489e-01    0 -0.5
> >> [2,] 0.2886751  5.884158e-01  5.660684e-01    0 -0.5
> >> [3,] 0.2886751  1.960217e-01 -7.926173e-01    0 -0.5
> >> [4,] 0.0000000  0.000000e+00  0.000000e+00   -1  0.0
> >> [5,] 0.8660254  4.464109e-17 -1.112441e-16    0  0.5
> >>
> >>
> >>
> >>
> >>
> >> Bill Dunlap
> >> TIBCO Software
> >> wdunlap tibco.com
> >>
> >> On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <
> >> maechler at stat.math.ethz.ch> wrote:
> >>
> >>> >>>>> Duncan Murdoch ....
> >>> >>>>>     on Thu, 17 May 2018 12:13:01 -0400 writes:
> >>>
> >>>     > On 17/05/2018 11:53 AM, Martin Maechler wrote:
> >>>     >>>>>>> Kevin Coombes ... on Thu, 17
> >>>     >>>>>>> May 2018 11:21:23 -0400 writes:
> >>>
> >>>     >>    [..................]
> >>>
> >>>     >> > [3] Should the documentation (man page) for "eigen" or
> >>>     >> > "mvrnorm" include a warning that the results can change
> >>>     >> > from machine to machine (or between things like 32-bit and
> >>>     >> > 64-bit R on the same machine) because of difference in
> >>>     >> > linear algebra modules? (Possibly including the statement
> >>>     >> > that "set.seed" won't save you.)
> >>>
> >>>     >> The problem is that most (young?) people do not read help
> >>>     >> pages anymore.
> >>>     >>
> >>>     >> help(eigen) has contained the following text for years,
> >>>     >> and in spite of your good analysis of the problem you
> >>>     >> seem to not have noticed the last semi-paragraph:
> >>>     >>
> >>>     >>> Value:
> >>>     >>>
> >>>     >>> The spectral decomposition of ‘x’ is returned as a list
> >>>     >>> with components
> >>>     >>>
> >>>     >>> values: a vector containing the p eigenvalues of ‘x’,
> >>>     >>> sorted in _decreasing_ order, according to ‘Mod(values)’
> >>>     >>> in the asymmetric case when they might be complex (even
> >>>     >>> for real matrices).  For real asymmetric matrices the
> >>>     >>> vector will be complex only if complex conjugate pairs
> >>>     >>> of eigenvalues are detected.
> >>>     >>>
> >>>     >>> vectors: either a p * p matrix whose columns contain the
> >>>     >>> eigenvectors of ‘x’, or ‘NULL’ if ‘only.values’ is
> >>>     >>> ‘TRUE’.  The vectors are normalized to unit length.
> >>>     >>>
> >>>     >>> Recall that the eigenvectors are only defined up to a
> >>>     >>> constant: even when the length is specified they are
> >>>     >>> still only defined up to a scalar of modulus one (the
> >>>     >>> sign for real matrices).
> >>>     >>
> >>>     >> It's not a warning but a "recall that" .. maybe because
> >>>     >> the author already assumed that only thorough users would
> >>>     >> read that and for them it would be a recall of something
> >>>     >> they'd have learned *and* not entirely forgotten since
> >>>     >> ;-)
> >>>     >>
> >>>
> >>>     > I don't think you're really being fair here: the text in
> >>>     > ?eigen doesn't make clear that eigenvector values are not
> >>>     > reproducible even within the same version of R, and
> >>>     > there's nothing in ?mvrnorm to suggest it doesn't give
> >>>     > reproducible results.
> >>>
> >>> Ok, I'm sorry ... I definitely did not want to be unfair.
> >>>
> >>> I've always thought the remark in eigen was sufficient,  but I'm
> >>> probably wrong and we should add text explaining that it
> >>> practically means that eigenvectors are only defined up to sign
> >>> switches (in the real case) and hence results depend on the
> >>> underlying {Lapack + BLAS} libraries and therefore are platform
> >>> dependent.
> >>>
> >>> Even further, we could consider (optionally, by default FALSE)
> >>> using defining a deterministic scheme for postprocessing the current
> >>> output of eigen such that at least for the good cases where all
> >>> eigenspaces are 1-dimensional, the postprocessing would result
> >>> in reproducible signs, by e.g., ensuring the first non-zero
> >>> entry of each eigenvector to be positive.
> >>>
> >>> MASS::mvrnorm()  and  mvtnorm::rmvnorm() both use "eigen",
> >>> whereas mvtnorm::rmvnorm()  *does* have  method = "chol" which
> >>> AFAIK does not suffer from such problems.
> >>>
> >>> OTOH, the help page of MASS::mvrnorm() mentions the Cholesky
> >>> alternative but prefers eigen for better stability (without
> >>> saying more).
> >>>
> >>> In spite of that, my personal recommendation would be to use
> >>>
> >>>   mvtnorm::rmvnorm(.., method = "chol")
> >>>
> >>> { or the 2-3 lines of R code to the same thing without an extra
> package,
> >>>   just using rnorm(), chol() and simple matrix operations }
> >>>
> >>> because in simulations I'd expect the var-cov matrix  Sigma to
> >>> be far enough away from singular for chol() to be stable.
> >>>
> >>> Martin
> >>>
> >>> ______________________________________________
> >>> R-package-devel at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>>
> >>
> >>
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-package-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

	[[alternative HTML version deleted]]