[R] Varying results of sammon(), for the same data set
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Jan 30 14:06:40 CET 2006
On Mon, 30 Jan 2006, Ole Edsberg wrote:
> Hello,
>
> I have a data set on which I run the sammon algorithm as follows:
>
> library(MASS)
> data = read.table('problemforr.dat')
Hmm. This is a data frame of 387 rows and 387 columns and Euclidean
distance is used. Squeezing 387 dims (and PCA shows these points as well
spread in almost all those dimensions) to 2 is not a well-posed problem,
and you should welcome the plurality of answers found.
> y = cmdscale(data, add=TRUE)
> s = sammon(data, y$points)
>
> (In case it should be relevant, I make the data available at
> http://idi.ntnu.no/~edsberg/problemforr.dat)
>
> With R 2.2.1 on Debian Sid I always get one of two solutions (stress
> 1.74288 after 10 iterations or stress 1.33629 after 9 iterations). I
> always get the same result within the same R session, even if I read
> the data again. With R 2.2.0 on SunOS 5.9 I always get the same result
> (stress 0.13186 after 74 iterations).
Note that your subject line attributes this to sammon, but it could also
be due to cmdscale.
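
One way to separate the two sources is to save the cmdscale() start once and feed the same saved start to sammon() everywhere. The following is a minimal sketch only: it uses the swiss data set shipped with R as a stand-in for the poster's file, and the file name is hypothetical.

```r
library(MASS)

# Stand-in data: the swiss data set shipped with R (not the poster's file).
d <- dist(swiss)

# Step 1: compute the classical MDS start and save it for later comparison.
y <- cmdscale(d, k = 2)
start_file <- file.path(tempdir(), "cmdscale_start.rds")  # hypothetical path
saveRDS(y, start_file)

# Step 2: in another session or on another machine, recompute the start and
# compare it with the saved one. Axes may flip sign, so compare up to sign.
y_saved <- readRDS(start_file)
same_start <- isTRUE(all.equal(abs(y), abs(y_saved)))
print(same_start)

# Step 3: run sammon() from the saved start, so that any remaining
# difference between systems cannot come from cmdscale().
s <- sammon(d, y = y_saved, trace = FALSE)
print(s$stress)
```

If the saved and recomputed starts agree but the final stress still differs, the variation arises inside sammon(); if the starts already differ, cmdscale() (and the eigenvalue routine behind it) is involved.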
On AMD64 Linux I get
> s = sammon(data, y$points)
Initial stress : 2.21024
stress after 10 iters: 1.22268, magic = 0.092
stress after 20 iters: 0.48801, magic = 0.009
stress after 30 iters: 0.35007, magic = 0.020
stress after 40 iters: 0.24377, magic = 0.045
stress after 50 iters: 0.17343, magic = 0.021
stress after 60 iters: 0.14944, magic = 0.048
stress after 70 iters: 0.12810, magic = 0.022
stress after 80 iters: 0.12423, magic = 0.010
stress after 90 iters: 0.12191, magic = 0.118
stress after 100 iters: 0.11986, magic = 0.500
That large reduction in `magic' indicates the algorithm is having
problems. Without optimization (used for valgrind) I got the solution you
quoted for Solaris 9.
However, on all four systems I tried (AMD64 FC3 Linux, i686 FC3 Linux,
Solaris and Windows), the results differed between systems but were
repeatable on each system. I even ran under valgrind (on FC3) to be sure
that no uninitialized areas were used.
> I understand that the sammon algorithm is very sensitive to even tiny
> variations in the starting point, but the observed behaviour seems
> strange to me. Difference between machines could perhaps be explained
> by floating point portability issues, but not difference on the same
> machine, and not the fact that I get the same result within the same R
> session.
No, but then that is not reproducible, and has never been reported before.
If for example different BLAS libraries get selected on different runs
this would explain it. Or it could be a Debian-Sid-specific bug in a
shared library or compiler.
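
To see which numerical libraries a given session is actually using, something along these lines can be run in each session and the output compared. This is a sketch under the assumption of a reasonably recent R: the BLAS/LAPACK lines in the sessionInfo() output and the La_version() function are not available in R 2.2.x.

```r
# Record the run-time environment of this session.
si <- sessionInfo()
print(si)            # recent R versions list the BLAS and LAPACK libraries here

# Version of the LAPACK library in use.
lapack_version <- La_version()
print(lapack_version)
```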
> I read in the documentation
> (http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/sammon.html)
> that "Further, since the configuration is only determined up to
> rotations and reflections (by convention the centroid is at the origin),
> the result can vary considerably from machine to machine." This doesn't
> make sense to me.
Note that is addressing a separate issue. For a given minimized stress
there are multiple solutions which can be transformed into each other, and
the help file is warning you of that. There are also (in general)
multiple local minima.
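
The invariance under rotations and reflections can be checked directly. The sketch below again uses the swiss data as a stand-in, and sammon_stress() is a helper written here from the definition of the Sammon stress, not a function in MASS.

```r
library(MASS)

d <- dist(swiss)                     # stand-in data, not the poster's file
s <- sammon(d, trace = FALSE)

# Rotate the fitted configuration by 30 degrees and reflect the first axis.
theta <- pi / 6
rot  <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
flip <- diag(c(-1, 1))
pts2 <- s$points %*% rot %*% flip

# Sammon stress computed from its definition (illustrative helper).
sammon_stress <- function(d, pts) {
  dv <- as.vector(d)                 # input distances
  dh <- as.vector(dist(pts))         # distances in the configuration
  sum((dv - dh)^2 / dv) / sum(dv)
}

stress_original    <- sammon_stress(d, s$points)
stress_transformed <- sammon_stress(d, pts2)
print(c(original = stress_original, transformed = stress_transformed))
```

The two configurations have different coordinates but the same inter-point distances, so they have the same stress up to rounding. Distinct local minima are a different matter: those give different stress values.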
> If the data and the algorithm are the same, the result should be the
> same.
Depending what you mean by 'algorithm', this is what the subject of
numerical analysis is about. I take it you are familiar with J. H.
Wilkinson's classic work on the Algebraic Eigenvalue Problem?
> What differences between machines do they refer to here? Floating
> point issues?
Any difference in the CPU/FPU or compiler or run-time environment
(including all the dynamically linked support libraries). Just changing
the optimization level of the compiler changes the assembler-level
algorithm used, and can often affect the answer of e.g. an eigenvalue
calculation. Rounding errors depend on whether (and when)
extended-precision registers are used and the exact order of the
calculations since computer arithmetic is neither associative nor
distributive.
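
A small illustration of how the order of operations changes a result, assuming ordinary IEEE 754 double-precision arithmetic (which is what R uses on the usual platforms):

```r
# Same three numbers, summed in two different orders.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

print(a == b)                  # exact comparison
print(isTRUE(all.equal(a, b))) # comparison up to a small tolerance
print(c(a, b), digits = 17)    # show the last digits

# A more drastic case: the small term is lost entirely in one ordering.
big <- 1e16
lost <- (big + 1) - big
kept <- (big - big) + 1
print(c(lost = lost, kept = kept))
```

A compiler that reorders such sums, or keeps intermediates in extended-precision registers, is making exactly this kind of change, and an iterative method such as sammon() can then be steered to a different local minimum.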
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595