[R] Varying results of sammon(), for the same data set

Mon Jan 30 14:06:40 CET 2006

On Mon, 30 Jan 2006, Ole Edsberg wrote:

> Hello,
>
> I have a data set on which I run the sammon algorithm as follows:
>
> library(MASS)
> data = read.table('problemforr.dat')

Hmm.  This is a data frame of 387 rows and 387 columns and Euclidean 
distance is used.  Squeezing 387 dims (and PCA shows these points as well 
spread in almost all those dimensions) to 2 is not a well-posed problem, 
and you should welcome the plurality of answers found.

> y = cmdscale(data, add=TRUE)
> s = sammon(data, y$points)
>
> (In case it should be relevant, I make the data available at
> http://idi.ntnu.no/~edsberg/problemforr.dat)
>
> With R 2.2.1 on Debian Sid I always get one of two solutions (stress
> 1.74288 after 10 iterations or stress 1.33629 after 9 iterations). I
> always get the same result within the same R session, even if I read
> the data again. With R 2.2.0 on SunOS 5.9 I always get the same result
> (stress 0.13186 after 74 iterations).

Note that your subject line attributes this to sammon, but it could also 
be due to cmdscale.
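
One way to separate the two steps is to check each for repeatability on its own. The following is only a sketch: it uses small synthetic data rather than the poster's file, and the names x, d, y1, y2, s1 and s2 are made up for illustration.

```r
library(MASS)

# Synthetic stand-in for the real data (an assumption, not the poster's file)
set.seed(1)
x <- matrix(rnorm(50 * 10), nrow = 50)
d <- dist(x)

# Step 1: is the starting configuration from cmdscale() repeatable?
y1 <- cmdscale(d, k = 2)
y2 <- cmdscale(d, k = 2)
same_start <- identical(y1, y2)

# Step 2: given one fixed starting configuration, is sammon() repeatable?
s1 <- sammon(d, y = y1, trace = FALSE)
s2 <- sammon(d, y = y1, trace = FALSE)
same_fit <- identical(s1$points, s2$points)

c(same_start = same_start, same_fit = same_fit)
```

To compare across sessions or machines, the starting configuration can be written to a file in one session and read back in another, so that any remaining difference is attributable to sammon alone.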

On AMD64 Linux I get

> s = sammon(data, y$points)
Initial stress        : 2.21024
stress after  10 iters: 1.22268, magic = 0.092
stress after  20 iters: 0.48801, magic = 0.009
stress after  30 iters: 0.35007, magic = 0.020
stress after  40 iters: 0.24377, magic = 0.045
stress after  50 iters: 0.17343, magic = 0.021
stress after  60 iters: 0.14944, magic = 0.048
stress after  70 iters: 0.12810, magic = 0.022
stress after  80 iters: 0.12423, magic = 0.010
stress after  90 iters: 0.12191, magic = 0.118
stress after 100 iters: 0.11986, magic = 0.500

That large reduction in `magic' indicates the algorithm is having 
problems.  Without optimization (used for valgrind) I got the solution you 
quoted for Solaris 9.

However, on all four systems I tried (AMD64 FC3 Linux, i686 FC3 Linux, 
Solaris and Windows), the results differed between systems but were 
repeatable on each system.  I even ran under valgrind (on FC3) to be sure 
that no uninitialized areas were used.

> I understand that the sammon algorithm is very sensitive to even tiny
> variations in the starting point, but the observed behaviour seems
> strange to me. Difference between machines could perhaps be explained
> by floating point portability issues, but not difference on the same
> machine, and not the fact that I get the same result within the same R
> session.

No, but then that is not reproducible, and has never been reported before. 
If for example different BLAS libraries get selected on different runs 
this would explain it.  Or it could be a Debian-Sid-specific bug in a 
shared library or compiler.
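
When comparing runs, it helps to record what each session is actually using. A minimal sketch, assuming a reasonably current version of R (these calls may not be available in the versions discussed in this thread); the variable name info is arbitrary:

```r
info <- sessionInfo()

info$R.version$version.string   # R version string
info$platform                   # platform the R build targets
La_version()                    # version of the LAPACK library in use

# Printing the whole object gives a fuller report of the session
print(info)
```

If two runs on the same machine report different libraries, that would point towards the run-time environment rather than towards sammon or cmdscale.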

> I read in the documentation 
> (http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/sammon.html) 
> that "Further, since the configuration is only determined up to 
> rotations and reflections (by convention the centroid is at the origin), 
> the result can vary considerably from machine to machine." This doesn't 
> make sense to me.

Note that this addresses a separate issue.  For a given minimized stress 
there are multiple solutions which can be transformed into each other, and 
the help file is warning you of that.  There are also (in general) 
multiple local minima.
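
The first point can be illustrated with a short sketch: rotating and reflecting a fitted configuration leaves all inter-point distances, and hence the stress, unchanged. The data are synthetic, and sammon_stress() is a helper written here from the usual definition of Sammon's stress; it is not a function in MASS.

```r
library(MASS)

set.seed(1)
x <- matrix(rnorm(30 * 5), nrow = 30)   # synthetic data, illustration only
d <- dist(x)
fit <- sammon(d, k = 2, trace = FALSE)

# Hypothetical helper: Sammon stress of a configuration 'pts' for distances 'd'
sammon_stress <- function(d, pts) {
  d    <- as.vector(d)
  dhat <- as.vector(dist(pts))
  sum((d - dhat)^2 / d) / sum(d)
}

theta <- pi / 3
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
moved <- fit$points %*% rot %*% diag(c(1, -1))   # rotate, then reflect

c(original    = sammon_stress(d, fit$points),
  transformed = sammon_stress(d, moved))
```

The two stress values agree up to rounding even though the coordinates in fit$points and moved look quite different, so plots from two machines can differ in orientation while representing the same solution.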

> If the data and the algorithm are the same, the result should be the 
> same.

Depending on what you mean by 'algorithm', this is what the subject of 
numerical analysis is about.  I take it you are familiar with J. H. 
Wilkinson's classic work on the Algebraic Eigenvalue Problem?

> What differences between machines do they refer to here? Floating 
> point issues?

Any difference in the CPU/FPU or compiler or run-time environment 
(including all the dynamically linked support libraries).  Just changing 
the optimization level of the compiler changes the assembler-level 
algorithm used, and can often affect the answer of e.g. an eigenvalue 
calculation.  Rounding errors depend on whether (and when) 
extended-precision registers are used and the exact order of the 
calculations, since floating-point arithmetic is neither associative nor 
distributive.
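
The dependence on the order of the calculations can be seen with three additions. This is a small illustration assuming ordinary IEEE 754 double precision; the names a and b are arbitrary.

```r
# The same three numbers, added with two different groupings
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

a == b                        # FALSE on typical IEEE 754 double hardware
print(c(a, b), digits = 17)   # the values differ in the last digit

# The difference is tiny, so a tolerance-based comparison still passes
isTRUE(all.equal(a, b))
```

A compiler that reorders such sums, or keeps intermediates in wider registers, changes the last bits of the result in just this way, and an iterative method started near a ridge between two local minima can amplify that into a different answer.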

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595