[R-sig-eco] how to calculate "axis variance" in metaMDS, package vegan?

Jari Oksanen jari.oksanen at oulu.fi
Wed Dec 2 19:47:24 CET 2009


On 2/12/09 19:55 PM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com>
wrote:
 
> ... I supposed that if we use as many dimensions as there are variables,
> then we can perfectly reproduce the observed distance matrix. Isn't that so?
Gian, not quite so. I think it would be useful to consult a good book, but
here is some explanation.

NMDS is not a simple "reproduction" method; it is a non-linear
regression problem. For n points and k dimensions we fit n*k parameters
(the point coordinates) to n*(n-1)/2 observations (the dissimilarities).
It doesn't require much intuition to see that this is not well defined
for k approaching n, and then the non-linear regression fails. For
details, the non-linear (monotone) regression function is isoreg() in R,
and the model fitting happens with optim() using method = "BFGS"
(Broyden, Fletcher, Goldfarb & Shanno). All this is not very obvious
because it is done within a C function in the MASS package. NMDS is
nonlinear precisely so that it can produce a good mapping with low
values of k: so stick with low values of k.
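
To make the counting argument above concrete, here is a small sketch in
plain Python (not part of vegan or MASS). The function name nmds_counts and
the choice of n = 20 sites are purely illustrative assumptions. It counts
raw coordinates only and ignores that rotating or shifting the whole
configuration does not change the fit.

```python
# Compare the number of fitted coordinates (n*k) with the number of
# dissimilarities (n*(n-1)/2) for a hypothetical data set.

def nmds_counts(n, k):
    """Return (parameters, observations) for n points in k dimensions."""
    parameters = n * k                  # one coordinate per point per axis
    observations = n * (n - 1) // 2     # lower triangle of the dissimilarity matrix
    return parameters, observations

n = 20  # hypothetical number of sites
for k in (2, 3, 10, 19):
    params, obs = nmds_counts(n, k)
    print(f"k={k:2d}: {params:3d} parameters, {obs} observations, "
          f"ratio {obs / params:.2f}")
```

With low k there are several observations per parameter; as k approaches n
the ratio drops below one and the regression has nothing left to constrain
it.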

If you want to have complete mapping of dissimilarities, you should use
metric scaling. Then you typically ignore the later axes. However, even
here the situation is not as clear as you write. If you use Euclidean
distances, then the number of variables gives the number of dimensions of
metric scaling (or n-1 for n points, if that is smaller). With Euclidean
distances, the complete solution also exactly reproduces the observed
distances. However, with non-Euclidean dissimilarities (like Bray-Curtis
in your case) the situation is more complicated. Metric scaling and
complete mapping are Euclidean, and if your
dissimilarities are non-Euclidean, you have a problem (that you usually
ignore). Firstly, the number of above-zero eigenvalues and corresponding
real eigenvectors is not directly defined by the number of variables.
Secondly, you cannot reproduce the observed dissimilarities from real
eigenvectors because that reproduction is Euclidean and your measure was
non-Euclidean. For exact reproduction, you should subtract the squared
distances in imaginary space (negative eigenvalues) from the squared
distances in the real space (positive eigenvalues). We actually do it
exactly like this in the betadisper() function in vegan, and for this
reason the wcmdscale() function of vegan also returns information on
complex eigenvectors and negative eigenvalues.
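
As a small illustration of why Bray-Curtis counts as non-Euclidean, the
sketch below (plain Python, not vegan code) computes the dissimilarity for
three made-up sites and checks the triangle inequality. The helper
bray_curtis and the site vectors are assumptions chosen for the example.
A triangle-inequality violation is only one sufficient sign that no exact
Euclidean configuration exists; negative eigenvalues can also appear
without it.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

# Three hypothetical sites with abundances of two species.
site_a = [1, 0]
site_b = [0, 1]
site_c = [1, 1]

d_ab = bray_curtis(site_a, site_b)
d_ac = bray_curtis(site_a, site_c)
d_bc = bray_curtis(site_b, site_c)

# In any Euclidean space d_ab <= d_ac + d_bc must hold.
violates_triangle = d_ab > d_ac + d_bc
print(d_ab, d_ac, d_bc, violates_triangle)
```

Here the direct dissimilarity between sites a and b is larger than the
detour via site c, so these three dissimilarities cannot be drawn exactly
as distances between points in real Euclidean space.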

For your other post, which arrived while I was writing this: a stress of
11.6% (0.116) is really fine. I think that if you get stress down to 5%
(0.05) or less, then there is something fishy in your data or in your
model specification, like overfitting.

Cheers, Jari Oksanen

> But,
> of course, our goal is to reduce the observed complexity of nature, that is,
> to explain the distance matrix in terms of fewer underlying dimensions...
> So what is best in the end?
> And also, which function plots the stress values versus the
> number of dimensions, and how should the plot be read?
> I hope I was clear, thank you so much!
> Yours,
> 
> G.
> 