[R-sig-eco] Vegan metaMDS: unusual first run stress values with large data set

Jari Oksanen jari.oksanen at oulu.fi
Wed Dec 12 19:04:37 CET 2012


Hello R-Community,

First, my thanks to Ewan Isherwood, who drew our attention to this issue and sent his data file to us so that we could analyse the problem.

It seems that the default convergence criteria in monoMDS(), which was the ordination engine of metaMDS() in this case, are too slack. The good news is that you can change these criteria by adding the argument 'sfgrmin' to the metaMDS() call (this is documented in ?monoMDS). The following command seems to work:

> PSU.NMDS <- metaMDS(PSU.sp, k=2, sfgrmin = 1e-7, distance = "jaccard")
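A minimal sketch of how the effect of 'sfgrmin' could be checked, assuming vegan is installed and using its bundled 'dune' data as a stand-in for PSU.sp (which is not available here). Current vegan releases may already use the tighter default, so the old slack value 1e-5 is passed explicitly; the object names are illustrative only. The second part collects the stress for k = 1..3 with the tighter criterion, i.e. the numbers behind a stress/scree plot like the one in the original question.

```r
library(vegan)
data(dune)  # stand-in community data; PSU.sp from this thread is not available

# Old slack criterion (1e-5) versus the tighter 1e-7; 'sfgrmin' is passed
# through metaMDS() to monoMDS(). Resetting the seed makes both fits draw
# their random starts from the same RNG state.
set.seed(1)
ord_slack <- metaMDS(dune, distance = "jaccard", k = 2, sfgrmin = 1e-5, trace = 0)
set.seed(1)
ord_tight <- metaMDS(dune, distance = "jaccard", k = 2, sfgrmin = 1e-7, trace = 0)
print(c(slack = ord_slack$stress, tight = ord_tight$stress))

# Stress for k = 1..3 with the tighter criterion (values for a scree plot)
stress_by_k <- sapply(1:3, function(k) {
  set.seed(1)
  metaMDS(dune, distance = "jaccard", k = k, sfgrmin = 1e-7, trace = 0)$stress
})
print(stress_by_k)
```

With properly converged solutions, stress would normally not increase when k grows; a jump upwards, as in the reported table, is a hint that some solutions stopped short of convergence.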


The default was 'sfgrmin = 1e-5', which is so slack that the iteration stopped early and did not really converge close to the solution. With this option you will find that the correct stress is about 0.029, which is much lower than the values reported below. Moreover, the stresses of the one-dimensional and two-dimensional solutions are very close to each other. (There was one outlier, P1763E, which had only one species, CHICRA; that species occurred in only four other sites, and this outlier distorted the results.)
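To make the stress numbers above concrete, here is a small base-R sketch of Kruskal's stress formula 1, sqrt(sum((d - dhat)^2) / sum(d^2)), where d are the distances in the ordination and dhat is their best monotone fit on the observed dissimilarities. The helper 'kruskal_stress1' is hypothetical and is not vegan's implementation; it only assumes that the "Stress type 1, weak ties" reported by monoMDS is of this general form. Weak ties are handled by ordering tied dissimilarities by their ordination distance, so the monotone fit is allowed to break ties.

```r
# Hypothetical helper (not part of vegan): Kruskal's stress formula 1
kruskal_stress1 <- function(diss, config) {
  delta <- as.vector(as.dist(diss))   # observed dissimilarities
  d <- as.vector(dist(config))        # Euclidean distances in the ordination
  # Weak ties: within tied dissimilarities, order by d so ties may be broken
  o <- order(delta, d)
  dhat <- numeric(length(d))
  dhat[o] <- isoreg(d[o])$yf          # isotonic (non-decreasing) regression
  sqrt(sum((d - dhat)^2) / sum(d^2))
}

x <- cbind(c(0, 1, 3, 6), 0)          # toy points on a line, as a 2-column matrix
kruskal_stress1(dist(x), 2 * x)       # rescaled copy keeps the rank order
kruskal_stress1(dist(x), cbind(c(6, 0, 3, 1), 0))  # shuffled points
```

The rescaled configuration preserves the rank order of all dissimilarities and therefore has zero stress, while the shuffled one does not and gets a positive value below 1.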

I advise *against* using 'zerodist = "add"': it is not needed with monoMDS. Identical sites (distance = 0) will have identical scores if you do not use this argument. 'zerodist = "add"' is only needed with MASS::isoMDS(), which cannot handle zero distances.
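A minimal sketch for checking the zero-distance behaviour yourself, assuming vegan is installed. It uses the bundled 'dune' data with exact copies of sites 1 and 5 appended (a hypothetical test set, not the PSU data), runs metaMDS() without any 'zerodist' argument, and prints the ordination distance between each site and its copy. If the advice above holds, these gaps should be zero or very close to it; the sketch only prints them rather than assuming a value.

```r
library(vegan)
data(dune)

# Hypothetical test data: dune plus exact copies of sites 1 and 5,
# so two pairs of sites have Jaccard dissimilarity 0
dune_dup <- rbind(dune, dune[c(1, 5), ])

set.seed(1)
# No zerodist argument: zero dissimilarities go to monoMDS as they are
ord <- metaMDS(dune_dup, distance = "jaccard", k = 2, trace = 0)
sc <- scores(ord, display = "sites")

# Ordination distance between each original site and its copy (rows 21, 22)
gap <- c(site1 = sqrt(sum((sc[1, ] - sc[21, ])^2)),
         site5 = sqrt(sum((sc[5, ] - sc[22, ])^2)))
print(gap)
```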

We have changed the default of 'sfgrmin' in the development version at http://www.r-forge.r-project.org/, so you should not see this problem in future vegan releases.

Cheers, Jari Oksanen

On 05/12/2012, at 21:15, Ewan Isherwood wrote:

> Hello, R-Community! This is the first time writing to this group and
> indeed the first time using a mailing list, so please bear with me if
> I’ve done something wrong.
> 
> I have a large species x site matrix (89 x 4831) that I want to
> ordinate using metaMDS in the vegan (2.0-5) package in R (2.15.2). If
> I run this data frame using the Jaccard index in two or more
> dimensions (k>1), the first run (run=0) has a relatively low stress
> value, while the other 20 runs have much higher stress with very
> little variation among them. However, k=1 seems to work fine.
> Furthermore, a stress/scree plot has a pyramid-like shape: stress is
> low for k=1, increases sharply at k=2, and then decreases slowly as k
> increases.
> 
> Dimensions	Stress
> 1	0.1382185
> 2	0.1939509
> 3	0.1695375
> 4	0.155221
> 5	0.1406408
> 6	0.1294149
> 
> I’ve tried this with a smaller subset of these data, and there the issue
> arises at k>2 rather than at k>1 as it does here. Anyway, this is the
> input and output:
> 
> library(vegan)
> library(MASS)
> PSU <- read.table("PSU.txt", header = TRUE, sep = "")
> PSU.sp <- PSU[, 22:110]
> PSU.NMDS <- metaMDS(PSU.sp, k=4, zerodist = "add", distance = "jaccard")
> 
> Square root transformation
> Wisconsin double standardization
> Zero dissimilarities changed into  0.0006657301
> Run 0 stress 0.155221
> Run 1 stress 0.2548103
> Run 2 stress 0.255434
> Run 3 stress 0.2551382
> … (up to run 20; runs 1 through 20 all have very similar stress values.)
> 
> Call:
> metaMDS(comm = PSU.sp, distance = "jaccard", k = 4, zerodist = "add")
> 
> global Multidimensional Scaling using monoMDS
> 
> Data:     wisconsin(sqrt(PSU.sp))
> Distance: jaccard
> 
> Dimensions: 4
> Stress:     0.155221
> Stress type 1, weak ties
> No convergent solutions - best solution after 20 tries
> Scaling: centring, PC rotation, halfchange scaling
> Species: expanded scores based on ‘wisconsin(sqrt(PSU.sp))’
> 
> Again, with k=1 this does not happen: the solution looks like
> any other regular NMDS run. There are no blank values in the data;
> they are all numbers between 0 and 100 corresponding to % cover, and
> every row and column sum is greater than 0. Many sites have
> the same species composition, hence the zerodist, but omitting it
> makes no difference to the problem at hand. The NMDS works fine if I
> use a subset of the data, but I have not subsetted and tested all of
> it. Other dissimilarity indices, both metric (Euclidean) and
> semimetric (Bray-Curtis), show the same effect. I’ve chosen k=4 here
> because of the (marginal) elbow in the stress plot, but the results
> actually look pretty good at any k value. Even though the output is
> reasonable, I am concerned that the first run beats the best of the
> others by a large margin, which suggests that something is going
> wrong, and this concern is amplified by the strange pyramid-shaped
> stress plot. Because metaMDS uses random starts, I don't see how this
> output is possible. I've scoured the help files and the archives of
> this list, and I am now at a loss to explain this.
> 
> Thank you in advance for your time and consideration!
> 
> Ewan
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksanen at oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa


