[R-sig-eco] Follow-up to Vegan metaMDS: unusual first run stress values with large data set

Jari Oksanen jari.oksanen at oulu.fi
Tue May 3 22:17:23 CEST 2016


Ewan,

You already got some good hints from Peter Minchin (outside this list), but here are some comments.

> On 2 May 2016, at 20:22 pm, Ewan Isherwood <ewan.isherwood at gmail.com> wrote:
> 
> Hi r-sig-ecology!
> 
> This is mostly a message for Jari Oksanen or another Vegan developer that
> may be working specifically with metaMDS, but I'm opening up to anyone that
> has any interest in this. First of all, you can see my original post here:
> http://r-sig-ecology.471788.n2.nabble.com/Vegan-metaMDS-unusual-first-run-stress-values-with-large-data-set-td7577720.html
> 
> Basically, I'm having the same issues with the metaMDS engine as above (R
> 3.2.5, Vegan 2.3-5). This time my dataset is larger at 9239 sites x 85
> species.
> 
> I've tried adjusting the sfgrmin value up to an absurd 1e-10,000,000
> (decreasing this value to -7 resolved the issue last time)
> 
> I've tried upping the sratmax to about 0.99 recurring with 77 9's (I don't
> think this should have an effect since it's concerned with the iterations
> stopping when the stress ratio between two iterations goes above the
> inputted value)
> 

Data sets of this size can really be difficult for NMDS, because the location of any single point has only a tiny effect on the stress. In situations like this, the first thing is to find out why the analysis stops. There are three stopping criteria, and you have changed two of them (the steepness of the gradient, sfgrmin, and the change in stress, sratmax). Your changes really are absurd, since they exceed the numerical accuracy of digital computers. If you launch metaMDS() with argument trace = 2, you will get more verbose output that reports which stopping criterion was used. If it is always the same criterion, you could make that one stricter (but stay within the numbers that digital computers can handle). My first guess is that with data of these dimensions you very easily stop because you exceed the maximum number of iterations. If that happens, you actually stop before reaching a solution, and you should increase maxit. Only after checking these things should you proceed to look at the peculiarities of your data (disjunct or nearly disjunct subsets etc.)
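For example, something along these lines (a sketch only: trace, maxit, sratmax and sfgrmin are real metaMDS()/monoMDS() arguments, but the values shown are illustrative, and PSU.sp stands for your own sites x species matrix):

```r
## Run with verbose tracing to see which stopping criterion ends each
## monoMDS() run, and raise the iteration limit in case runs stop at
## maxit before converging.
library(vegan)

sol <- metaMDS(PSU.sp, distance = "jaccard", k = 3,
               noshare = TRUE,
               trace = 2,          # verbose output: reports the stopping criterion
               maxit = 500,        # monoMDS default is 200; raise if runs hit the limit
               sratmax = 0.999999, # stress-ratio criterion: stay within double precision
               sfgrmin = 1e-7)     # scale-factor-of-gradient criterion
```

The extra arguments (maxit, sratmax, sfgrmin) are passed on by metaMDS() to the monoMDS() engine; the trace = 2 output then tells you which of the three criteria stopped each run.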

cheers, Jari Oksanen

> I've tried using the Jaccard and Bray methods (I don't think this should
> have an effect)
> 
> I've trialled 3-6 dimensions randomly (this has in the past affected the
> result, but that might be because of other factors)
> 
> I have always used the noshare = TRUE option otherwise it ejects some of
> the sampling points with rare species to astronomical values on one or more
> axes
> 
> I've tried iterations of this about 20-30 times but it still won't ever
> give me a best solution that isn't the first run. Here is the basic code:
> 
> metaMDS(PSU.sp, k= x, distance = "jaccard", sfgrmin = x, sratmax = x,
> noshare = TRUE)
> 
> I'm happy to privately share the raw dataset with Jari Oksanen if he's
> interested in this phenomenon, but I would have to seek permission for
> anyone else since I do not own it. In the meantime I will investigate other
> methods to analyse this data, which shouldn't be an issue. Since my dataset
> is unusually large for this method, this is probably more for curiosity's
> sake for the Vegan developers.
> 
> Thanks for your help,
> 
> Ewan Isherwood
> 
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
