[R] Results from Vegan metaMDS vary depending on set.seed
Jari Oksanen
jari.oksanen at oulu.fi
Fri Dec 27 13:22:30 CET 2013
Dear Vinny Moriarty,
Vinny Moriarty <vwmoriarty <at> gmail.com> writes:
>
> I've got an ecological data set that I've worked up to the point of having
> a relative abundance matrix I created with the decostand() command in Vegan.
>
> Here is the distance matrix data:
<-- clip: 8 x 4 data matrix giving distance object of 28 elements -->
>
> At first I was having issues with metaMDS producing two distinctly
> different NMDS plots at seemingly random intervals as I re-ran my analysis
> over multiple runs. I figured out it was because I was not using
> set.seed for my metaMDS call. But now I am concerned that the seemingly
> small change of setting set.seed() has such a large impact on my analysis.
>
There are several issues here:
1) Even a small change in the random seed should give a completely different
random sequence. In fact, the sequence should be just as different as it would
be after a large change in the seed.
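Point 1 can be seen directly in a minimal base-R sketch. It draws five uniform numbers after set.seed(1) and again after set.seed(2). The seeds differ by only one, yet the two sequences are unrelated, while re-using a seed reproduces its sequence exactly. The use of runif() and five draws is just an illustrative choice.

```r
# Two adjacent seeds: the resulting sequences are unrelated.
set.seed(1)
a <- runif(5)
set.seed(2)
b <- runif(5)

# Re-using the same seed reproduces the sequence exactly.
set.seed(1)
a_again <- runif(5)

print(rbind(seed1 = a, seed2 = b))
print(identical(a, a_again))  # TRUE: same seed, same sequence
```

This is why metaMDS() gives reproducible results for a fixed seed. Whether a nearby seed gives a "similar" result is a matter of chance.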
> As can be seen in the ordiplots below, it looks to me like there is a
> change in relative distances between some of the latter 'sites'.
>
> set.seed(1)
> mds10<- metaMDS(DF, dist='bray')
>
> ordiplot(mds10,display='sites',type='text')
>
> vs.
>
> set.seed(999)
> mds10<- metaMDS(DF, dist='bray')
>
> ordiplot(mds10,display='sites',type='text')
>
2) set.seed(1) indeed seems to get trapped in a local minimum (which is
equal to the initial solution based on PCoA).
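Point 2 can be checked by fitting NMDS from the metric (PCoA) starting configuration only, then comparing its stress with the best solution that metaMDS() finds over several starts. This is a sketch that assumes vegan is installed. The data matrix in the question was clipped, so the first 8 sites of vegan's dune data, converted to relative abundances, serve as a hypothetical stand-in. The numbers will therefore not match the question.

```r
library(vegan)

# Hypothetical stand-in for the clipped data: first 8 dune sites,
# empty species dropped, converted to relative abundances.
data(dune)
sub <- dune[1:8, ]
sub <- sub[, colSums(sub) > 0]
DF <- decostand(sub, method = "total")
d <- vegdist(DF, method = "bray")

# NMDS started only from the metric scaling (PCoA) configuration.
pcoa_start <- cmdscale(d, k = 2)
fit_pcoa <- monoMDS(d, y = pcoa_start, k = 2)

# NMDS via metaMDS, which refits from several starting configurations
# (mostly random) and keeps the lowest-stress solution.
set.seed(999)
fit_multi <- metaMDS(DF, distance = "bray", trace = 0)

print(c(pcoa_start = fit_pcoa$stress, metaMDS_best = fit_multi$stress))
```

Suppose the PCoA-started fit has higher stress than the multi-start fit. Then the PCoA configuration has led to a local minimum of the kind described above.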
> The difference between the two plots is large enough that it would change
> my interpretation of my analysis, so as this is my first foray into NMDS I
> am a bit concerned. Can someone tell me if this is just part of how NMDS or
> Vegan works (different local minima)? Or does this imply a certain
> ambiguity about my data set? Or am I completely misreading the plots?
>
3) You should not draw firm conclusions from data of only 8 points. You
need more data. I would not perform NMDS with 8 data points, although it is
technically possible.
> If I add vector arrows for the 'species' influence like so:
>
> envfit10<-envfit(mds10, DF,perm=999)
> plot(mds10, display='sites',type='t')
> plot(envfit10)
>
> I can see that the two plots have different 'species' vectors, but it looks
> like the relative distance between S5 and some of the other sites changes
> between the two plots.
>
> Is one ordiplot more 'correct' than the other? If not, what am I to make of
> the difference between plots?
4) NMDS and metaMDS() give you a goodness-of-fit statistic called stress.
Low stress is good (also at Christmas time). One of the solutions has lower
stress, and in that sense it is more correct. It also seems that you do not
get arbitrary random configurations. Instead, all analyses end up in one of two
alternative states. With set.seed(1) you get one state with higher stress, and
with set.seed(999) you get another state with lower stress. This would
indicate that set.seed(1) gives you a local minimum, and set.seed(999) possibly
the global minimum. The best way to compare solutions is the procrustes()
function of vegan, which will show you that the solutions fall into one of
these two groups. However, with only 8 points you should not push your
analysis too far towards firm conclusions.
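The comparison in point 4 can be sketched as follows. Run metaMDS() under several seeds, record each stress, and use procrustes() to measure how far each configuration is from the lowest-stress one. Solutions that land in the same state should have nearly equal stress and a Procrustes sum of squares close to zero. This assumes vegan is installed, and the seed values are arbitrary. The first 8 dune sites again stand in (hypothetically) for the clipped data, so the stand-in may not show the two-state pattern seen in the question.

```r
library(vegan)

# Hypothetical stand-in for the clipped data: first 8 dune sites,
# empty species dropped, converted to relative abundances.
data(dune)
sub <- dune[1:8, ]
sub <- sub[, colSums(sub) > 0]
DF <- decostand(sub, method = "total")

# Run metaMDS once per seed (each run itself tries several starts).
seeds <- c(1, 2, 3, 42, 999)  # arbitrary seeds for illustration
fits <- lapply(seeds, function(s) {
  set.seed(s)
  metaMDS(DF, distance = "bray", trace = 0)
})

# Stress of each final solution; lower is better.
stress <- sapply(fits, function(f) f$stress)
best <- fits[[which.min(stress)]]

# Procrustes sum of squares of each solution against the best one:
# values near zero mean the same configuration up to rotation,
# reflection and scaling.
pss <- sapply(fits, function(f) procrustes(best, f, symmetric = TRUE)$ss)

print(data.frame(seed = seeds, stress = round(stress, 4),
                 procrustes_ss = round(pss, 4)))
```

Replace DF with the original relative-abundance matrix to see whether the runs split into two groups of stress and Procrustes values. vegan's protest() gives a permutation test of the same Procrustes fit if a significance statement is wanted.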
Cheers, Jari Oksanen