[R-sig-eco] Fwd: how to calculate "axis variance" in metaMDS, pakage vegan?

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Dec 8 15:51:45 CET 2009


On Mon, 2009-12-07 at 20:10 +0100, Gian Maria Niccolò Benucci wrote:
> Hi Gavin and Hi all,
> 
> I will not go in front of a bus for sure, I not mad, at least I am not still
> mad... :)
> 
> I would like to tell you that I am a Ph.D. student, and for what I know,
> Ph.D. student still have to understand things studing those from whom wrote
> before them...

That's fine, but temper that with a realisation that not everyone knows
what they are doing numerically. So be critical about what you read,
learn about the methods and what they do.

<snip />

> Anyway... For continuing the brainstorm if I can...The Host effect is what I
> think is more interesting for the ecological point of view of my trials also
> becasue the 4 communities have two by two the same host, I mean A and B,
> Corylus, while B and C, Ostrya...
> If I plot the factors of the envifit into the graph and the evidence of
> separation seems clear...
> 
> That's are my metaMDS with 2 and 3 dimensions:

Thanks for these: one way of trying to choose a dimensionality for the
"solution" is to plot the stress as a function of k (k on the x-axis,
stress on the y) - this is often called a screeplot as you are looking
for a dramatic change in slope. I took your stresses and plotted them
against k (crudely):

plot(2:4, c(24.54342, 16.29226, 11.68632), type = "b")

and doesn't seem to be any noticeable change here, so not much help
there.

Looking at the goodness of fit stats, the story they tell doesn't really
change much depending on whether you use 2,3, or 4 dimensions. So
perhaps stick with 2 in that case.

Also, try:

stressplot(MOD)

where mod is the object returned by metaMDS. The stressplot plots your
original dissimilarities against dissimilarities derived from the nMDS
configuration. It also shows the monotonic regression fit and a few
goodness of fit criteria. You could evaluate the models with different k
using these plots.

> 
> > NMS.1
> 
> Call:
> metaMDS(comm = sqrtABCD, distance = "bray", k = 2, trymax = 100,
> autotransform = F)
<snip />
> I got 10 sample units for each community data (40 in total)
> 
> You said at the end : "*I do wonder if you are not hitting the curse of
> dimensionality here?"*
> 
> Can you explain me what do you mean for "hitting the curse of
> dimensionality" if I am not so demanding...

Yep, sorry, that was a bit cryptic. Curse of dimensionality is a phrase
coined by Belman (1961) and refers to the problem of defining
"localness" in high dimensions; neighbourhoods with a fixed number of
samples become less local as the number of dimensions in creases.
basically, if you have a number of dimensions, the more dimensions you
have the easier it is for a sample to lie a long way from the rest of
the data along a single dimension and thus have large dissimilarity.

This doesn't appear to be the case here though; 4 is low dimensionality
(hence my wondering if this was or wasn't a problem), but when you'd
only shown the k=4 data, I did wonder if the low r2 was due to you
points being widely spread along one of the 4D; i.e. was the more
complex solution leading to the low r2?

By looks of things, the low r2 is probably more to do with the small,
but significant, "effects" of your two covariates.

> 
> ... and then: *"it would be nice to look at the ordination but how you do
> that I don't know".*
> 
> I would be glad if you see the graphs of my ordinations, Can I send them to
> you? That would be great... let me know about that. I used to plot in this
> way:
> 
> > plot(NMS.1, type="n", dis= "sp")
> > ordisymbol(NMS.1, env.table, "Host", legend=T)
> 
> Anyway I have to admit that with 2 and at least 3 dimensions the points into
> the ordinantion plot are better separated in reasons to the data matrix, so
> what to do? better fittind of points ant bigger stress or the contrary?

If this were me, seeing as the interpretation/results don't change, I'd
probably stick with k=2 so you can easily draw the ordination for
presentation in your phd work or future papers.

HTH

G

> 
> I think is enough, thank you so much for your help, I'll appreciate any
> comments! :)
> 
> Gian
> 
> 
> 
> 
> 
> 
> > And thank you all for the kind responses...
> > >
> > > I do not want to torture myself for sure... :) I red (lot of)
> > publications
> > > about fungal community ecology studies (soil fungi), my research field
> > > indeed, and all uses NMDS or DCA as ordination techniques... So, I am
> > only
> > > trying to do my best useing R for calculating them...
> >
> > Would you walk in front of a bus if you saw lots of other people doing
> > it? I doubt it. This kind of "me to" attitude to science is quite
> > demoralising when reviewing manuscripts and reading the literature.
> >
> > DCA was invented to solve a specific problem with CA - namely the arch
> > artefact. I forget whether this is in Jari's public lecture notes, vegan
> > vignettes/tutorials or in one of his lectures on a course we taught
> > together, but DCA replaces the arch artefact with other artefacts that
> > make the points look like a trumpet or a diamond in ordination space.
> >
> > Why DCA is used as a default instead of a special case escapes me. You
> > really shouldn't use DCA at all if you can get away with it as it is
> > doing some nasty things to your data.
> >
> > Alternatives; i) NMDS ii) PCA after application of a transformation
> > (Legendre & Gallagher 2001, Oecologia). And there are probably others...
> >
> > >
> > > What I need now is a good environmental interpretation of my work...
> > >
> > > Then I found the fantastic Jari's pdf about "Multivariate Analysis of
> > > Ecological Communities in R: vegan tutorial" and I went to the passage
> > about
> > > factors and vectors fitting...
> > > That's my R code:
> > >
> > > > NMS.ABCDsqrt
> > >
> > > Call:
> > > metaMDS(comm = sqrtABCD, distance = "bray", k = 4, trymax = 100,
> > > autotransform = F)
> > >
> > > Nonmetric Multidimensional Scaling using isoMDS (MASS package)
> > >
> > > Data:     sqrtABCD
> > > Distance: bray shortest
> > >
> > > Dimensions: 4
> > > Stress:     11.68632
> >
> > What was the stress with k = 2 and k = 3. As Jari has already mentioned,
> > how are you going to interpret and visualise this 4D configuration of
> > points (you can't plot NMDS1 vs NMDS2, NMDS1 vs NMDS3 etc. for reasons
> > explained to you earlier in this thread).
> >
> > > Two convergent solutions found after 2 tries
> > > Scaling: centring, PC rotation, halfchange scaling
> > > Species: expanded scores based on sqrtABCD
> > >
> > > > envfit(NMS.ABCDsqrt, env.table, permu=1000) ->NMS.ABCDsqrtef
> > > > NMS.ABCDsqrtef
> > >
> > > ***FACTORS:
> > >
> > > Centroids:
> > >               NMDS1   NMDS2   NMDS3   NMDS4
> > > CommunityA  -0.3821  0.3822 -0.1173 -0.1232
> > > CommunityB  -0.1849  0.2748  0.0076 -0.0720
> > > CommunityC   0.2206 -0.4261 -0.0505  0.1197
> > > CommunityD   0.3465 -0.2310  0.1603  0.0756
> > > HostCorylus -0.2835  0.3285 -0.0549 -0.0976
> > > HostOstrya   0.2835 -0.3285  0.0549  0.0976
> > >
> > > Goodness of fit:
> > >               r2   Pr(>r)
> > > Community 0.2009 0.001998 **
> > > Host      0.1818 0.000999 ***
> > > ---
> > > Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> > > P values based on 1000 permutations.
> > >
> > > > names(NMS.ABCDsqrtef)
> > > [1] "vectors" "factors"
> > > > NMS.ABCDsqrtef$vectors
> > > NULL
> > > >
> > >
> > > I have an enviromental matrix (env.table) that contains only area
> > (A,B,C,D)
> > > and tree species (Corylus sp. and Ostrya sp. ) differentiation, so I have
> > > the area and the tree species of each samples is related to...
> > > I could plot factors on the graph but not vectors because there aren't
> > > vectors in reason to the absence of numerical data in the env. matrix...
> > > isn't it right? Aren't the R2 values too low?
> >
> > Did you read ?envfit ? It states:
> > ....
> >     (r^2). For factors this is defined as r^2 = 1 - ss_w/ss_t, where
> >     ss_w and ss_t are within-group and total sums of squares.
> >
> > So this statistic here is looking at how constrained within the 4D space
> > the levels of each factor are in relation to the overall spread of the
> > points. This looks to me like some evidence for grouping of your sites
> > on basis of Community and stronger evidence for Host. The "effect" is
> > small but significant. I do wonder if you are not hitting the curse of
> > dimensionality here?
> >
> > The interpretation will depend on the number of samples. It would be
> > nice to look at the ordination but how you do that I don't know.
> >
> > G
> >
> > >
> > > Many many thank you for answering...
> > >
> > > Gian
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > > Jari,
> > > > >
> > > > > I am here again ... :)
> > > > > So, to try having a comparison of the real goodness of my metaMDS
> > data I
> > > > > tried to perform a DCA (with same input table)
> > > > > Then please forgive me if I do somethign wrong with it... That's my R
> > > > code:
> > > >
> > > > Why DCA? What lead you to torture your data so?
> > > >
> > > > > >decorana(sqrtABCD, iweigh=0, ira=0) -> DCA.1
> > > > > > DCA.1
> > > > >
> > > > > Call:
> > > > > decorana(veg = sqrtABCD, iweigh = 0, ira = 0)
> > > > >
> > > > > Detrended correspondence analysis with 26 segments.
> > > > > Rescaling of axes with 4 iterations.
> > > > >
> > > > >                   DCA1   DCA2   DCA3   DCA4
> > > > > Eigenvalues     0.6688 0.5387 0.4822 0.3752
> > > > > Decorana values 0.7912 0.5795 0.4145 0.2931
> > > > > Axis lengths    5.9974 3.7036 3.6121 3.3802
> > > > >
> > > > > >
> > > > >
> > > > > In that situation the graph is still good but the differences between
> > the
> > > > > two clades are little more confused, maybe in the axe II (I mean the
> > > > > vertical one) in this case there is a better separation.
> > > > > What do the "Decorana values" really mean?
> > > >
> > > > ?decorana
> > > >
> > > > Basically, in the original DECORANA code the Eigenvalues reported were
> > > > computed at the wrong stage of the "detrending" processes. Jari
> > realised
> > > > this when interfacing the old DECORANA code with R. Jari altered the
> > > > code to compute the correct Eigenvalues, but chose to also report the
> > > > values you'd get from DECORANA or Canoco to stop people complaining
> > that
> > > > vegan was doing DCA incorrectly.
> > > >
> > > > >  And how about the segments?
> > > >
> > > > What about them? Do you know how DCA works? The standard detrending
> > > > breaks the first (D)CA axis into 26 sequential chunks or segements. the
> > > > 26 is the default, but it can be changed. Within each chunk, the mean
> > > > trial site score for axis 2 for sites in that chunk is subtracted from
> > > > the trial axis 2 site scores of the sites in the chunk. This detrending
> > > > is what gets rid of the "arch" found in some CA plots and is the reason
> > > > DCA was invented.
> > > >
> > > > >
> > > > > How can I do something better?
> > > >
> > > > Are you trying to separate the two clades? Do you know a priori which
> > > > samples belong to which clade? If so, one of the many classification
> > > > methods in R would be more useful as they look to separate the a priori
> > > > defined groups best. The methods you have been using thus far aim to
> > > > represent the dissimilarities between samples best in a low dimensional
> > > > space.
> > > >
> > > > HTH
> > > >
> > > > G
> > > >
> > > > >
> > > > > Many thank you in advance,
> > > > >
> > > > > G.
> > > >
> > >
> > >       [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > R-sig-ecology mailing list
> > > R-sig-ecology at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> > --
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> >  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> >  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> >  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
> >  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >
> >
> >
> 
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-sig-ecology mailing list