[R-sig-eco] Fwd: how to calculate "axis variance" in metaMDS, pakage vegan?

Mon Dec 7 11:30:56 CET 2009

On Sun, 2009-12-06 at 17:56 +0100, Gian Maria Niccolò Benucci wrote:
> Hi all again,
> 
> And thank you all for the kind responses...
> 
> I do not want to torture myself for sure... :) I red (lot of) publications
> about fungal community ecology studies (soil fungi), my research field
> indeed, and all uses NMDS or DCA as ordination techniques... So, I am only
> trying to do my best useing R for calculating them...

Would you walk in front of a bus if you saw lots of other people doing
it? I doubt it. This kind of "me to" attitude to science is quite
demoralising when reviewing manuscripts and reading the literature.

DCA was invented to solve a specific problem with CA - namely the arch
artefact. I forget whether this is in Jari's public lecture notes, vegan
vignettes/tutorials or in one of his lectures on a course we taught
together, but DCA replaces the arch artefact with other artefacts that
make the points look like a trumpet or a diamond in ordination space.

Why DCA is used as a default instead of a special case escapes me. You
really shouldn't use DCA at all if you can get away with it as it is
doing some nasty things to your data.

Alternatives; i) NMDS ii) PCA after application of a transformation
(Legendre & Gallagher 2001, Oecologia). And there are probably others...

> 
> What I need now is a good environmental interpretation of my work...
> 
> Then I found the fantastic Jari's pdf about "Multivariate Analysis of
> Ecological Communities in R: vegan tutorial" and I went to the passage about
> factors and vectors fitting...
> That's my R code:
> 
> > NMS.ABCDsqrt
> 
> Call:
> metaMDS(comm = sqrtABCD, distance = "bray", k = 4, trymax = 100,
> autotransform = F)
> 
> Nonmetric Multidimensional Scaling using isoMDS (MASS package)
> 
> Data:     sqrtABCD
> Distance: bray shortest
> 
> Dimensions: 4
> Stress:     11.68632

What was the stress with k = 2 and k = 3. As Jari has already mentioned,
how are you going to interpret and visualise this 4D configuration of
points (you can't plot NMDS1 vs NMDS2, NMDS1 vs NMDS3 etc. for reasons
explained to you earlier in this thread).

> Two convergent solutions found after 2 tries
> Scaling: centring, PC rotation, halfchange scaling
> Species: expanded scores based on sqrtABCD
> 
> > envfit(NMS.ABCDsqrt, env.table, permu=1000) ->NMS.ABCDsqrtef
> > NMS.ABCDsqrtef
> 
> ***FACTORS:
> 
> Centroids:
>               NMDS1   NMDS2   NMDS3   NMDS4
> CommunityA  -0.3821  0.3822 -0.1173 -0.1232
> CommunityB  -0.1849  0.2748  0.0076 -0.0720
> CommunityC   0.2206 -0.4261 -0.0505  0.1197
> CommunityD   0.3465 -0.2310  0.1603  0.0756
> HostCorylus -0.2835  0.3285 -0.0549 -0.0976
> HostOstrya   0.2835 -0.3285  0.0549  0.0976
> 
> Goodness of fit:
>               r2   Pr(>r)
> Community 0.2009 0.001998 **
> Host      0.1818 0.000999 ***
> ---
> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> P values based on 1000 permutations.
> 
> > names(NMS.ABCDsqrtef)
> [1] "vectors" "factors"
> > NMS.ABCDsqrtef$vectors
> NULL
> >
> 
> I have an enviromental matrix (env.table) that contains only area (A,B,C,D)
> and tree species (Corylus sp. and Ostrya sp. ) differentiation, so I have
> the area and the tree species of each samples is related to...
> I could plot factors on the graph but not vectors because there aren't
> vectors in reason to the absence of numerical data in the env. matrix...
> isn't it right? Aren't the R2 values too low?

Did you read ?envfit ? It states:
....
     (r^2). For factors this is defined as r^2 = 1 - ss_w/ss_t, where
     ss_w and ss_t are within-group and total sums of squares.

So this statistic here is looking at how constrained within the 4D space
the levels of each factor are in relation to the overall spread of the
points. This looks to me like some evidence for grouping of your sites
on basis of Community and stronger evidence for Host. The "effect" is
small but significant. I do wonder if you are not hitting the curse of
dimensionality here?

The interpretation will depend on the number of samples. It would be
nice to look at the ordination but how you do that I don't know.

G

> 
> Many many thank you for answering...
> 
> Gian
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> > Jari,
> > >
> > > I am here again ... :)
> > > So, to try having a comparison of the real goodness of my metaMDS data I
> > > tried to perform a DCA (with same input table)
> > > Then please forgive me if I do somethign wrong with it... That's my R
> > code:
> >
> > Why DCA? What lead you to torture your data so?
> >
> > > >decorana(sqrtABCD, iweigh=0, ira=0) -> DCA.1
> > > > DCA.1
> > >
> > > Call:
> > > decorana(veg = sqrtABCD, iweigh = 0, ira = 0)
> > >
> > > Detrended correspondence analysis with 26 segments.
> > > Rescaling of axes with 4 iterations.
> > >
> > >                   DCA1   DCA2   DCA3   DCA4
> > > Eigenvalues     0.6688 0.5387 0.4822 0.3752
> > > Decorana values 0.7912 0.5795 0.4145 0.2931
> > > Axis lengths    5.9974 3.7036 3.6121 3.3802
> > >
> > > >
> > >
> > > In that situation the graph is still good but the differences between the
> > > two clades are little more confused, maybe in the axe II (I mean the
> > > vertical one) in this case there is a better separation.
> > > What do the "Decorana values" really mean?
> >
> > ?decorana
> >
> > Basically, in the original DECORANA code the Eigenvalues reported were
> > computed at the wrong stage of the "detrending" processes. Jari realised
> > this when interfacing the old DECORANA code with R. Jari altered the
> > code to compute the correct Eigenvalues, but chose to also report the
> > values you'd get from DECORANA or Canoco to stop people complaining that
> > vegan was doing DCA incorrectly.
> >
> > >  And how about the segments?
> >
> > What about them? Do you know how DCA works? The standard detrending
> > breaks the first (D)CA axis into 26 sequential chunks or segements. the
> > 26 is the default, but it can be changed. Within each chunk, the mean
> > trial site score for axis 2 for sites in that chunk is subtracted from
> > the trial axis 2 site scores of the sites in the chunk. This detrending
> > is what gets rid of the "arch" found in some CA plots and is the reason
> > DCA was invented.
> >
> > >
> > > How can I do something better?
> >
> > Are you trying to separate the two clades? Do you know a priori which
> > samples belong to which clade? If so, one of the many classification
> > methods in R would be more useful as they look to separate the a priori
> > defined groups best. The methods you have been using thus far aim to
> > represent the dissimilarities between samples best in a low dimensional
> > space.
> >
> > HTH
> >
> > G
> >
> > >
> > > Many thank you in advance,
> > >
> > > G.
> >
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%