[R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

Fri Dec 14 10:24:23 CET 2012

Hello Folks,

Kruskal's "rule of thumb" really is a rule of thumb. That is, it is intended as a rough guideline. In that sense, it is no different from Clarke's rules. However, I wouldn't judge usability simply by stress: solutions with very low stress can be useless and solutions with fairly high stress can be usable. Stress depends on many things, but a large part of it behaves like a signal-to-noise ratio. The signal is more difficult to detect when noise is high, but once you have detected the signal, the amount of noise does not matter. I have quite often seen pretty usable solutions with stress around or above 0.20 (20%), at least when using external explanatory variables. There are limits, though. If you trace single runs, you may see that random starting configurations typically start with a stress of 0.4 (40%) or a bit higher. If you cannot improve on that, the solution is probably pretty useless (and with metaMDS you will probably have no convergent solutions). However, instead of discarding the results, you may first try stricter convergence criteria for monoMDS (if you use monoMDS). See its help pages (the next version of vegan will have a stricter limit for the "scale factor of gradient", sfgrmin). 
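
A minimal sketch of tightening the monoMDS stopping rules, assuming the vegan package is installed. The dune data set shipped with vegan stands in for real data, and the values given to sfgrmin and maxit are illustrative assumptions, not recommendations.

```r
library(vegan)

data(dune)                           # example community data shipped with vegan
d <- vegdist(dune, method = "bray")  # Bray-Curtis dissimilarities

# monoMDS starts from a random configuration when none is supplied,
# so fix the seed to make both runs start from the same place
set.seed(1)
fit_default <- monoMDS(d, k = 2)     # default stopping criteria

# Stricter criteria (illustrative values): a smaller limit for the scale
# factor of the gradient and more iterations allowed
set.seed(1)
fit_strict <- monoMDS(d, k = 2, sfgrmin = 1e-9, maxit = 1000)

# Stress is reported as a fraction (0.20 = 20%)
c(default = fit_default$stress, strict = fit_strict$stress)
```

Comparing the two stress values shows whether the default run stopped early; if they are nearly identical, the stopping rules were probably not the problem.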

There is also a limit for low stress. In fact, the current vegan warns of too low stress (Kruskal's "perfect" fit). This is usually a symptom of insufficient data (too many dimensions for too few points, dissimilarities found from too few variables).
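
The "too few points for too many dimensions" case can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, kruskal_stress is a hypothetical helper written for this example (not a vegan function), and cmdscale is used as a stand-in for an NMDS solution.

```r
# Kruskal's stress (type 1): monotone regression of ordination distances
# on observed dissimilarities, then the normalized residual sum of squares
kruskal_stress <- function(diss, conf) {
  obs <- as.vector(diss)          # observed dissimilarities
  ord <- as.vector(dist(conf))    # distances between points in the ordination
  o <- order(obs)                 # sort pairs by observed dissimilarity
  fitted <- isoreg(ord[o])$yf     # monotone (isotonic) regression
  sqrt(sum((ord[o] - fitted)^2) / sum(ord^2))
}

set.seed(42)
x <- matrix(rnorm(5 * 10), nrow = 5)   # only 5 points, 10 variables
d <- dist(x)                           # Euclidean distances, no ties

conf_full <- cmdscale(d, k = 4)        # n - 1 axes reproduce d exactly
conf_rand <- matrix(rnorm(5 * 2), nrow = 5)  # unrelated random configuration

stress_full <- kruskal_stress(d, conf_full)  # essentially zero: "perfect" fit
stress_rand <- kruskal_stress(d, conf_rand)  # clearly above zero
c(full = stress_full, random = stress_rand)
```

The near-zero stress here says nothing about the quality of the data; it only reflects that five points can always be fitted in four dimensions.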

In my opinion, ecologists are often too obsessed with goodness-of-fit values. This is true in general, but it is also very manifest with multivariate methods. I do think that if you, say, in PCA or RDA "explain" something like >50%, there is something suspect in your analysis. Typical reasons are insufficient data (too few rows or columns) or data that are not really multivariate. Sometimes there are some very dominant species (high variance), so that the analysis need only care about a couple of species, and that is an easy task. If you transform your data so that high abundances are squashed down and variances are made more similar, or even equal, the data become more multivariate (= all species count). Typically this means that a lower proportion of variance is "explained", but often the results are more interpretable. This also happens when you change models: unscaled PCA/RDA using variances "explains" much of the variance, scaled PCA/RDA using correlations "explains" much less, and CA/CCA studying deviations from expectations "explains" the least. Typically the usability and interpretability of the results improve as "explanatory power" decreases. The same often holds for NMDS: Euclidean distances often give lower stress and poorer results than dissimilarities that treat all species more equally.
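
The effect of a dominant species on "explained" variance can be sketched in base R. Everything here is an assumption made for illustration: the counts are simulated (one abundant species, the rest rare and independent), explained2 is a hypothetical helper, and the Hellinger transformation is written out by hand as the square root of row proportions.

```r
set.seed(1)
n_sites <- 25
n_species <- 20

# Simulated counts: one very abundant species, 19 rare ones, all independent
lambda <- c(200, rep(2, n_species - 1))
abund <- sapply(lambda, function(l) rpois(n_sites, l))

# Proportion of total variance carried by the first two principal axes
explained2 <- function(x, ...) {
  ev <- prcomp(x, ...)$sdev^2
  sum(ev[1:2]) / sum(ev)
}

# Hellinger transformation: square root of relative abundances per site
hell <- sqrt(abund / rowSums(abund))

res <- c(unscaled  = explained2(abund),                 # PCA on variances
         hellinger = explained2(hell),                  # abundances squashed
         scaled    = explained2(abund, scale. = TRUE))  # PCA on correlations
round(res, 2)
```

In this simulation there is no real structure at all, yet the unscaled PCA "explains" by far the most, simply because one species carries nearly all of the variance.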

Not really R, but perhaps I'm forgiven (this time),

Cheers, Jari Oksanen  
________________________________________
From: r-sig-ecology-bounces at r-project.org [r-sig-ecology-bounces at r-project.org] on behalf of Alan Haynes [aghaynes at gmail.com]
Sent: 14 December 2012 09:53
To: sas0025 at auburn.edu
Cc: claire della vedova; r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

Hi Claire,

I'm not sure if it helps, and it might be interesting to hear other list
readers' views on the subject, but McCune and Grace, the authors of PC-ORD
and "Analysis of Ecological Communities", have a couple of rules of thumb
for NMDS stress.
They use Kruskal stress*100, while I believe monoMDS (and thus metaMDS)
uses simple Kruskal stress (the values in brackets below are thus the values
vegan would report).

"Kruskal's rules of thumb"
2.5 (or 0.025) = excellent
5 (0.05) = good
10 (0.1) = fair
20 (0.2) = poor

"Clarke's rules of thumb"
<5 (0.05) - excellent, cannot be misinterpreted, but incredibly rare in
practice
5-10 (0.05 - 0.1) - good, no real risk of false inference
10-20 (0.1 - 0.2) - can be usable, but upper values could be misleading;
plot details should not be relied on
>20 (0.2) - plots likely to be dangerous to interpret. At stresses of >~35
(0.35), samples are more or less randomly placed with little regard for ranking.
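
For convenience, the Clarke thresholds quoted above can be written as a small lookup. This is only a sketch: clarke_label is a hypothetical helper, the short labels are paraphrases, and assigning a stress of exactly 0.2 to the last class is an assumption, since the quoted rules leave the boundaries open. It labels a value; it does not decide whether an ordination is usable.

```r
# Hypothetical helper: label stress values with the Clarke categories.
# 'stress' is a fraction (as vegan reports it); use percent = TRUE for
# values on the stress * 100 scale.
clarke_label <- function(stress, percent = FALSE) {
  if (percent) stress <- stress / 100
  cut(stress,
      breaks = c(0, 0.05, 0.10, 0.20, Inf),
      labels = c("excellent", "good", "usable with care", "dangerous"),
      right = FALSE)   # intervals are closed on the left: [0.05, 0.10) etc.
}

labs <- clarke_label(c(0.03, 0.08, 0.15, 0.22))
labs
clarke_label(22, percent = TRUE)   # same class as 0.22 on the fraction scale
```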

Correspondingly, McCune and Grace would probably err on the side of caution
as 0.22 is getting into the poor fit, dangerous to interpret areas.

It would be interesting to hear other NMDS users' views on this: what
stress do you consider too high, when does an ordination become
(essentially) useless, etc.

HTH

Cheers,

Alan

--------------------------------------------------
Email: aghaynes at gmail.com
Mobile: +41794385586
Skype: aghaynes

On 13 December 2012 21:03, Stephen Sefick <sas0025 at auburn.edu> wrote:

>
>
> On Thu 13 Dec 2012 09:24:41 AM CST, claire della vedova wrote:
>
>>
>> Dear all,
>>
>> I’m a biostatistician working for a French institute involved in
>> environmental risk assessment, and I would need help to understand the
>> results I obtained from several ordination analyses.
>>
>> I have a dataset of 25 sites. For these 25 sites I have abundance data for
>> 38 species and also measurements of 5 environmental variables.
>>
>> Here is an extract of my abundance data for the first 5 sites:
>>
>> Anguinidae.ditylenchus  Aphelenchidae  Aphelenchoididae  Aporcelaimidae
>>                     12             18               184               0
>>                      0             14               154               0
>>                     45              0               101               6
>>                     20              0               148               0
>>                      0              0               118               0
>>
>> Here are the environmental data for the first 5 sites:
>>
>> ExtPond   moist    Corg    pH    DV50
>>   0.946   9.086   4.269  5.24  171.33
>>   0.682  27.139  23.813  3.82   75.45
>>   2.480  14.322   7.191  4.48  230.90
>>   3.069  18.380  11.404  3.58  211.19
>>   2.615  16.693   7.128  4.12  224.45
>>
>> My aim was to study how the distribution of species is linked with
>> environmental data.
>>
>> First, I did a PCA (with the vegan library) using a Hellinger
>> transformation, with commands like this:
>>
>> acp1<-rda(decostand(myDataSpec[,c(25:62)], "hellinger"))
>>
> Is the Hellinger transform done on relative proportions?
>
>> The first axis represents 19.5% and the second 16.3%. A colleague of
>> mine said that is not so bad for abundance data, but it seems quite poor
>> to me. What do you think?
>>
> You could use something like the broken stick model or others to assess
> how many axes are necessary, but two axes explaining <40% of the variation
> seems low.
>
>
>
>> Then I fitted environmental vectors with the envfit function (from the
>> vegan library), with commands like this:
>>
>> physCInd.fit3<-envfit(acp1, MyDataEnv[,c(13,18,20,21,23)], permut=4999,
>> na.rm=TRUE)
>>
>> It appeared that the pH variable is significantly linked with the
>> ordination, and the p-value of ExtPond is 0.1.
>>
>> Next I did an RDA, which was not significant.
>>
>> To finish, I did two NMDS. For the first one I used the Hellinger
>> normalization and the Bray-Curtis distance. The stress value obtained was
>> 0.22, the non-metric fit R² was 0.952 and the linear fit R² was 0.777.
>> When I fitted the environmental vectors, ExtPond was correlated with the
>> ordination (p-value = 0.02), and the p-value of pH was 0.23.
>>
>> But then I read in “Numerical Ecology”, page 449, that it’s better to
>> standardize the data by dividing each value by the maximum abundance of
>> the species and then use the Kulczynski distance. The stress value was
>> 0.23, the non-metric fit R² was 0.948 and the linear fit R² was 0.69.
>> These values are slightly worse than those of the first NMDS, but the
>> stressplot seems more homogeneous to me.
>>
>> Nevertheless, the results I obtained are very different... When I fitted
>> the environmental data, it appeared that ExtPond was not correlated with
>> this ordination (p-value = 0.82), and the p-value of pH was 0.06. And
>> obviously ExtPond is the most important variable for us ;-)
>>
>> With all these results, I’m quite confused, and I don’t know what to
>> think. So, if someone can help me, I would appreciate it very much. All
>> comments will be welcome.
>>
>> To summarize, my questions are:
>>
>> a) Which ordination method would be better for my data: PCA, knowing
>> that the represented inertia is 35.62%, or NMDS with a stress value of
>> about 0.22?
>>
> My opinion is that PCA on Hellinger-transformed relative proportions
> "means" more than an NMDS.
>
>> b) If NMDS is better suited, which one is better: Hellinger
>> normalization with Bray-Curtis distance, or the normalization
>> recommended by Legendre and Legendre with Kulczynski distance?
>>
> It sounds like the normalization you are referring to is relative
> proportion, which is si/sum(s), where s is the vector of taxa at a site.
>
>> c) Is there another method to apply? I’m going to try co-inertia with
>> the ade4 package.
>>
> I am reading about co-inertia analysis now, as it may be useful for some
> of the things that I am planning on doing. This method looks promising.
>
> You are going to have to decide on what type of ordination to use with
> COIA...
>
> HTH,
>
> Stephen
>
>> Thanks in advance.
>>
>> Cheers.
>>
>> Claire Della Vedova
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>> --
>> Stephen Sefick
>> ****************************************************
>> Auburn University
>> Biological Sciences
>> 331 Funchess Hall
>> Auburn, Alabama
>> 36849
>> ****************************************************
>> sas0025 at auburn.edu
>> http://www.auburn.edu/~sas0025
>> ****************************************************
>>
>> Let's not spend our time and resources thinking about things that are so
>> little or so large that all they really do for us is puff us up and make us
>> feel like gods.  We are mammals, and have not exhausted the annoying little
>> problems of being mammals.
>>
>>                                  -K. Mullis
>>
>> "A big computer, a complex algorithm and a long time does not equal
>> science."
>>
>>                                -Robert Gentleman
>>
>>