[R-sig-eco] Fwd: how to calculate "axis variance" in metaMDS, pakage vegan?

gabriel singer gabriel.singer at univie.ac.at
Tue Dec 8 16:28:56 CET 2009


Hi Gian and others,

I think we better stop worrying about subjective interpretations of 
emotional backgrounds of what in other aspects are absolutely helpful 
discussion threads... I guess part of the challenge on this mailing list 
is to span the whole range of expertise with useful 
discussion/output/help for everyone, be it a student or an expert. I 
found this mailing list very helpful many times for my own questions, 
but also very informative when just following the threads on other 
questions...

Gian, in my opinion, 2 dimensions are absolutely ok, especially if they 
do visualize an (obvious) effect in your study. In other words, if 2 
dimensions show you an effect of "Host" but not of "Area", the effect is 
obviously strong enough. Then I would not worry about stress too much. 
However, there may still be an effect of "Area", maybe visible in more 
dimensions, but it´s obviously of minor importance.

I personally like a combination of NMDS with the permutational MANOVA 
approach (by Marti Anderson) implemented in the function adonis() in 
vegan. You can use the same dissimilarity measure (Bray-Curtis) used for 
the NMDS and can test the "Area" vs. the "Host" effect on parasite (was 
it?) composition. I think that could be a very useful complement to an 
NMDS-derived ordination plot and then you may also regard high-stress 
"representations" (and that´s what all the low-dimensional ordination 
plots really ARE!) in a different light.

Complementations like the permanova are in my opinion better than trying 
the full spectrum of ordination methods until finally some kind of 
pattern gets uncovered (comes quite close to the much too often 
encountered data-fishing expeditions....). And though copying analysis 
strategies is probably not quite like throwing yourself in front of a 
bus, there is some benefit in using what people working in a specific 
field regard their "standard" methods (wait for the reviews to discover 
this). In any case, a responsible choice for a type of analysis is 
oriented along the study design and the data at hand.

cheers, gabriel

-- 
Dr. Gabriel Singer
Department of Freshwater Ecology - University of Vienna
and Wassercluster Lunz Biologische Station GmbH
+43-(0)664-1266747
gabriel.singer at univie.ac.at



Gian Maria Niccolò Benucci wrote:
> Hi Gavin and Hi all,
>
> I will not go in front of a bus for sure, I not mad, at least I am not still
> mad... :)
>
> I would like to tell you that I am a Ph.D. student, and for what I know,
> Ph.D. student still have to understand things studing those from whom wrote
> before them...
>
> Isac Newton became famous not only for his science but also for a famous
> phrase that, if I don't remember it bad, act like this :" If I have seen so
> much far away is because I stand on shoulders of Giants"... I think that it
> needs any comment, and express itself the concept...
>
> So, I am so sorry, I also don't like the "me to" attitude, but you don't
> know how is my reality here, and I can assure you that also If I am still a
> "student", I am alone in my research, and If have a tutor and boss for
> italian rules I don't have a boss for statistics, couse none could help me
> on that...
> So what could I do if I don't take models in already published literature?
>
> Anyway, I don't want to seem like the victim, I have a brain that works and
> I am doing my best to understand and improve my knowledge and at least lean
> and grow, for sure, step by step, and with a big humility, in science and in
> this case in statistics...
>
> Anyway... For continuing the brainstorm if I can...The Host effect is what I
> think is more interesting for the ecological point of view of my trials also
> becasue the 4 communities have two by two the same host, I mean A and B,
> Corylus, while B and C, Ostrya...
> If I plot the factors of the envifit into the graph and the evidence of
> separation seems clear...
>
> That's are my metaMDS with 2 and 3 dimensions:
>
>   
>> NMS.1
>>     
>
> Call:
> metaMDS(comm = sqrtABCD, distance = "bray", k = 2, trymax = 100,
> autotransform = F)
>
> Nonmetric Multidimensional Scaling using isoMDS (MASS package)
>
> Data:     sqrtABCD
> Distance: bray shortest
>
> Dimensions: 2
> Stress:     24.54342
> Two convergent solutions found after 18 tries
> Scaling: centring, PC rotation, halfchange scaling
> Species: expanded scores based on ‘sqrtABCD’
>
>   
>> NMS.ABCD.2ef
>>     
>
> ***FACTORS:
>
> Centroids:
>               NMDS1   NMDS2
> CommunityA  -0.3271  0.1984
> CommunityB  -0.1956  0.1768
> CommunityC   0.2520 -0.2847
> CommunityD   0.2706 -0.0905
> HostCorylus -0.2613  0.1876
> HostOstrya   0.2613 -0.1876
>
> Goodness of fit:
>               r2   Pr(>r)
> Community 0.1897 0.017982 *
> Host      0.1778 0.001998 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> P values based on 1000 permutations.
>   
>> NMS.1.3
>>     
>
> Call:
> metaMDS(comm = sqrtABCD, distance = "bray", k = 3, trymax = 100,
> autotransform = F)
>
> Nonmetric Multidimensional Scaling using isoMDS (MASS package)
>
> Data:     sqrtABCD
> Distance: bray shortest
>
> Dimensions: 3
> Stress:     16.29226
> Two convergent solutions found after 6 tries
> Scaling: centring, PC rotation, halfchange scaling
> Species: expanded scores based on ‘sqrtABCD’
>
>   
>> NMS.ABCD.3ef
>>     
>
> ***FACTORS:
>
> Centroids:
>               NMDS1   NMDS2   NMDS3
> CommunityA   0.3881 -0.2702  0.1536
> CommunityB   0.1407 -0.2344  0.0197
> CommunityC  -0.2053  0.3566 -0.0219
> CommunityD  -0.3235  0.1480 -0.1514
> HostCorylus  0.2644 -0.2523  0.0866
> HostOstrya  -0.2644  0.2523 -0.0866
>
> Goodness of fit:
>               r2   Pr(>r)
> Community 0.1798 0.005994 **
> Host      0.1581 0.000999 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> P values based on 1000 permutations.
>   
>
> I got 10 sample units for each community data (40 in total)
>
> You said at the end : "*I do wonder if you are not hitting the curse of
> dimensionality here?"*
>
> Can you explain me what do you mean for "hitting the curse of
> dimensionality" if I am not so demanding...
>
> ... and then: *"it would be nice to look at the ordination but how you do
> that I don't know".*
>
> I would be glad if you see the graphs of my ordinations, Can I send them to
> you? That would be great... let me know about that. I used to plot in this
> way:
>
>   
>> plot(NMS.1, type="n", dis= "sp")
>> ordisymbol(NMS.1, env.table, "Host", legend=T)
>>     
>
> Anyway I have to admit that with 2 and at least 3 dimensions the points into
> the ordinantion plot are better separated in reasons to the data matrix, so
> what to do? better fittind of points ant bigger stress or the contrary?
>
> I think is enough, thank you so much for your help, I'll appreciate any
> comments! :)
>
> Gian
>
>
>
>
>
>
>   
>> And thank you all for the kind responses...
>>     
>>> I do not want to torture myself for sure... :) I red (lot of)
>>>       
>> publications
>>     
>>> about fungal community ecology studies (soil fungi), my research field
>>> indeed, and all uses NMDS or DCA as ordination techniques... So, I am
>>>       
>> only
>>     
>>> trying to do my best useing R for calculating them...
>>>       
>> Would you walk in front of a bus if you saw lots of other people doing
>> it? I doubt it. This kind of "me to" attitude to science is quite
>> demoralising when reviewing manuscripts and reading the literature.
>>
>> DCA was invented to solve a specific problem with CA - namely the arch
>> artefact. I forget whether this is in Jari's public lecture notes, vegan
>> vignettes/tutorials or in one of his lectures on a course we taught
>> together, but DCA replaces the arch artefact with other artefacts that
>> make the points look like a trumpet or a diamond in ordination space.
>>
>> Why DCA is used as a default instead of a special case escapes me. You
>> really shouldn't use DCA at all if you can get away with it as it is
>> doing some nasty things to your data.
>>
>> Alternatives; i) NMDS ii) PCA after application of a transformation
>> (Legendre & Gallagher 2001, Oecologia). And there are probably others...
>>
>>     
>>> What I need now is a good environmental interpretation of my work...
>>>
>>> Then I found the fantastic Jari's pdf about "Multivariate Analysis of
>>> Ecological Communities in R: vegan tutorial" and I went to the passage
>>>       
>> about
>>     
>>> factors and vectors fitting...
>>> That's my R code:
>>>
>>>       
>>>> NMS.ABCDsqrt
>>>>         
>>> Call:
>>> metaMDS(comm = sqrtABCD, distance = "bray", k = 4, trymax = 100,
>>> autotransform = F)
>>>
>>> Nonmetric Multidimensional Scaling using isoMDS (MASS package)
>>>
>>> Data:     sqrtABCD
>>> Distance: bray shortest
>>>
>>> Dimensions: 4
>>> Stress:     11.68632
>>>       
>> What was the stress with k = 2 and k = 3. As Jari has already mentioned,
>> how are you going to interpret and visualise this 4D configuration of
>> points (you can't plot NMDS1 vs NMDS2, NMDS1 vs NMDS3 etc. for reasons
>> explained to you earlier in this thread).
>>
>>     
>>> Two convergent solutions found after 2 tries
>>> Scaling: centring, PC rotation, halfchange scaling
>>> Species: expanded scores based on sqrtABCD
>>>
>>>       
>>>> envfit(NMS.ABCDsqrt, env.table, permu=1000) ->NMS.ABCDsqrtef
>>>> NMS.ABCDsqrtef
>>>>         
>>> ***FACTORS:
>>>
>>> Centroids:
>>>               NMDS1   NMDS2   NMDS3   NMDS4
>>> CommunityA  -0.3821  0.3822 -0.1173 -0.1232
>>> CommunityB  -0.1849  0.2748  0.0076 -0.0720
>>> CommunityC   0.2206 -0.4261 -0.0505  0.1197
>>> CommunityD   0.3465 -0.2310  0.1603  0.0756
>>> HostCorylus -0.2835  0.3285 -0.0549 -0.0976
>>> HostOstrya   0.2835 -0.3285  0.0549  0.0976
>>>
>>> Goodness of fit:
>>>               r2   Pr(>r)
>>> Community 0.2009 0.001998 **
>>> Host      0.1818 0.000999 ***
>>> ---
>>> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
>>> P values based on 1000 permutations.
>>>
>>>       
>>>> names(NMS.ABCDsqrtef)
>>>>         
>>> [1] "vectors" "factors"
>>>       
>>>> NMS.ABCDsqrtef$vectors
>>>>         
>>> NULL
>>>       
>>> I have an enviromental matrix (env.table) that contains only area
>>>       
>> (A,B,C,D)
>>     
>>> and tree species (Corylus sp. and Ostrya sp. ) differentiation, so I have
>>> the area and the tree species of each samples is related to...
>>> I could plot factors on the graph but not vectors because there aren't
>>> vectors in reason to the absence of numerical data in the env. matrix...
>>> isn't it right? Aren't the R2 values too low?
>>>       
>> Did you read ?envfit ? It states:
>> ....
>>     (r^2). For factors this is defined as r^2 = 1 - ss_w/ss_t, where
>>     ss_w and ss_t are within-group and total sums of squares.
>>
>> So this statistic here is looking at how constrained within the 4D space
>> the levels of each factor are in relation to the overall spread of the
>> points. This looks to me like some evidence for grouping of your sites
>> on basis of Community and stronger evidence for Host. The "effect" is
>> small but significant. I do wonder if you are not hitting the curse of
>> dimensionality here?
>>
>> The interpretation will depend on the number of samples. It would be
>> nice to look at the ordination but how you do that I don't know.
>>
>> G
>>
>>     
>>> Many many thank you for answering...
>>>
>>> Gian
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>       
>>>> Jari,
>>>>         
>>>>> I am here again ... :)
>>>>> So, to try having a comparison of the real goodness of my metaMDS
>>>>>           
>> data I
>>     
>>>>> tried to perform a DCA (with same input table)
>>>>> Then please forgive me if I do somethign wrong with it... That's my R
>>>>>           
>>>> code:
>>>>
>>>> Why DCA? What lead you to torture your data so?
>>>>
>>>>         
>>>>>> decorana(sqrtABCD, iweigh=0, ira=0) -> DCA.1
>>>>>> DCA.1
>>>>>>             
>>>>> Call:
>>>>> decorana(veg = sqrtABCD, iweigh = 0, ira = 0)
>>>>>
>>>>> Detrended correspondence analysis with 26 segments.
>>>>> Rescaling of axes with 4 iterations.
>>>>>
>>>>>                   DCA1   DCA2   DCA3   DCA4
>>>>> Eigenvalues     0.6688 0.5387 0.4822 0.3752
>>>>> Decorana values 0.7912 0.5795 0.4145 0.2931
>>>>> Axis lengths    5.9974 3.7036 3.6121 3.3802
>>>>>
>>>>>           
>>>>> In that situation the graph is still good but the differences between
>>>>>           
>> the
>>     
>>>>> two clades are little more confused, maybe in the axe II (I mean the
>>>>> vertical one) in this case there is a better separation.
>>>>> What do the "Decorana values" really mean?
>>>>>           
>>>> ?decorana
>>>>
>>>> Basically, in the original DECORANA code the Eigenvalues reported were
>>>> computed at the wrong stage of the "detrending" processes. Jari
>>>>         
>> realised
>>     
>>>> this when interfacing the old DECORANA code with R. Jari altered the
>>>> code to compute the correct Eigenvalues, but chose to also report the
>>>> values you'd get from DECORANA or Canoco to stop people complaining
>>>>         
>> that
>>     
>>>> vegan was doing DCA incorrectly.
>>>>
>>>>         
>>>>>  And how about the segments?
>>>>>           
>>>> What about them? Do you know how DCA works? The standard detrending
>>>> breaks the first (D)CA axis into 26 sequential chunks or segements. the
>>>> 26 is the default, but it can be changed. Within each chunk, the mean
>>>> trial site score for axis 2 for sites in that chunk is subtracted from
>>>> the trial axis 2 site scores of the sites in the chunk. This detrending
>>>> is what gets rid of the "arch" found in some CA plots and is the reason
>>>> DCA was invented.
>>>>
>>>>         
>>>>> How can I do something better?
>>>>>           
>>>> Are you trying to separate the two clades? Do you know a priori which
>>>> samples belong to which clade? If so, one of the many classification
>>>> methods in R would be more useful as they look to separate the a priori
>>>> defined groups best. The methods you have been using thus far aim to
>>>> represent the dissimilarities between samples best in a low dimensional
>>>> space.
>>>>
>>>> HTH
>>>>
>>>> G
>>>>
>>>>         
>>>>> Many thank you in advance,
>>>>>
>>>>> G.
>>>>>           
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-ecology mailing list
>>> R-sig-ecology at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>       
>> --
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
>>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>
>>
>>
>>     
>
>
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>   

-- 
Dr. Gabriel Singer
Department of Freshwater Ecology - University of Vienna
and Wassercluster Lunz Biologische Station GmbH
+43-(0)664-1266747
gabriel.singer at univie.ac.at



More information about the R-sig-ecology mailing list