[R-sig-eco] interpreting ecological distance approaches (Bray Curtis after various data transformation)

David Warton <david.warton using unsw.edu.au>
Fri Apr 5 08:59:48 CEST 2019


Hi Torsten et al,
Yes, gjam could also be considered. It poses a statistical model and has some capacity to handle varying sampling intensity (“effort”). It is a very different type of model, but there are plenty of tools for model selection and model checking that could be used to inform model choice.

Nestedness/richness differences – yes, this can be handled naturally via the mean model using row effects, most of the time. And in relation to environmental predictors, you could readily partition their effects into “main effects” on richness vs “interactions” (with species) on turnover. Most canned software doesn’t make this second step easy to do. I was thinking of writing that option into mvabund, but hey, if someone else wants to take the lead on that then by all means…

All the best
David



From: Torsten Hauffe <torsten.hauffe using gmail.com>
Sent: Thursday, 4 April 2019 6:21 PM
To: David Warton <david.warton using unsw.edu.au>
Cc: r-sig-ecology using r-project.org
Subject: Re: [R-sig-eco] interpreting ecological distance approaches (Bray Curtis after various data transformation)

Great point David!

Since Tim was referring to microbial communities: the gjam package is similar to mvabund, boral etc., and the microbial example discussed in the following paper might be of interest.

https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1241

With that being about R itself, I may go a bit off topic:
In all those multivariate GLM approaches, is there a way to disentangle richness differences (or nestedness) and turnover like we can do with pairwise distances?
(See the inspiring discussion between Carvalho et al. and Baselga et al.; summarized in http://onlinelibrary.wiley.com/doi/10.1111/geb.12207/abstract )
Since different biological processes may cause these patterns, separating richness differences and species turnover is of interest. Maybe the row effect in those multivariate GLMs could be estimated as a response to environmental predictors?

Cheers,
Torsten



On Thu, 4 Apr 2019 at 01:19, David Warton <david.warton using unsw.edu.au> wrote:
Hi Tim,
Yes, you are right, this is an issue: BC (and other distance metrics) are sensitive to sampling intensity, which is often an artefact of the sampling technique. Transformation is not a great solution to the problem - it works imperfectly and will have different effects depending on the properties of your data. There are lots of different types of datasets out there, each with different properties, and different behaviours under different transformation/standardisation strategies, so there is no one-transformation-suits-all solution. An illustration of this (in the case of row standardisation) is in the paper below:
        https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12843

The strategy I would advise here is to go a very different route and build a statistical model for the data. You can then include row effects in the model to handle variation in sampling intensity across rows of data (along the lines of equation 2 of the above paper). Or, if the magnitude of the variation in sampling intensity is known (e.g. it is due to changes in the sizes of quadrats used for sampling, and quadrat size has been recorded), then the standard approach is to add an offset to the model. There is plenty of software out there that can fit suitable statistical models with row effects (and offsets) for this sort of data, including the mvabund, HMSC, boral, and gllvm packages in R. Importantly, these packages come with diagnostic tools to check that the analysis approach adequately captures key properties of your data - an essential step in any analysis.
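
To make the offset idea concrete, here is a minimal sketch in plain Python rather than in any of the R packages named above. It assumes the simplest possible case: one species, a Poisson model with log link, an intercept only, and an offset of log(effort), i.e. log E[y_i] = beta0 + log(effort_i). In that special case the maximum-likelihood estimate of exp(beta0) is total count divided by total effort. The quadrat areas and counts are hypothetical.

```python
def poisson_rate_with_offset(counts, effort):
    """MLE of exp(beta0) in the model log E[y_i] = beta0 + log(effort_i).

    Setting the Poisson score to zero gives sum(y) = exp(beta0) * sum(effort).
    """
    return sum(counts) / sum(effort)


def fitted_means(rate, effort):
    """Expected counts implied by the offset model: rate * effort_i."""
    return [rate * e for e in effort]


# Hypothetical data: three quadrats of different area (m^2), one species.
quadrat_area = [1.0, 2.0, 4.0]
counts = [3, 6, 12]

rate = poisson_rate_with_offset(counts, quadrat_area)  # density per m^2
fitted = fitted_means(rate, quadrat_area)

print("estimated density per unit area:", rate)
print("fitted counts per quadrat:", fitted)
```

The raw counts differ fourfold between quadrats, but the offset attributes all of that to quadrat size and leaves a single density estimate. Real analyses would add predictors and species terms, which is what the packages above are for.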

All the best
David


Professor David Warton
School of Mathematics and Statistics, Evolution & Ecology Research Centre, Centre for Ecosystem Science
UNSW Sydney
NSW 2052 AUSTRALIA
phone +61(2) 9385 7031
fax +61(2) 9385 7123

http://www.eco-stats.unsw.edu.au



----------------------------------------------------------------------

Date: Tue, 2 Apr 2019 17:15:45 +0200
From: Tim Richter-Heitmann <trichter using uni-bremen.de>
To: r-sig-ecology using r-project.org
Subject: [R-sig-eco] interpreting ecological distance approaches (Bray
        Curtis after various data transformation)
Message-ID: <3834fea1-040a-12b5-c3a3-633e68dc6ab5 using uni-bremen.de>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Dear list,

I am not an ecologist by training, so please bear with me.

It is my understanding that Bray-Curtis distances are sensitive to differences in community size. Thus, they seem to deliver inadequate results when the different community sizes are the result of technical artifacts rather than biology (see e.g. Weiss et al., 2017, on microbiome data).
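
A small worked example of this sensitivity, as a sketch in plain Python with hypothetical counts: two samples with identical composition, one sequenced twice as deeply, still come out with a non-zero Bray-Curtis dissimilarity.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / (sum(x) + sum(y))."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(x) + sum(y)
    return numerator / denominator


# Hypothetical counts for four taxa.
community = [10, 20, 30, 40]
# Same relative composition, twice the sampling depth.
deeper = [2 * c for c in community]

bc_same = bray_curtis(community, community)
bc_raw = bray_curtis(community, deeper)

print("identical samples:", bc_same)
print("same composition, double depth:", bc_raw)
```

Here the numerator is 100 and the denominator is 300, so the dissimilarity is 1/3 even though the relative abundances are identical.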

Therefore, I often see BC distances computed on relative data (which seems to be equivalent to the Manhattan distance) or on data that have been subsampled to even sizes (e.g. rarefying). Sometimes I also see Bray-Curtis distances calculated on Hellinger-transformed data, which is the square root of relative data. This again makes sample sizes unequal (but only to a small degree), so I wondered if this is a valid approach, especially considering that the "natural" distance choice for Hellinger-transformed data is Euclidean (to obtain, well, the Hellinger distance).
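
The two relationships mentioned in this paragraph can be checked numerically. The sketch below uses plain Python and hypothetical counts. When every row sums to 1, the Bray-Curtis denominator is 2, so Bray-Curtis on relative data equals half the Manhattan distance; and the Euclidean distance between Hellinger-transformed rows is the Hellinger distance.

```python
import math


def relative(x):
    """Convert counts to proportions (row sums to 1)."""
    total = sum(x)
    return [v / total for v in x]


def hellinger(x):
    """Hellinger transformation: square root of relative abundances."""
    return [math.sqrt(p) for p in relative(x)]


def bray_curtis(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical counts for two sites with unequal totals (100 reads each here,
# but any totals give the same proportions-based result).
site_a = [10, 20, 30, 40]
site_b = [5, 0, 80, 15]

p, q = relative(site_a), relative(site_b)
bc_rel = bray_curtis(p, q)
half_manhattan = 0.5 * manhattan(p, q)

# Euclidean distance on Hellinger-transformed data = Hellinger distance.
hellinger_dist = euclidean(hellinger(site_a), hellinger(site_b))

print("Bray-Curtis on proportions:", bc_rel)
print("half Manhattan on proportions:", half_manhattan)
print("Hellinger distance:", hellinger_dist)
```

So "equivalent to Manhattan" holds up to a constant factor of one half, which does not change rankings or ordinations.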

Another question is what the different sizes (i.e. the row sums) of Hellinger-transformed communities represent. I tested some datasets and couldn't find a correlation between the original sample sizes and their Hellinger-transformed counterparts.
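
One way to look at this, as a sketch in plain Python with hypothetical counts: the row sum of a Hellinger-transformed sample is the sum of sqrt(p_i), which depends only on the proportions p_i. It is therefore unrelated to the original total by construction, and instead tracks evenness, ranging from 1 (a single taxon) to sqrt(S) (S taxa in equal proportions).

```python
import math


def hellinger_row_sum(counts):
    """Sum of sqrt(relative abundance) across taxa for one sample."""
    total = sum(counts)
    return sum(math.sqrt(c / total) for c in counts)


# Hypothetical samples with four taxa each.
even_small = [25, 25, 25, 25]            # 100 reads, perfectly even
even_large = [2500, 2500, 2500, 2500]    # 10,000 reads, same composition
uneven = [97, 1, 1, 1]                   # 100 reads, one dominant taxon

sum_even_small = hellinger_row_sum(even_small)
sum_even_large = hellinger_row_sum(even_large)
sum_uneven = hellinger_row_sum(uneven)

print("even, 100 reads:", sum_even_small)
print("even, 10,000 reads:", sum_even_large)
print("uneven, 100 reads:", sum_uneven)
```

The two even samples give the same row sum (sqrt(4) = 2) despite a hundredfold difference in reads, while the uneven sample with the same read total gives a smaller value.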

Any advice is very much welcome. Thank you.

--
Dr. Tim Richter-Heitmann

University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069



_______________________________________________
R-sig-ecology mailing list
R-sig-ecology using r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology