[R-sig-eco] transformation of percent coverage data in community analysis

David Warton david.warton at unsw.edu.au
Thu Feb 19 00:02:21 CET 2015


Hi Marc,

Transforming community data to work around problems is a dangerous game. There are often lots of zeros in the data (and other near-boundary values), and these generate structure in the data that cannot be transformed away. I know lots of people try to do this, but it can be highly problematic: I never found it too hard to show this approach breaking down in simulation.
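
One way to see the point about zeros is a small simulation. The sketch below is only an illustration: the 60% absence rate, the Beta(1, 4) distribution for non-zero cover, and the choice of transformations are all assumptions made here, not taken from any real data set. It checks what share of the observations sits at the minimum value before and after each transformation.

```python
import math
import random

random.seed(1)

# Hypothetical simulated cover, as proportions in [0, 1]:
# exactly 0 (absent) with probability 0.6, otherwise a Beta(1, 4) draw.
cover = [0.0 if random.random() < 0.6 else random.betavariate(1, 4)
         for _ in range(1000)]


def share_at_minimum(values):
    """Fraction of observations tied at the smallest value."""
    lowest = min(values)
    return sum(v == lowest for v in values) / len(values)


# A few monotone transformations commonly applied to cover data.
transforms = {
    "raw": lambda p: p,
    "square root": math.sqrt,
    "arcsine square root": lambda p: math.asin(math.sqrt(p)),
    "log(x + 1)": math.log1p,
}

shares = {name: share_at_minimum([f(p) for p in cover])
          for name, f in transforms.items()}

for name, share in shares.items():
    print(f"{name:>20}: {share:.3f} of values tied at the minimum")
```

Each of these transformations is strictly increasing, so every zero stays tied with every other zero and the spike at the boundary is exactly as large afterwards as it was before.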



The alternative is to formulate a statistical model for the percent cover data.  This is quite tricky: you want a continuous distribution on [0,1] with a big fat mass at zero, and I for one am not aware of a good answer.  I think a neat solution to this problem would make a really useful contribution to the multivariate ecology literature.

In the meantime, if the values don't get too close to one, you could maybe try a Tweedie distribution (which, for a power parameter between 1 and 2, has a point mass at zero but no upper bound).  Or you could try some zero-inflated distribution, to separate the problem of modelling the zeros from that of modelling the non-zeros.  Another approach is to coarsen the data to an ordinal scale (Braun-Blanquet sort of thing) and analyse it as ordinal, e.g. using proportional odds.  You mentioned going all the way to presence-absence (which would mean logistic regression, or actually maybe a complementary log-log link), but there is some middle ground.  Coarsening the data might make some sense if it was originally collected as guesstimates, veg-style "about 50%", "maybe 25%" stuff, but if the values were measured carefully, it would be worthwhile trying to be more careful in the analysis too.
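
As a rough illustration of the ordinal option, the sketch below bins percent cover into classes in the Braun-Blanquet spirit. The cut points (5, 25, 50, 75, 100) are an assumption, following one common simplified convention, and the function name `cover_class` and the example values are hypothetical. It only prepares the response; fitting a proportional odds model to the classes is a separate step not shown here.

```python
import bisect

# Assumed upper cut points (percent) for cover classes 1 to 5.
# Real Braun-Blanquet variants differ, so adjust these to the scale in use.
UPPER_BOUNDS = [5, 25, 50, 75, 100]


def cover_class(percent):
    """Map percent cover to an ordinal class: 0 = absent, 1-5 = increasing cover."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent cover out of range: {percent}")
    if percent == 0:
        return 0
    # bisect_left returns the index of the first bound >= percent, so the
    # classes are (0, 5], (5, 25], (25, 50], (50, 75] and (75, 100].
    return bisect.bisect_left(UPPER_BOUNDS, percent) + 1


# Hypothetical cover values for one species across eight plots.
plot_cover = [0, 0.5, 5, 12, 25, 40, 80, 100]
classes = [cover_class(p) for p in plot_cover]

# Going all the way to presence-absence is the coarsest version of the same idea.
presence = [int(c > 0) for c in classes]

print(classes)
print(presence)
```

Keeping absence as its own class preserves the distinction between zeros and small non-zero cover, which is the part of the data that causes the most trouble.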



All the best

David



David Warton
Professor and Australian Research Council Future Fellow
School of Mathematics and Statistics and the Evolution & Ecology Research Centre
The University of New South Wales NSW 2052 AUSTRALIA
phone (61)(2) 9385-7031
fax (61)(2) 9385-7123

http://www.eco-stats.unsw.edu.au/ecostats15.html





-----Original Message-----

Date: Tue, 17 Feb 2015 21:29:35 +0100
From: Marc Taylor <marchtaylor at gmail.com>
To: "r-sig-ecology at r-project.org" <r-sig-ecology at r-project.org>
Subject: [R-sig-eco] transformation of percent coverage data in community analysis
Message-ID: <CACg2Sf09x_euLnv7Gx_HSQ1Fcj+GLy6wwZg6eRKqUsvQKNUhiA at mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"



Dear all,

A colleague of mine is doing some basic descriptive analyses of algal
community data (e.g. MDS, PCA), and we got to talking about how his data
(percent coverage) might be transformed before the calculation of distance
matrices. This may be desirable in order to reduce the influence of
dominant species in the analyses.
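
To make the effect of such transformations concrete, here is a minimal sketch comparing one dissimilarity value on raw, square-root transformed, and presence/absence data. The two samples are made up, and Bray-Curtis is used only as an example of a distance measure; the thread does not say which one is intended.

```python
import math


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples of non-negative values."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical percent cover of four species in two samples.
# Species 1 dominates both samples; the minor species differ.
sample_a = [80, 10, 5, 5]
sample_b = [80, 0, 0, 20]

raw = bray_curtis(sample_a, sample_b)
root = bray_curtis([math.sqrt(x) for x in sample_a],
                   [math.sqrt(x) for x in sample_b])
presence_absence = bray_curtis([int(x > 0) for x in sample_a],
                               [int(x > 0) for x in sample_b])

print(f"raw:              {raw:.3f}")
print(f"square root:      {root:.3f}")
print(f"presence/absence: {presence_absence:.3f}")
```

In this made-up case the shared dominant species carries most of the weight on the raw scale, so the samples look similar; the stronger the transformation, the more the differences among the minor species show up in the dissimilarity.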



One extreme would be to convert the percent coverage matrices to
presence/absence, which would essentially remove all of the intra-sample
weighting, and concentrate on larger spatial presence patterns. I was
wondering if there were any other options that would make sense for such
data?



Thanks in advance for your advice,
Marc


