[R-sig-eco] Continuous (Non-Count) Skewed Data With Many Zeros

Highland Statistics Ltd highstat at highstat.com
Wed May 16 15:09:11 CEST 2012


> Date: Tue, 15 May 2012 11:15:39 -0700 (PDT)
> From: Rich Shepard <rshepard at appl-ecosys.com>
> To: r-sig-ecology at r-project.org
> Subject: [R-sig-eco] Continuous (Non-Count) Skewed Data With Many Zeros
>
>     The water chemistry data of metal concentrations are not normally
> distributed (based on Q-Q plots) and are not improved by transformation
> (log10, sqrt, cubic root).

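Non-normality of your response variable is not, by itself, a reason to
apply a data transformation. In a linear regression the normality
assumption applies to the residuals, not to the raw response.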
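
To illustrate the point with simulated data (not the water chemistry
data from this thread): the sketch below assumes a hypothetical binary
covariate x that is 1 in roughly 10% of samples and shifts the response
by 10 units, with normally distributed noise. The raw response is then
strongly skewed, yet the residuals of a simple linear regression on x
are close to symmetric. All names and numbers are made up for the
example.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

rng = random.Random(1)  # fixed seed so the illustration is reproducible
n = 2000
# Hypothetical covariate: 1 for a rare condition, 0 otherwise.
x = [1 if rng.random() < 0.1 else 0 for _ in range(n)]
# Response: strong effect of x plus normally distributed noise.
y = [10.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares with one covariate, in closed form.
mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_y = skewness(y)
skew_res = skewness(residuals)
print(f"skewness of raw response: {skew_y:.2f}")
print(f"skewness of residuals:    {skew_res:.2f}")
```

A Q-Q plot of y alone would look poor here, even though the model is
adequate. Checking the residuals of a fitted model is the more relevant
diagnostic.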
>   For the 30 metal species the percentage of zeros
> ranges from none (10 metals) to 48.6; average 5.6. Most metals are at very
> low concentrations with infrequent spikes which might be very high.
>
>     Those with fewer zeros are not a concern, but I'd like your thoughts on 1)
> at what percentage do the number of zeros become a concern
It all depends, and no sensible general answer can be given. As few as
15% zeros can cause trouble, but it is also possible for data with 80%
zeros to be consistent with a regression or GLM. For a discussion with
examples, see Chapter 10 in our 2012 book (no. 4 in the list below).
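
A small count-data analogy may help show why the percentage alone says
little (counts are used only because the arithmetic is simple; the
question in this thread concerns continuous data). Under a Poisson
distribution the expected fraction of zeros is exp(-mean), so whether a
given percentage is "too many" depends on the fitted mean. The means
0.22 and 5.0 below are arbitrary illustrative values.

```python
import math

def poisson_zero_fraction(mean):
    """Probability of observing a zero under a Poisson distribution."""
    return math.exp(-mean)

# With a small mean, about 80% zeros is exactly what the model expects.
low_mean = poisson_zero_fraction(0.22)
# With a mean of 5, even 15% zeros would be far more than expected.
high_mean = poisson_zero_fraction(5.0)

print(f"expected zero fraction, mean 0.22: {low_mean:.3f}")
print(f"expected zero fraction, mean 5.0:  {high_mean:.4f}")
```

The useful comparison is therefore between the observed number of zeros
and the number the fitted model predicts, not against a fixed cut-off.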

> and 2) how to
> characterize and model these data.

That depends on the previous answer: anything from linear regression to
a zero-inflated model for a continuous response variable. There is no
simple answer. But based on what you describe, it will probably be
something zero-inflated.
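
As a rough sketch of what a model with a separate zero component can
look like for continuous data, the code below fits a two-part model
without covariates: one parameter for the probability of a zero, and a
lognormal distribution for the positive values. The lognormal is an
assumption made here for simplicity (a gamma distribution is another
common choice for the positive part), and it is not necessarily the
model meant above. The concentrations are invented for the example.

```python
import math
import statistics

def fit_two_part_lognormal(values):
    """Fit P(zero) plus a lognormal distribution for the positive values."""
    positives = [v for v in values if v > 0]
    p_zero = 1.0 - len(positives) / len(values)
    logs = [math.log(v) for v in positives]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # maximum-likelihood (population) form
    # Overall mean = P(non-zero) * mean of the lognormal part.
    mean = (1.0 - p_zero) * math.exp(mu + sigma ** 2 / 2.0)
    return {"p_zero": p_zero, "mu": mu, "sigma": sigma, "mean": mean}

# Made-up concentrations: several zeros, low values, one large spike.
concentrations = [0.0, 0.0, 0.0, 0.0, 0.01, 0.02, 0.02, 0.03, 0.05, 1.50]
fit = fit_two_part_lognormal(concentrations)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

In a real analysis both parts would usually depend on covariates, for
example a binomial GLM for zero versus non-zero and a second GLM for
the positive concentrations.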

Alain

-- 

Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7

2. Mixed effects models and extensions in ecology with R (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA and Smith, GM. Springer.
URL: http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9

3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN and Meesters, EHWG. Springer.
URL: http://www.springer.com/statistics/computational/book/978-0-387-93836-3

4. Zero Inflated Models and Generalized Linear Mixed Models with R (2012).
Zuur, AF, Saveliev, AA and Ieno, EN.
URL: http://www.highstat.com/book4.htm

Other books: http://www.highstat.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com


