[R-sig-eco] Continuous (Non-Count) Skewed Data With Many Zeros

Mieke Zwart m.c.zwart at newcastle.ac.uk
Thu May 17 12:22:52 CEST 2012


Hi Alain,

Just wondering whether the 2012 book you are referring to is "Zero Inflated Models and Generalized Linear Mixed Models with R". You suggest looking at Chapter 10, but according to the website this book has only 9 chapters. Is this the book you mean?

Cheers,

Mieke

 
-----Original Message-----
From: r-sig-ecology-bounces at r-project.org [mailto:r-sig-ecology-bounces at r-project.org] On Behalf Of Highland Statistics Ltd
Sent: 16 May 2012 14:09
To: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] Continuous (Non-Count) Skewed Data With Many Zeros


>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 15 May 2012 11:15:39 -0700 (PDT)
> From: Rich Shepard<rshepard at appl-ecosys.com>
> To: r-sig-ecology at r-project.org
> Subject: [R-sig-eco] Continuous (Non-Count) Skewed Data With Many
> 	Zeros
> Message-ID:<alpine.LNX.2.00.1205151057550.3824 at salmo.appl-ecosys.com>
> Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
>
>     The water chemistry data of metal concentrations are not normally 
> distributed (based on Q-Q plots) and are not improved by 
> transformation (log10, sqrt, cube root).

Non-normality of your response variable is not a reason to apply a data transformation: the normality assumption of linear regression concerns the residuals, not the raw response.
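A small simulation shows why a skewed or multimodal response says little on its own about whether a linear model is appropriate. What the model assumes to be normal is the variation around the fitted values. The sketch below is hypothetical: it uses two invented sites, A and B, not Rich's metal data. It is written in Python so that it is self-contained. In R, the equivalent steps would be lm() followed by residuals().

```python
import random
import statistics


def skewness(xs):
    """Sample skewness (moment form): mean cubed deviation divided by sd**3."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


rng = random.Random(1)
# Hypothetical data: 400 samples from site A (mean 0) and 100 from site B
# (mean 100); within each site the values are normal with sd 5.
site = ["A"] * 400 + ["B"] * 100
true_mean = {"A": 0.0, "B": 100.0}
y = [rng.gauss(true_mean[s], 5.0) for s in site]

# Fit the one-way model y ~ site: the fitted value for each site is its mean.
fitted = {
    s: statistics.fmean([yi for yi, si in zip(y, site) if si == s])
    for s in ("A", "B")
}
residuals = [yi - fitted[si] for yi, si in zip(y, site)]

print(f"skewness of raw response: {skewness(y):.2f}")
print(f"skewness of residuals:    {skewness(residuals):.2f}")
```

The pooled response is strongly right-skewed because most samples come from the low-mean site. The residuals, however, are close to symmetric. A Q-Q plot of the raw response would therefore suggest a problem that the model with a site effect does not actually have.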
>   For the 30 metal species the percentage of zeros ranges from none 
> (10 metals) to 48.6%; the average is 5.6%. Most metals are at very low 
> concentrations with infrequent spikes which might be very high.
>
>     Those with fewer zeros are not a concern, but I'd like your 
> thoughts on 1) at what percentage the number of zeros becomes a 
> concern

It all depends, and no general answer can be given. As few as 15% zeros can cause serious problems, but data with 80% zeros may still be modelled adequately by a regression or GLM. For a discussion with examples, see Chapter 10 in our 2012 book.

> and 2) how to
> characterize and model these data.

That depends on the previous point: anything from linear regression to a zero-inflated model for a continuous response variable is possible.
There is no simple answer; it all depends. But based on what you describe, it will probably be something zero-inflated.
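One common way to handle a continuous response with many exact zeros is a two-part model, also called a zero-altered or hurdle model. A binary part models whether a value is zero. A continuous part models the size of the non-zero values.

The sketch below is a minimal intercept-only version in Python, and several of its choices are assumptions rather than anything stated in the thread:

* The lognormal positive part was chosen for simplicity. A gamma distribution is a common alternative.
* The function name `fit_zero_altered_lognormal` and the concentrations are hypothetical.
* A real analysis would add covariates to both parts. In R, for example, that could be glm() with family = binomial for presence/absence and a Gamma or lognormal model for the positive concentrations.

```python
import math
import statistics


def fit_zero_altered_lognormal(y):
    """Intercept-only two-part (zero-altered) model for non-negative data.

    Binary part:   P(y > 0) = p_nonzero
    Positive part: log(y) | y > 0  ~  Normal(mu, sigma**2)
    The parts share no parameters, so the likelihood factorises and the
    maximum likelihood estimates can be computed separately for each part.
    """
    if any(v < 0 for v in y):
        raise ValueError("values must be non-negative")
    logs = [math.log(v) for v in y if v > 0]
    n, n_pos = len(y), len(logs)
    if n_pos < 2:
        raise ValueError("need at least two non-zero values")
    n_zero = n - n_pos

    p_nonzero = n_pos / n            # MLE of P(y > 0)
    mu = statistics.fmean(logs)      # MLE of the log-scale mean
    sigma = statistics.pstdev(logs)  # MLE of the log-scale sd (divides by n_pos)
    if sigma == 0:
        raise ValueError("non-zero values must not all be equal")

    # Log-likelihood: binary part plus the lognormal density of each positive value.
    loglik = n_pos * math.log(p_nonzero)
    if n_zero > 0:
        loglik += n_zero * math.log(1 - p_nonzero)
    for lv in logs:
        # log f(v) = -log(v) - log(sigma * sqrt(2*pi)) - (log(v) - mu)^2 / (2 sigma^2)
        loglik += (-lv - math.log(sigma * math.sqrt(2 * math.pi))
                   - (lv - mu) ** 2 / (2 * sigma ** 2))

    # Expected value including the zeros: E[y] = p_nonzero * exp(mu + sigma^2 / 2)
    mean = p_nonzero * math.exp(mu + sigma ** 2 / 2)
    return {"p_nonzero": p_nonzero, "mu": mu, "sigma": sigma,
            "mean": mean, "loglik": loglik}


# Hypothetical concentrations for one metal (4 of the 10 values are zero).
conc = [0, 0, 0.5, 1.2, 0, 3.4, 0.8, 0, 15.0, 0.3]
fit = fit_zero_altered_lognormal(conc)
print(fit)
```

A plain log transformation cannot handle exact zeros at all, because log(0) is undefined. This is one reason the two-part structure is attractive for data like these: the zeros are modelled explicitly instead of being shifted by an arbitrary constant.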

Alain
-- 

Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7


2. Mixed Effects Models and Extensions in Ecology with R (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9


3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, Meesters, EHWG. Springer.
http://www.springer.com/statistics/computational/book/978-0-387-93836-3


4. Zero Inflated Models and Generalized Linear Mixed Models with R (2012).
Zuur, AF, Saveliev, AA, and Ieno, EN.
http://www.highstat.com/book4.htm

Other books: http://www.highstat.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology