[R-sig-eco] R-sig-ecology Digest, Vol 23, Issue 2

Highland Statistics Ltd. highstat at highstat.com
Tue Feb 2 12:12:43 CET 2010


>     1. Low counts (Tore Chr Michaelsen)
>     2. Re: Low counts (Miltinho Astronauta)
>     3. Re: Low counts (Maarten de Groot)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 1 Feb 2010 15:04:22 +0100
> From: "Tore Chr Michaelsen"<tore.michaelsen at bio.uib.no>
> To:<r-sig-ecology at r-project.org>
> Subject: [R-sig-eco] Low counts
> Message-ID:<000801caa347$74c62a70$5e527f50$@michaelsen at bio.uib.no>
> Content-Type: text/plain;	charset="us-ascii"
>
> Dear members;
>
> 1) I have fitted a glm to count data (using quasipoisson to correct for
> disp.). In the final model, the relationship between Res and Fitted (i.e.
> the line going through the plot) and QQ looks fine, but I am worried that
> low count (one to five) could violate some assumption of the glm/poisson:
> Although the line in the Res vs Fitted plot looks nice, the values show a
> clear pattern (five diagonal lines = the counts). Crawley/R book says it
> should look like the sky at night with no patterns. I assume patterns are
> not visible with large counts (e.g. 0-100), but highly visible with low
> counts as in this case. I still assume this is reason for some concern about
> the model, or is the concern not justified?
>
>    

Tore,

I'm actually trying to write a paper on exactly this low-counts problem,
but it is not going very fast. The first thing to ask yourself is
whether there are no zeros because zeros cannot occur, or just by
chance. In the first case, consider zero-truncated GLMs. The problem I
face myself with clutch size data, with values between 1 and 5, is
underdispersion. Hence: underdispersed zero-truncated GLMs, and that
brings you to generalized Poisson GLMs. Yes, there is always one more
complication. I have noticed that zero truncation is not a real problem
(i.e. the SEs are similar) as long as the fitted values are around 4 or
5, or higher. In the snake carcasses data in Chapter 11 of our mixed
modelling book, the mean was between 1 and 2, and in that case the SEs
of the Poisson GLM and the truncated Poisson GLM differed by about a
factor of 3.
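
To make the SE comparison concrete, here is a minimal sketch in base R.
It is not the analysis from the book: the data are simulated, the
coefficients (0.3 and 0.5) and the helper name zt_negloglik are made up
for illustration, and the zero-truncated Poisson is fitted by direct
maximum likelihood with optim() rather than with a dedicated package.

```r
# Sketch: Poisson GLM vs zero-truncated Poisson fitted by maximum likelihood.
# All data are simulated; the coefficients 0.3 and 0.5 are arbitrary.
set.seed(1)
n <- 2000
x <- runif(n)
y <- rpois(n, lambda = exp(0.3 + 0.5 * x))

# Mimic a response that cannot be zero by discarding the zeros.
keep <- y > 0
x <- x[keep]
y <- y[keep]

# Ordinary Poisson GLM, which ignores the truncation.
pois_fit <- glm(y ~ x, family = poisson)
X <- model.matrix(pois_fit)

# Negative log-likelihood of the zero-truncated Poisson with a log link:
# P(Y = y | Y > 0) = dpois(y, mu) / (1 - exp(-mu)).
zt_negloglik <- function(beta) {
  mu <- exp(drop(X %*% beta))
  -sum(dpois(y, mu, log = TRUE) - log1p(-exp(-mu)))
}

# Start the optimiser at the Poisson GLM estimates.
zt_fit <- optim(coef(pois_fit), zt_negloglik, method = "BFGS", hessian = TRUE)

# Standard errors from the inverse Hessian at the optimum.
zt_se <- sqrt(diag(solve(zt_fit$hessian)))
pois_se <- sqrt(diag(vcov(pois_fit)))

print(cbind(poisson_se = pois_se, truncated_se = zt_se))
```

With means this low, the two columns are the thing to compare; how far
apart they are will depend on the data, so the factor quoted above
should not be expected to reproduce here.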

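As to your diagonal lines: those are due to your discrete values. Each
observed count gives its own band of points, because for a fixed count
the residual depends only on the fitted value. In fact, those "lines"
are always present, also in linear regression, but there you don't
notice them. The extreme case is binary data.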
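
A small simulated example may help to see where the lines come from.
This is only a sketch under assumed settings (150 observations, one
made-up covariate x, arbitrary coefficients); Pearson residuals are used
here, and other residual types bend the bands but still give one band
per observed count.

```r
# Sketch: why low counts give bands in a residuals-vs-fitted plot.
# Simulated data; the coefficients 0.2 and 0.8 are arbitrary.
set.seed(7)
n <- 150
x <- runif(n)
y <- rpois(n, lambda = exp(0.2 + 0.8 * x))

fit <- glm(y ~ x, family = poisson)
mu <- fitted(fit)

r_resp <- resid(fit, type = "response")  # y - mu
r_pear <- resid(fit, type = "pearson")   # (y - mu) / sqrt(mu)

# For a fixed count k, every point lies on the curve (k - mu) / sqrt(mu),
# so there is exactly one band per distinct observed count.
if (interactive()) {
  plot(mu, r_pear, col = y + 1, pch = 16,
       xlab = "Fitted values", ylab = "Pearson residuals")
  for (k in sort(unique(y))) {
    curve((k - x) / sqrt(x), add = TRUE, lty = 3)
  }
}
```

The bands are a property of the discrete response, not a sign of a
misspecified model.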

So, summarising: think first about truncation, then check for
underdispersion, because you have a small range of observed values.
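
One common way to check for underdispersion is the Pearson chi-square
statistic divided by the residual degrees of freedom; values clearly
below 1 point to underdispersion. The sketch below uses simulated
"clutch sizes" between 1 and 5 (an assumption for illustration, not real
data) and an intercept-only model.

```r
# Sketch: dispersion statistic for a Poisson GLM on a narrow range of counts.
# Simulated values between 1 and 5 whose variance is smaller than their mean.
set.seed(42)
clutch <- rbinom(200, size = 4, prob = 0.5) + 1

fit <- glm(clutch ~ 1, family = poisson)

# Pearson chi-square divided by the residual degrees of freedom.
dispersion <- sum(resid(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# The quasipoisson family reports the same quantity as its dispersion.
qfit <- glm(clutch ~ 1, family = quasipoisson)
print(summary(qfit)$dispersion)
```

A quasipoisson fit rescales the SEs by this estimate, but it does not
deal with the truncation.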

Alain





> 2) Any recommendations on literature regarding model inspection in R.
>
> Thank you for reading this mail!
>
> Best wishes;
> Tore
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 1 Feb 2010 12:54:46 -0500
> From: Miltinho Astronauta<milton.reco at gmail.com>
> To: Tore Chr Michaelsen<tore.michaelsen at bio.uib.no>
> Cc: r-sig-ecology at r-project.org
> Subject: Re: [R-sig-eco] Low counts
> Message-ID:
> 	<30c7555b1002010954p76f125f4jf07fa121936886e0 at mail.gmail.com>
> Content-Type: text/plain
>
> Hi Tore,
>
> I put my 2cents on Zuur et al 2009's book - Mixed effect models...
> See Zero-Inflated examples in there.
>
> cheers
>
> milton
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 02 Feb 2010 08:06:46 +0100
> From: Maarten de Groot<Maarten.deGroot at nib.si>
> To: Miltinho Astronauta<milton.reco at gmail.com>
> Cc: r-sig-ecology at r-project.org
> Subject: Re: [R-sig-eco] Low counts
> Message-ID:<4B67CF06.4010905 at nib.si>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Tore,
>
> What does your count distribution look like? Do you have more zeros
> than expected (use zero-inflated models), no zeros (use zero-truncated
> models), or is there no problem with zeros? If it is the latter, it
> might be useful to try negative binomial models (glm.nb()). Zuur et
> al. (2009) give a nice example in which they still find a pattern in
> the residuals with a quasi-Poisson model but no pattern with a negative
> binomial model.
>
> Kind regards,
>
> Maarten
>
>
>
> ------------------------------
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>
> End of R-sig-ecology Digest, Vol 23, Issue 2
> ********************************************
>
>    


-- 


Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7


2. Mixed effects models and extensions in ecology with R. (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9


3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
http://www.springer.com/statistics/computational/book/978-0-387-93836-3


Other books: http://www.highstat.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com


