[R-sig-ME] Unbalanced presence/absence data

Mon Feb 9 12:21:32 CET 2009

On 04/02/2009, at 1:22 AM, Renwick, A. R. wrote:

> I am trying to analyse some data I have on the presence/absence of  
> parasite infestation on small mammals using a GLMM, however I have a  
> severely unbalanced data set in that I have a large number of 0's  
> compared to 1's (i.e. 1333 0's and 86 1's).
>
> The response variable (presence/absence) is at the individual level  
> whereas all the explanatory variables (apart from sex) are at the  
> site level.  This means that a lot of the individuals have exactly  
> the same combination of all explanatory variables and when there are  
> so many individuals with 0's it leaves very little power.
>

This shouldn't be a problem. What you may need is to use the nAGQ  
argument to increase the number of quadrature points and so avoid  
numerical problems. This is especially important if there is high  
correlation between individuals within a site. Also, "unbalanced"  
usually means unequal numbers of observations per group, which is  
different from what you have: a rare outcome (many more 0's than 1's).
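
To illustrate why the number of quadrature points matters, here is a
minimal sketch in Python (standard library only). It is not what lme4
does internally: lme4 uses adaptive Gauss-Hermite quadrature, whereas
this uses the plain, non-adaptive rule, and the site (20 animals, 1
infested), the intercept of -3 and the random-intercept standard
deviation of 2 are all made-up values. It integrates the site-level
random intercept out of a logistic model, using the binomial kernel
without the constant binomial coefficient, and compares 1, 3 and 5
points against a brute-force numerical integral.

```python
import math

SQRT_PI = math.sqrt(math.pi)

# Gauss-Hermite nodes and weights (weight function exp(-x^2)) for
# 1, 3 and 5 points; these are standard tabulated values.
GH_RULES = {
    1: ([0.0], [SQRT_PI]),
    3: ([-math.sqrt(1.5), 0.0, math.sqrt(1.5)],
        [SQRT_PI / 6, 2 * SQRT_PI / 3, SQRT_PI / 6]),
    5: ([-2.020182870456086, -0.958572464613819, 0.0,
         0.958572464613819, 2.020182870456086],
        [0.019953242059046, 0.393619323152241, 0.945308720482942,
         0.393619323152241, 0.019953242059046]),
}


def site_likelihood(b, n, y, beta0):
    """Binomial kernel for one site, given its random intercept b."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    return p ** y * (1.0 - p) ** (n - y)


def marginal_gh(n_points, n, y, beta0, sigma):
    """Integrate b ~ N(0, sigma^2) out using n_points quadrature points."""
    nodes, weights = GH_RULES[n_points]
    total = sum(w * site_likelihood(math.sqrt(2.0) * sigma * x, n, y, beta0)
                for x, w in zip(nodes, weights))
    return total / SQRT_PI


def marginal_reference(n, y, beta0, sigma, steps=20000):
    """Brute-force trapezoid rule over +/- 10 sigma, used as the reference."""
    lo, hi = -10.0 * sigma, 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        b = lo + i * h
        dens = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        end_weight = 0.5 if i in (0, steps) else 1.0
        total += end_weight * site_likelihood(b, n, y, beta0) * dens
    return total * h


# Hypothetical site: 20 animals, 1 infested, large between-site variance.
N, Y, BETA0, SIGMA = 20, 1, -3.0, 2.0
reference = marginal_reference(N, Y, BETA0, SIGMA)
errors = {k: abs(marginal_gh(k, N, Y, BETA0, SIGMA) - reference)
          for k in (1, 3, 5)}
for k, err in errors.items():
    print(f"{k} point(s): absolute error {err:.5f}")
```

With these made-up values the error shrinks as points are added. The
larger the between-site variance (that is, the stronger the
within-site correlation), the more a small number of points tends to
matter.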

> When I reduce the model I find that I can remove a number of  
> interaction terms without really affecting the AIC, which leads me to  
> be slightly concerned.
>

This most likely means the interactions are not significant.

> One option would be to analyse the data at the site level, i.e.  
> parasite prevalence, rather than the probability of being infested.
>

While you can do this, it is throwing away information, possibly a lot  
of information.

Ken

> Any advice as to how to deal with this unbalanced data set would be  
> very much appreciated.
>
> Anna Renwick
> Institute of Biological & Environmental Sciences
> University of Aberdeen
> Zoology Building
> Tillydrone Avenue
> Aberdeen
> AB24 2TZ