[R-sig-ME] Unbalanced presence/absence data

Renwick, A. R. a.renwick at abdn.ac.uk
Tue Feb 3 15:22:22 CET 2009


I am trying to analyse data on the presence/absence of parasite infestation on small mammals using a GLMM. However, the data set is severely unbalanced: it contains far more 0's than 1's (1333 0's and 86 1's).

The response variable (presence/absence) is at the individual level, whereas all the explanatory variables (apart from sex) are at the site level. This means that many individuals share exactly the same combination of explanatory variables, and with so many of those individuals scoring 0 there is very little power.

When I reduce the model I find that I can remove a number of interaction terms without really affecting the AIC, which leads me to be slightly concerned.

One option would be to analyse the data at the site level, i.e. parasite prevalence per site, rather than the probability of an individual being infested.
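
As a rough illustration of that site-level option, the aggregation step might look something like the sketch below. The site labels and records are invented for the example, and `site_prevalence` is a hypothetical helper name, not part of any package; Python is used only to show the bookkeeping.

```python
from collections import defaultdict

def site_prevalence(records):
    """Collapse (site, infested) pairs into per-site (infested, total, prevalence)."""
    counts = defaultdict(lambda: [0, 0])  # site -> [number infested, number examined]
    for site, infested in records:
        counts[site][0] += infested
        counts[site][1] += 1
    return {site: (k, n, k / n) for site, (k, n) in counts.items()}

# Invented records for illustration only: (site label, 0/1 infestation status)
records = [("A", 0), ("A", 1), ("A", 0), ("B", 0), ("B", 0)]
for site, (k, n, p) in site_prevalence(records).items():
    print(site, k, n, p)
```

Keeping the (infested, examined) counts for each site, rather than the proportion alone, would retain the information about how many animals were sampled at each site.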

Any advice as to how to deal with this unbalanced data set would be very much appreciated.

Anna Renwick
Institute of Biological & Environmental Sciences
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen
AB24 2TZ





