[R] researcher with highly skewed data set seeks help finding practical GLMM tutorial

Tue Nov 30 10:59:42 CET 2010

On Tue, 30 Nov 2010, Ben Kenward wrote:

> Hi!
>
> I am a psychologist who suspects that the only sensible way to analyse
> a particular data set is to use generalised linear mixed models. I am
> hoping that someone might be able to point me in the right direction
> to find some very practical hands on documentation that might be able
> to talk me through actually doing such an analysis?
>
> So far in my searches the most useful document I have turned up is
> Bolker et al. (2008, TREE) Generalized linear mixed models: a
> practical guide for ecology and evolution. As a general guide it
> doesn't give enough practical information about how to get the job
> done. The R documentation is obviously practical, but doesn't help to
> decide what kind of analysis is appropriate. Apart from those sources
> I am mainly finding quite theoretical treatments going over my head,
> for example: http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/reviewr.pdf.
>
> I am moderately competent programming in R, having coded custom
> permutation tests before (which in contrast to GLMM I find intuitive).
>
> In case anyone is kind enough to give me any specific pointers, here
> is the nature of my data set. With an N of 42 subjects, I have a
> highly right-skewed (about half the data points are zero) frequency
> variable as dependent variable. This variable is measured in each
> subject in three different task types. There is furthermore a context
> variable with two levels. Each task was administered in each context,
> but not for every single subject.
>
> So the design is quite simple - two fixed factors (task and context),
> one random factor (subject), and an untransformably skewed dependent
> variable. I might want to add some additional fixed factors (age
> group) in future but for now I would like to keep it simple. I guess
> this is straightforward for those in the know. Any help at all much
> appreciated!

Given that you have frequency data with many zeros, a zero-augmented
count data model might be useful, for example a hurdle model or a
zero-inflated Poisson or negative binomial model. Both often lead to
similar fits, but the hurdle model is typically easier to interpret
because it models zero vs. positive separately from the size of the
positive counts. An overview using the "pscl" package is given in
http://www.jstatsoft.org/v27/i08/

The pscl implementation currently does not support random effects,
though. As a first step, a hurdle() model with sandwich standard errors
should still show whether this type of model suits your data.
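
A minimal sketch of that first step, assuming the pscl, sandwich and
lmtest packages are installed. The data are simulated stand-ins for the
design described above (42 subjects, three tasks, two contexts); the
column names y, task, context and subject, the effect sizes and the
Poisson count part are all hypothetical. Clustering the covariance by
subject via vcovCL() is an assumption added here to reflect the repeated
measures; it is not a substitute for a proper random effect.

```r
library(pscl)      # hurdle()
library(sandwich)  # sandwich(), vcovCL()
library(lmtest)    # coeftest()

set.seed(1)

# Simulated stand-in data (hypothetical): 42 subjects x 3 tasks x 2 contexts
dat <- expand.grid(subject = factor(1:42),
                   task    = factor(c("A", "B", "C")),
                   context = factor(c("c1", "c2")))
b       <- rnorm(42, sd = 0.5)            # subject-level heterogeneity
nonzero <- rbinom(nrow(dat), 1, 0.5)      # roughly half the counts are zero
lambda  <- exp(0.5 + 0.3 * (dat$task == "B") + b[as.integer(dat$subject)])
dat$y   <- nonzero * (1 + rpois(nrow(dat), lambda))

# Hurdle model: a binary part for zero vs. positive and a
# zero-truncated Poisson part for the positive counts
fit <- hurdle(y ~ task + context, data = dat, dist = "poisson")

# Sandwich (robust) covariance matrix and the resulting coefficient tests
vc <- sandwich(fit)
coeftest(fit, vcov = vc)

# Assumption: additionally cluster the sandwich estimator by subject
vc_cl <- vcovCL(fit, cluster = dat$subject)
coeftest(fit, vcov = vc_cl)
```

If the positive counts look overdispersed, dist = "negbin" can be tried
in place of dist = "poisson" and the two fits compared, e.g. via AIC().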

If so, you might also want to have a look at the "gamlss" package, which
supports a somewhat different implementation of ZIP models and can
include random effects. See http://www.jstatsoft.org/v23/i07/
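
A rough sketch of what such a fit might look like, assuming the gamlss
package is installed. Again the data are simulated and the column names
are hypothetical; random(subject) is used here as one way to get a
random intercept per subject, and the zero-inflation probability is kept
constant, which is an assumption rather than a recommendation.

```r
library(gamlss)  # gamlss(), random(); also loads gamlss.dist (ZIP family)

set.seed(2)

# Simulated stand-in data (hypothetical): 42 subjects x 3 tasks x 2 contexts
dat <- expand.grid(subject = factor(1:42),
                   task    = factor(c("A", "B", "C")),
                   context = factor(c("c1", "c2")))
b      <- rnorm(42, sd = 0.5)             # subject-level random intercepts
lambda <- exp(0.8 + 0.3 * (dat$task == "B") + b[as.integer(dat$subject)])
dat$y  <- rbinom(nrow(dat), 1, 0.5) * rpois(nrow(dat), lambda)

# Zero-inflated Poisson: mu is the Poisson mean (log link),
# sigma is the probability of an extra zero (logit link)
fit_zip <- gamlss(y ~ task + context + random(subject),
                  sigma.formula = ~ 1,
                  family = ZIP, data = dat,
                  control = gamlss.control(trace = FALSE))

summary(fit_zip)
```

Letting the zero-inflation part depend on the design as well would mean
changing sigma.formula, e.g. to ~ task + context.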

hth,
Z

> Cheers,
>
> Ben
>
> -- 
> Dr. Ben Kenward
> Department of Psychology, Uppsala University, Sweden
> http://www.benkenward.com