[R-sig-ME] Difficulty with GLMM specification in lme4

Wed Jul 28 23:04:55 CEST 2010

Hello list,

I think I've exhausted my (limited) knowledge of GLMMs in attempting to analyze data for a subset of my research.  I've been immersed in the R-help archives and recent publications for the past two weeks, and I've finally decided to solicit help from this distinguished list.  This is my first post, and I've read the appropriate posting materials, but I hope you'll forgive me should I stray from those recommendations at any point.

First, some background...

The objective: Examine the fixed effects of treatment (binary) and geography (binary) on the consumption of fruits by avian frugivores over 14 count periods.  Specifically, I'd like to generate odds of consumption for each count period, and determine whether the odds vary between treatments or geographic locations (either as a consistent main effect or via an interaction with count period).  I'm most concerned with marginal effects (which I know GLMM is not giving me), but GEE model selection strikes me as even less fun...and the results from similar GEE and GLMM models on similar datasets have given me comparable qualitative results.

The design:  17 plots, each comprising two subplots.  Treatment levels (i.e., control or treatment) were assigned randomly to the subplots within a plot.

The response:  On each subplot, I monitored the fate (consumed or not) of fruits during 14 count periods.  The number of fruits initially present and monitored on a given subplot varied (min = 71, max = 461, total = 7572). 

At present, I am modeling consumption (yes or no) as a function of each count period (this very general time specification is likely to be appropriate), the treatment effect, and the geographic location of the plots (north vs. south).  In the dataset "fruit", each individual fruit (id) has up to 14 records, depending on when or if it was consumed.  For the 7572 individuals, this results in 59779 records:  

> str(fruit)
'data.frame':   59779 obs. of  7 variables:
 $ id        : Factor w/ 7572 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ plot      : Factor w/ 17 levels "N-10","N-2","N-3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ trt       : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ geog      : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ count     : Factor w/ 14 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ cons      : int  1 1 0 0 0 0 0 0 0 0 ...
 $ available : int  260 260 260 260 260 260 260 260 260 260 ...

Is this dataset format unnecessary?  Can I not simply sum events in each count period in each subplot, such that my response variable takes the form of cbind(cons, available - cons), or does this violate the binomial assumption that events are independent?

The answer to these questions would seem to dictate the appropriate specification of the random effects.  If I leave the dataset as is, I'm not sure what specification of the random effects is most appropriate. Are there not three random effects that need to be modeled?  I recall that when Judith Singer and John Willett (Applied Longitudinal Data Analysis, 2003) set up data like this, they ignore the apparent repeated measure on individuals...

The most relevant glmer model that I have tried using the original data format is: 

m_orig <- glmer(cons ~ count + geog + trt + count:geog + count:trt  + (1 | plot) + (1 | plot/trt) + (1 | id), data=fruit, family=binomial)

This model, and those dropping the (1 | id) term, churn for hours; I've yet to see one converge (or do anything).

Models fit to the alternative dataset actually converge quickly.  For example, after creating a new dataset that sums consumption events (num_cons) within each count period in each subplot (not shown):

m_alt <- glmer(cbind(num_cons, available-num_cons) ~ count + geog + trt + count:geog + count:trt  + (1 | plot) + (1 | plot/trt), data=fruit_alt, family=binomial)
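For concreteness, here is a minimal sketch of how an aggregated dataset of this kind could be built from per-fruit records.  The toy data frame, its values, and the names toy, toy_alt and at_risk are illustrative assumptions only, not my real data or code.  The sketch also assumes the binomial denominator is the number of fruits still being monitored in each period, which may not be the right choice:

```r
# Toy person-period data (hypothetical values, not the real dataset):
# one row per fruit per count period, ending when the fruit is consumed.
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 4, 4, 4),
  plot  = "N-10",
  trt   = c("0", "0", "0", "0", "0", "0", "1", "1", "1"),
  count = c(1, 2, 3, 1, 2, 1, 1, 2, 3),
  cons  = c(0, 0, 1, 0, 1, 1, 0, 0, 0)
)

# Number of fruits consumed in each plot x trt x count cell.
n_cons <- aggregate(cons ~ plot + trt + count, data = toy, FUN = sum)
names(n_cons)[names(n_cons) == "cons"] <- "num_cons"

# Number of fruits still monitored (records present) in each cell.
n_risk <- aggregate(cons ~ plot + trt + count, data = toy, FUN = length)
names(n_risk)[names(n_risk) == "cons"] <- "at_risk"

# One row per subplot per count period.
toy_alt <- merge(n_cons, n_risk, by = c("plot", "trt", "count"))
```

With real data, the response for the aggregated model would then be something like cbind(num_cons, at_risk - num_cons), in place of the per-fruit 0/1 response.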

Which specification is more appropriate?  Am I way off base with this?  Since I'm interested in marginal effects, perhaps my objectives would be better addressed with the GEE approach?

Any assistance is greatly appreciated.

Best,

Adam Smith
Ph.D. Candidate, Avian Ecology
Dept. Natural Resources Science
University of Rhode Island
