[R] mixed models question

Tue Jun 15 03:28:54 CEST 2004

I am trying to fit the following linear model to logged per capita 
fecundity data (ie number of babies per female) for a mouse:

RsNRlS <- glm(formula = ln.fecundity ~ summer.rainfall + N + 
lagged.rainfall + season, ....)

I am using this relationship in a simulation model, and the current 
statistical model I have fit is unsatisfactory.  The problem is I get a 
global estimate of variance (MSE), but I think it varies across subsets 
of the data.  Specifically, seasons when there is lots of reproduction 
(e.g. fall) tend to have high variance, while seasons with little 
reproduction (e.g. summer) have small amounts of variance.  I am 
looking for a method for estimating the coefficients in my linear 
model, and estimating a separate error for subsets of the data (ie for 
each of the 4 seasons).  The end goal is to take this linear model back 
into my simulation model to make predictions about fecundity, but with 
separate variance terms for subsets of the data.

I thought of several solutions, but some seem pretty ad hoc, while I 
think others would suffer from loss of a lot of power due to small 
sample size.  The solutions I thought of were:
1)	Fit the linear model to all data, but then estimate variances by 
season directly from the residuals.  Seems ad hoc given the assumption 
of common variance in the initial fitting.
2)	Use regression weights by season to reweight the data points/fitted 
values.  Then in the simulation model inflate/deflate the estimate of 
MSE by these weights - seems ad hoc too, as translating weights (which 
would presumably basically downgrade the importance of outliers) into 
something meaningful for prediction using the regression equation seems 
arbitrary.
3)	Fit a separate model for each season, either assume the global model 
(but maybe get poor fits due to #parameter v. #data points) or try 
reducing models and choosing the best for each season using AIC.  The 
problem with this is that I think that some parameters have effects 
common to all seasons - eg lagged rainfall represents the effect of 
rainfall drowning babies in their burrows prior to the age at which 
they can be captured.  This is pretty clearly a strong effect, just 
based on graphing the data, so I am uncertain about loosing power by 
splitting it by season.
4)	Fit a mixed model to all the data, treating each season as a random 
factor with levels 1 or 0, and estimating the variance for each.  Again 
this seems somewhat ad hoc, although based on my reading in elementary 
texts it would allow me to estimate the variance component for each of 
the seasons.

Based on talking to a colleague in my department some of the modern 
methods for fitting mixed models, in which the variance for each of the 
levels of a factor in a random effect can be estimated.  That seems the 
best option, as it explicitly does what I want.  However, it is well 
over my head in terms of either the R code or the statistical 
background.  Can anyone suggest a good approach, or a published 
example, or some useful reading on how one might approach this problem?

Chris Wilcox
The Ecology Centre
Department of Zoology and Entomology
Brisbane, 4072  QLD
Australia
tel  61-7-3365-1686
fax 61-7-3365-1655
website www.uq.edu.au/~uqcwilco