[R-sig-ME] model definition issues for repeat measures

CL Pressland Kate.Pressland at bristol.ac.uk
Thu Apr 16 19:28:38 CEST 2009

I've scrolled through the archives and CRAN help pages and can't find an 
answer for my query: my apologies if it is rather basic.

I have a nested, unbalanced data set consisting of:

67 SITEs measured every WEEK (April-Sept) over several YEARs for a group of 
insects (SP per m, continuous data). Not all sites have all WEEKs 
recorded. I'm interested in the MANagement code (categorical: coded 0, 1 or 
2) assigned to each site, but I also have data on TEMPerature, average SUN 
and WIND (though some weather values are missing). The histogram of my 
SPecies data looks clearly Poisson distributed (common amongst count data), 
although the values are decimals. Over WEEKs 1 to 26 the pattern is roughly 
Gaussian, with species numbers peaking during July. All YEARs show this 
trend.
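
For concreteness, here is a minimal sketch of the layout described above, 
using simulated values only. The column names follow the description; the 
three years, the 20% of missing rows and the shape of the seasonal curve are 
assumptions made purely for illustration and are not taken from the real data.

```r
# Simulated stand-in for the real data; every value here is made up.
set.seed(1)
ALL <- expand.grid(SITE = factor(1:67),
                   YEAR = factor(2005:2007),  # "several years": 3 is assumed
                   WEEK = 1:26)

# One management code (0, 1 or 2) per site, constant across years and weeks
man_by_site <- sample(0:2, 67, replace = TRUE)
ALL$MAN <- factor(man_by_site[as.integer(ALL$SITE)])

# Density per metre: non-negative decimals with a mid-season peak
peak <- exp(-((ALL$WEEK - 14) / 6)^2)
ALL$SP <- round(rgamma(nrow(ALL), shape = 2, scale = peak), 2)

# Drop rows at random so that not every site has every week (assumed 20%)
ALL <- ALL[sample(nrow(ALL), size = floor(0.8 * nrow(ALL))), ]

# Weeks recorded per site and year; unequal counts show the imbalance
weeks_recorded <- xtabs(~ SITE + YEAR, data = ALL)
print(range(weeks_recorded))

# Is the response integer-valued, as a Poisson count would be?
print(all(ALL$SP == round(ALL$SP)))
```

The last line is the check that matters for the choice of family: a density 
per metre is generally not integer-valued, even when it is derived from 
counts.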

My random factors are SITE, YEAR and WEEK, as they denote the structure of 
the data. I am not interested in their effects per se; I just want to 
account for the data structure. I figure that I cannot assume a normal model, 
so have been looking into lme4 and including family=poisson. Would this be 
correct? I also think there may be a need to include some kind of 
correlation function into the WEEK part of the equation (as week 2 might be 
dependent on week 1 but independent of week 20 and so on) but I am unsure 
how to do this or how necessary it really is.

My preliminary model looks like this:

	model.a<-lmer(SP ~ MAN + (1|SITE/YEAR/WEEK), data=ALL, family=poisson)

I get confused as to how to organise the random effects. My understanding 
is that to the left of the | you are looking at the slope, to the right of 
the | you are looking at the intercept. Is this correct? Should my random 
effects be (YEAR/WEEK|SITE) instead? When I run the above model I get this:

	Error: length(f1) == length(f2) is not TRUE
	In addition: There are 50 or more warnings ...

What does this mean?
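
To make the nesting question concrete, the sketch below uses base R only to 
show how the `/` operator in a formula expands. It does not fit anything and 
does not explain the error above. The spelled-out formula is stored as a 
plain formula object; that lme4 expands the grouping part of `(1|SITE/YEAR/WEEK)` 
in the same way is stated here as an assumption to check against the lme4 
documentation.

```r
# Expansion of the nesting operator, using base R's formula machinery
nested <- attr(terms(~ SITE/YEAR/WEEK), "term.labels")
print(nested)

# The same structure written out term by term (formula object only, not fitted)
f_long <- SP ~ MAN + (1 | SITE) + (1 | SITE:YEAR) + (1 | SITE:YEAR:WEEK)
print(f_long)
```

If each site is recorded at most once per week in a given year, the 
SITE:YEAR:WEEK grouping has one level per observation, which is worth 
keeping in mind when deciding how deep the nesting should go.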

So, my key questions are:

1. What is the most appropriate random structure for my repeated measures?
2. Must I include a time series correlation structure for WEEKs? Which 
corStruct is most appropriate and how would I write that in for WEEK and 
not YEAR in the random effect section of the formula?
3. Should this model be family=poisson? Does that error message mean I have 
made an error in treating the data as Poisson distributed?
4. Is it best to use ML or REML with unbalanced data if I'm going to create 
a model set for model selection by adding the weather variables as fixed 
effects? My hunch says ML if considering AIC/BIC values, but is REML more 
important due to the unbalanced data?

I've tried reading Pinheiro and Bates with the pages available through 
Google Books (our inadequate university libraries do not have a copy and 
have their spending account frozen due to the recession!), which has helped 
and made this lowly student more confused simultaneously! Please can someone 
point me in the right direction?

Thank you for your time.

Kate.Pressland at bristol.ac.uk
