[R-sig-ME] model definition issues for repeat measures
CL Pressland
Kate.Pressland at bristol.ac.uk
Thu Apr 16 19:28:38 CEST 2009
I've scrolled through the archives and CRAN help pages and can't find an
answer to my query; my apologies if it is rather basic.
I have a nested, unbalanced data set consisting of 67 SITEs measured every
WEEK (April to September) over several YEARs for a group of insects (SP per
m, continuous data). Not all sites have all WEEKs recorded. I'm interested
in the MANagement code (categorical: coded 0, 1 or 2) assigned to each site,
but I also have data on TEMPerature, average SUN and WIND (with some missing
values in the weather variables, though). The histogram of my SPecies data
looks clearly Poisson-distributed (common amongst count data), although the
values are decimals. Over WEEKs 1 to 26 there is a Gaussian-like pattern,
with species numbers consistently peaking in July. All YEARs show this
trend.
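To make the layout above concrete, here is a minimal sketch that simulates a data frame with the same shape. Everything in it is hypothetical: the years, the 20% of missing visits, the seasonal curve and the Gamma-distributed SP values are assumptions chosen only to mimic the description (continuous, non-negative, unbalanced, peaking mid-season), not the real data.

```r
# Hypothetical simulated data mimicking the described layout:
# 67 SITEs x several YEARs x WEEKs 1-26, one MAN code (0, 1 or 2) per site.
set.seed(1)
n_site <- 67
years  <- 2006:2008   # hypothetical: the actual years are not stated
weeks  <- 1:26

ALL <- expand.grid(SITE = seq_len(n_site), YEAR = years, WEEK = weeks)

# One management code per site, repeated on every row for that site
man_by_site <- sample(0:2, n_site, replace = TRUE)
ALL$MAN <- man_by_site[ALL$SITE]

# Unbalanced design: randomly drop about 20% of site-year-week visits
ALL <- ALL[runif(nrow(ALL)) > 0.2, ]

# Hump-shaped seasonal mean peaking mid-season (week 13, roughly early July)
mu <- 5 * exp(-((ALL$WEEK - 13)^2) / (2 * 4^2))
# Continuous, non-negative response with mean mu (Gamma with shape 2)
ALL$SP <- round(rgamma(nrow(ALL), shape = 2, rate = 2 / mu), 2)

# Store the grouping variables and the categorical MAN code as factors
ALL$SITE <- factor(ALL$SITE)
ALL$YEAR <- factor(ALL$YEAR)
ALL$WEEK <- factor(ALL$WEEK)
ALL$MAN  <- factor(ALL$MAN, levels = 0:2)
str(ALL)
```

Storing MAN as a factor matters because a numeric 0/1/2 code would otherwise be treated as a continuous covariate in a model formula.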
My random factors are SITE, YEAR and WEEK, as they describe the structure of
the data; I am not interested in their effects per se, I just want to
account for that structure. I figure that I cannot assume a normal model,
so I have been looking into lme4 with family=poisson. Would this be
correct? I also think I may need to include some kind of correlation
structure for WEEK (as week 2 might be dependent on week 1 but independent
of week 20, and so on), but I am unsure how to do this or how necessary it
really is.
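The "week 2 depends on week 1 but not on week 20" idea matches a first-order autoregressive, AR(1), pattern, where the correlation between weeks i and j is phi^|i - j|. The sketch below just builds that correlation matrix in base R; the value phi = 0.6 is a hypothetical choice for illustration. In the nlme package, corAR1() is one corStruct of this form, with phi estimated from the data rather than fixed.

```r
# AR(1) correlation between weeks i and j: phi^|i - j|
phi   <- 0.6          # hypothetical autocorrelation parameter
weeks <- 1:26
R <- phi^abs(outer(weeks, weeks, "-"))

R[1, 2]    # adjacent weeks: 0.6
R[2, 20]   # 18 weeks apart: 0.6^18, about 1e-04
```

The correlation decays geometrically with the gap between weeks, so neighbouring weeks are strongly related while weeks far apart are close to independent.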
My preliminary model looks like this:
model.a<-lmer(SP ~ MAN + (1|SITE/YEAR/WEEK), data=ALL, family=poisson)
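In lme4 formulas, the nested term (1|SITE/YEAR/WEEK) is shorthand for (1|SITE) + (1|SITE:YEAR) + (1|SITE:YEAR:WEEK). The base R sketch below builds those three grouping factors explicitly on a hypothetical toy data set (2 sites x 2 years x 3 weeks) so the nesting can be inspected directly; the toy data and column names such as SITE_YEAR are illustrative assumptions.

```r
# Hypothetical toy data: 2 sites x 2 years x 3 weeks, one row per visit
d <- expand.grid(SITE = factor(c("A", "B")),
                 YEAR = factor(c(2007, 2008)),
                 WEEK = factor(1:3))

# Grouping factors implied by the nested term (1 | SITE/YEAR/WEEK)
d$SITE_YEAR      <- interaction(d$SITE, d$YEAR, drop = TRUE)
d$SITE_YEAR_WEEK <- interaction(d$SITE, d$YEAR, d$WEEK, drop = TRUE)

# Levels: SITE 2, SITE_YEAR 4, SITE_YEAR_WEEK 12
c(SITE           = nlevels(d$SITE),
  SITE_YEAR      = nlevels(d$SITE_YEAR),
  SITE_YEAR_WEEK = nlevels(d$SITE_YEAR_WEEK))
```

If each SITE-YEAR-WEEK combination holds a single count, as in this toy layout, the innermost grouping factor has exactly one observation per level, which is worth keeping in mind when deciding how deep the nesting should go.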
I get confused about how to organise the random effects. My understanding
is that to the left of the | you specify the slope and to the right of the
| the intercept. Is this correct? Should my random effects be
(YEAR/WEEK|SITE) instead? When I run the above model I get this error:
Error: length(f1) == length(f2) is not TRUE
In addition: There are 50 or more warnings ...
What does this mean?
So, my key questions are:
1. What is the most appropriate random structure for my repeat measures?
2. Must I include a time-series correlation structure for WEEKs? Which
corStruct is most appropriate, and how would I specify it for WEEK but not
YEAR in the random-effects part of the formula?
3. Should this model use family=poisson? Does that error message mean I
was wrong to treat the data as Poisson-distributed?
4. Is it best to use ML or REML with unbalanced data if I'm going to build
a model set for model selection by adding the weather variables as fixed
effects? My hunch says ML if comparing AIC/BIC values, but is REML more
important because the data are unbalanced?
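Related to question 3: the Poisson distribution is defined only for non-negative integers, and base R's dpois() returns 0 with a "non-integer x" warning when given a decimal value. The sketch below shows this check in isolation. Whether it accounts for the "50 or more warnings" here is an assumption (a Poisson family evaluates dpois() on the response when computing its AIC), and it does not by itself explain the length(f1) == length(f2) error.

```r
# Poisson probabilities exist only for non-negative integer counts.
p_int <- dpois(2, lambda = 3)   # ordinary integer count

# A decimal value triggers a "non-integer x" warning and returns 0;
# the handler reports the warning and then suppresses it.
p_dec <- withCallingHandlers(
  dpois(2.5, lambda = 3),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

c(integer = p_int, decimal = p_dec)
```

With one decimal SP value per row, a Poisson fit could emit this warning many times, once for each non-integer observation it evaluates.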
I've tried reading Pinheiro and Bates, using the pages available through
Google Books (our inadequate university libraries do not have a copy and
have had their spending accounts frozen due to the recession!), which has
simultaneously helped and made this lowly student more confused! Please can
someone point me in the right direction?
Thank you for your time.
Kate
----------------
Kate.Pressland at bristol.ac.uk