[R-sig-ME] model definition issues for repeat measures

Fri Apr 17 10:38:39 CEST 2009

Dear Kate,

I presume that SP is some kind of count along transects and that you
divided the counts by the length of the transect. Assuming that all
transects have equal length, I would stick with the counts, so you can
still work with the integer data. If the lengths differ, you could add
an offset term (the log of the transect length) to the model.
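A minimal sketch of the offset idea, on simulated data. The column names
COUNT and LENGTH are my assumptions (they are not in your data set as
described), and the glmer() call only runs if lme4 is installed:

```r
# Hypothetical data: raw counts per transect walk plus transect length in metres.
set.seed(1)
n_site <- 10
site_effect <- rnorm(n_site, mean = 0, sd = 0.3)  # simulated site-level variation
d <- data.frame(
  SITE   = factor(rep(seq_len(n_site), each = 20)),
  MAN    = factor(rep(rep(0:2, length.out = n_site), each = 20)),
  LENGTH = rep(runif(n_site, 200, 1000), each = 20)
)
d$COUNT <- rpois(nrow(d),
                 lambda = 0.01 * d$LENGTH * exp(site_effect[as.integer(d$SITE)]))

# SP per metre is a decimal, so it is not suitable as a Poisson response.
d$SP <- d$COUNT / d$LENGTH

# Model the integer counts and let the offset account for transect length.
if (requireNamespace("lme4", quietly = TRUE)) {
  m <- lme4::glmer(COUNT ~ MAN + offset(log(LENGTH)) + (1 | SITE),
                   data = d, family = poisson)
  print(lme4::fixef(m))
}
```

With the log link, the offset makes the fixed effects describe the count
per metre while the response stays an integer.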

(1|SITE/YEAR/WEEK) indicates 1) a random effect for each site; 2) within
each site, a random effect of year. Notice that this effect is
independent for each site: the same year can have a different effect in
each site! That is probably not what you want. 3) Within each
site-year combination, a random effect of week, with the same caveat.
Do you have a lot of years? If you have fewer than six years I would not
use year as a grouping factor in a random effect.
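To make the nesting concrete, here is a small base R sketch on a made-up
balanced design (3 sites, 3 years, 4 weeks; the numbers are purely
illustrative). It counts how many separate random effects each nested
term would estimate:

```r
# Hypothetical balanced design, only to count grouping levels.
d <- expand.grid(SITE = factor(1:3),
                 YEAR = factor(2005:2007),
                 WEEK = factor(1:4))

# (1 | SITE/YEAR/WEEK) is shorthand for
# (1 | SITE) + (1 | SITE:YEAR) + (1 | SITE:YEAR:WEEK)
n_site           <- nlevels(d$SITE)
n_site_year      <- nlevels(interaction(d$SITE, d$YEAR, drop = TRUE))
n_site_year_week <- nlevels(interaction(d$SITE, d$YEAR, d$WEEK, drop = TRUE))

n_site            # 3 site effects
n_site_year       # 9 year effects: a separate one per site
n_site_year_week  # 36 week effects: a separate one per site and year
nlevels(d$YEAR)   # 3: what a year effect shared across sites would use
```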

Maybe crossed random effects are more what you want. I would suggest
(1|SITE) + (1|YEAR/WEEK). Now each year has the same random intercept
regardless of the site. If you have only a few years you could try
(1|SITE) + (YEAR|WEEK) or YEAR + (1|SITE) + (1|WEEK). Note that in this
case each week needs a unique ID: the first week of year 1 can't have
the same ID as the first week of year 2. If your weeks do not have
unique IDs, you can either recode week or change the random effects to
(1|SITE) + (YEAR|YEAR:WEEK) or YEAR + (1|SITE) + (1|YEAR:WEEK). The
exception is when you can assume that the first week of year 1 has the
same effect as the first week of all the other years, which is probably
not the case.
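Recoding week to a unique ID could look like the sketch below. It is
base R only; the name YEARWEEK, the three example years, and the
response name COUNT in the formula are my own assumptions:

```r
# Hypothetical layout: 3 years, weeks numbered 1 to 26 within each year.
d <- expand.grid(YEAR = factor(2006:2008), WEEK = 1:26)

# Week numbers repeat across years, so WEEK alone has only 26 levels.
n_week <- nlevels(factor(d$WEEK))

# Combine YEAR and WEEK so every week of every year gets its own ID.
d$YEARWEEK <- interaction(d$YEAR, d$WEEK, drop = TRUE)
n_yearweek <- nlevels(d$YEARWEEK)

n_week      # 26
n_yearweek  # 78

# Formula using the recoded week. (1 | YEARWEEK) defines the same groups
# as (1 | YEAR:WEEK) on the original columns.
f <- COUNT ~ MAN + YEAR + (1 | SITE) + (1 | YEARWEEK)
```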

Note that with glmer() from the lme4 package you can't use a correlation
structure. I believe that is on Douglas Bates's to-do list, but with a
low priority. lme() from the nlme package allows you to model a
correlation structure, but it can't handle crossed random effects, at
least not as easily as lme4. And lme() only handles Gaussian data. So I
would stick to glmer(). With WEEK as a grouping factor in the random
effect you can't add a correlation structure based on week, because the
random effects are assumed to be independent between groups. So you
will have to choose between adding WEEK to the random effect or adding
it to the correlation structure.

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of CL Pressland
Sent: Thursday 16 April 2009 19:29
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] model definition issues for repeat measures

I've scrolled through the archives and CRAN help pages and can't find an
answer for my query: my apologies if it is rather basic.

I have a data set that is nested and unbalanced consisting of:

67 SITEs measured over several YEARs every WEEK (April-Sept) for a group
of insects (SP per m - continuous data). Not all sites have all WEEKs
recorded. I'm interested in the MANagement code (categorical: coded 0,1
or 2) assigned to each site, but I have also data on TEMPerature,
average SUN and WIND (some missing data with weather variables though).
When looking at my SPecies data histogram it's clearly poisson
distributed (common amongst count data), although it is in decimals.
Over the WEEKs 1 to 26 there is a gaussian-esque distribution where
species numbers peak uniformly during July. All YEARs show this trend.

My random factors are SITE, YEAR and WEEK as they denote the structure
of the data, but I am not interested in their effects per se, I just
want to explain the data structure. I figure that I cannot assume a
normal model, so have been looking into lme4 and including
family=poisson. Would this be correct? I also think there may be a need
to include some kind of correlation function into the WEEK part of the
equation (as week 2 might be dependent on week 1 but independent of
week 20 and so on) but I am unsure how to do this or how necessary it
really is.

My preliminary model looks as such:

	model.a <- lmer(SP ~ MAN + (1|SITE/YEAR/WEEK), data=ALL, family=poisson)

I get confused as to how to organise the random effects. My
understanding is that to the left of the | you are looking at the slope,
to the right of the | you are looking at the intercept. Is this correct?
Should my random effects be (YEAR/WEEK|SITE) instead? When I run the
above model I get this error:

	Error: length(f1) == length(f2) is not TRUE
	In addition: There are 50 or more warnings ...

What does this mean?

So, my key questions are:

1. What is the most appropriate random structure for my repeat measures?
2. Must I include a time series correlation structure for WEEKs? Which
corStruct is most appropriate and how would I write that in for WEEK and
not YEAR in the random effect section of the formula?
3. Should this model be family=poisson? Does that error message mean I
have made an error with determining the data as poisson distributed?
4. Is it best to use ML or REML with unbalanced data if I'm going to
create a model set for model selection by adding the weather variables
as fixed effects? My hunch says ML if considering AIC/BIC values, but is
REML more important due to the unbalanced data?

I've tried reading Pinheiro and Bates with the pages available through
google books (our inadequate university libraries do not have a copy and
have their spending account frozen due to recession!) which has helped
and made this lowly student more confused simultaneously! Please can
someone point me in the right direction?

Thank you for your time.

Kate
----------------
Kate.Pressland at bristol.ac.uk

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

The views expressed in this message and any annex are purely those of
the writer and may not be regarded as stating an official position of
INBO, as long as the message is not confirmed by a duly signed document.