[R-sig-ME] model definition issues for repeat measures
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Fri Apr 17 10:38:39 CEST 2009
Dear Kate,
I presume that SP are some kind of counts along transects and you
divided the counts by the length of the transect. Assuming that all
transects have equal length, I would stick with the counts. Hence you
still can work with the interger data. If the lengths are different you
could add an offset term to the model.
(1|SITE/YEAR/WEEK) indicates 1) a random effect for each site. 2) within
each site a random effect of year. Notice that this effect is
independent for each site: the same year can have a different effect in
each site! That is probably not what you want. 3) A similar structure is
used for week.
Do you have a lot of years? If you have less than six years I would not
use them as a grouping factor in a random effect.
Maybe crossed random effects is more what you want. I would suggest
(1|SITE) + (1|YEAR/WEEK). Now each year has the same random intercept
regardless the site. If you have only a few years you could try (1|SITE)
+ (YEAR|WEEK) or YEAR + (1|SITE) + (1|WEEK). Note that in this case each
week needs a unique ID. The first week from year 1 can't have the same
ID as the first week of year 2. If that is not the case you can either
recode week or change the random effects to (1|SITE) + (YEAR|YEAR:WEEK)
or YEAR + (1|SITE) + (1|YEAR:WEEK). Unless you can assume that the first
week of year 1 has the same effect as the first week of all the other
years. Which is probabily not the case.
Note that with glmer() from lme4 package you can't use a correlation
structure. I believe that is on Douglas Bates to do list but with a low
priority. lme() from the nlme package allows to model a correlation
structure, but it can't handle crossed random effects. At least not that
easy as with lme4. And lme() only handles gaussian data. So I would
stick to glmer(). With WEEK as a grouping factor in the random effect
you can't add a correlation structure based on week. Because the random
effects are assumed to be independent between the group. So you will
have to choose between adding WEEK to the random effect or adding it to
the correlation structure.
HTH,
Thierry
------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] Namens CL Pressland
Verzonden: donderdag 16 april 2009 19:29
Aan: r-sig-mixed-models at r-project.org
Onderwerp: [R-sig-ME] model definition issues for repeat measures
I've scrolled through the archives and CRAN help pages and can't find an
answer for my query: my apologies if it is rather basic.
I have a data set that is nested and unbalanced consisting of:
67 SITEs measured over several YEARs every WEEK (April-Sept) for a group
of
insects (SP per m - continuous data). Not all sites have all WEEKs
recorded. I'm interested in the MANagement code (categorical: coded 0,1
or
2) assigned to each site, but I have also data on TEMPerature, average
SUN
and WIND (some missing data with weather variables though). When looking
at
my SPecies data histogram it's clearly poisson distributed (common
amongst
count data), although it is in decimals. Over the WEEKs 1 to 26 there is
a
gaussian-esque distribution where species numbers peak uniformly during
July. All YEARs show this trend.
My random factors are SITE, YEAR and WEEK as they denote the structure
of
the data, but I am not interested in their effects per se, I just want
to
explain the data structure. I figure that I cannot assume a normal
model,
so have been looking into lme4 and including family=poisson. Would this
be
correct? I also think there may be a need to include some kind of
correlation function into the WEEK part of the equation (as week 2 might
be
dependent on week 1 but independent of week 20 and so on) but I am
unsure
how to do this or how necessary it really is.
My preliminary model looks as such:
model.a<-lmer(SP ~ MAN + (1|SITE/YEAR/WEEK), data=ALL,
family=poisson)
I get confused as to how to organise the random effects. My
understanding
is that to the left of the | you are looking at the slope, to the right
of
the | you are looking at the intercept. Is this correct? Should my
random
effects be (YEAR/WEEK|SITE) instead? When I run the above model I get
this
error:
Error: length(f1) == length(f2) is not TRUE
In addition: There are 50 or more warnings ...
What does this mean?
So, my key questions are:
1. What is the most appropriate random structure for my repeat measures?
2. Must I include a time series correlation structure for WEEKs? Which
corStruct is most appropriate and how would I write that in for WEEK and
not YEAR in the random effect section of the formula?
3. Should this model be family=poisson? Does that error message mean I
have
made an error with determining the data as poisson distributed?
4. Is it best to use ML or REML with unbalanced data if I'm going to
create
a model set for model selection by adding the weather variables as fixed
effects? My hunch says ML if considering AIC/BIC values, but is REML
more
important due to the unbalanced data?
I've tried reading Pinheiro and Bates with the pages available through
google books (our inadequate university libraries do not have a copy and
have their spending account frozen due to recession!) which has helped
and
made this lowly student more confused simultaneously! Please can someone
point me in the right direction?
Thank you for your time.
Kate
----------------
Kate.Pressland at bristol.ac.uk
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.
More information about the R-sig-mixed-models
mailing list