[R-sig-ME] lme4:glmer with nested and longitudinal data?
Ben Bolker
bbolker at gmail.com
Tue Nov 15 04:51:02 CET 2011
<hijita at ...> writes:
>
> Dear all,
>
> I am a beginner in R, and got stuck with this problem:
> I have the following dataset with results from an
> experiment with individual bats that performed two tasks
> related to prey capture under different conditions:
>
> X variables:
> indiv - 5 individual bats used in the experiment;
> all of which performed both tasks
> task - 2 tasks that each individual bat had to perform
> dist - 5 repeated measures of individual bats at
> 5 different distances from the object
>
> all x variables I treat as categorical factors with levels
>
> Y - I have 8 dependent variables related to the structure of
> ultrasound calls emitted by bats when
> performing each task. I know I can use them together in the
> same model with the function “cbind()” but
> each variable behaves a bit differently. Thus, I guess it
> would be better to build 8 separate models.
It would certainly be easier, but you might also benefit from
the increased power of treating the responses as multivariate ...
but mixed MANOVAs are not trivial to set up (I think).
>
> I believe "indiv" should be a random effect in the model;
> "dist" and "task" should be fixed effects.
>
As pointed out on the other list, indiv is *philosophically*
a random effect, but practically it's not likely to work very
well to try to fit a random-effects variance on only five levels.
You can read more about this at http://glmm.wikidot.com/faq .
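If the among-individual variance does collapse to zero, one common fallback (a sketch only — the data frame name `bats` and its columns are my assumption, not from your post) is to treat indiv as a fixed blocking factor instead:

```r
## Sketch: with only 5 individuals, estimating a random-effects
## variance is fragile; a fixed blocking factor sidesteps that.
## 'bats' and its column names are assumed for illustration.
m_fixed <- lm(y ~ task * dist + indiv, data = bats)
summary(m_fixed)
```

You lose the "generalize to the population of bats" interpretation, but with five levels you weren't really getting that anyway.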
> I'd like to use the “glmer” (lme4) function to test two hypotheses:
Why glmer and not lmer? Are your responses discrete counts?
(You may be confused between "GLM" standing for "general linear model"
and for "generalized linear model" ...)
>
> Main hypothesis:
> There are differences in Y measurements between tasks,
> which are related also to distance from the object.
>
> Secondary hypothesis:
> Differences in Y measurements between tasks do not depend on the individual.
>
> I guess the simplest model for an AIC model selection would be:
> print(Model.01<-glmer(y~task*dist+(1|indiv)))
I think you really want lmer() [although in this case you would
get the same answer, as glmer() defaults to a Gaussian family which
gives the same results as lmer()]
This is in principle the right model, but if your data are at
all noisy you are likely to find that the among-individual variance
is estimated as zero.
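For a continuous (Gaussian) response, the lmer() version of that model would look something like this (again assuming a data frame called `bats` holding your variables):

```r
library(lme4)
## Model.01 refit with lmer(): Gaussian response, fixed effects
## for task, dist and their interaction, and a random intercept
## for each individual bat.
Model.01 <- lmer(y ~ task * dist + (1 | indiv), data = bats)
summary(Model.01)
```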
> - so any model that provides more details should have a lower
> (better) AIC score.
??? what ??? only if the 'more details' actually find meaningful
structure in the data. Otherwise the extra parameters will *increase* the AIC.
> I’m not sure if I’m coding the models including more details
> correctly. I just want them to test my
> hypotheses properly.
>
> Literature suggests to get rid of the pseudo-replication:
> my repeated measures (“dist”) seem to
> behave like longitudinal data (as it is basically a time series).
If so, might you want to treat distance as continuous rather
than categorical?
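If dist is currently a factor whose levels are the numeric distances, that would look roughly like (data frame name assumed):

```r
## Recover the numeric distances from the factor. Going through
## as.character() first avoids getting the 1..5 internal level
## codes instead of the actual measured distances.
bats$dist_num <- as.numeric(as.character(bats$dist))

## Distance as a continuous covariate uses 1 df for its slope
## instead of 4 df for the factor contrasts.
Model.cont <- lmer(y ~ task * dist_num + (1 | indiv), data = bats)
```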
> This way, "indiv" would be nested in
> "dist". Furthermore, "dist" levels have different variance,
> so it would be good to group the data and
> somehow tell the model, that it should ignore differences in variance.
> (1|dist:indiv) or (dist|indiv)?
> I am still wondering if the “weights=” argument would apply here?
You can't 'ignore' differences in variance, but you might (if you
use lme from the nlme package instead of lmer from the lme4 package)
be able to model those differences, using the weights argument as you
suggest -- something like weights=varIdent(form = ~ 1 | dist)
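A sketch of that lme() fit, with a separate residual variance estimated for each distance level (data frame name assumed as before):

```r
library(nlme)
## Random intercept per individual; varIdent() lets the residual
## standard deviation differ between the levels of dist instead
## of assuming a single common variance.
Model.het <- lme(y ~ task * dist,
                 random = ~ 1 | indiv,
                 weights = varIdent(form = ~ 1 | dist),
                 data = bats)
summary(Model.het)
```

You can then compare it against the equal-variance fit with anova() to see whether the extra variance parameters are worth their df.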
>
> 2- “dist” is nested in “task”, as for both tasks
> I have the same distances measured.
> According to the description of the package “lme4”, I should write:
> print(Model.02<-glmer(y~task*dist+(1|indiv)+(1|task:dist)))
If task and dist are categorical (factors), then you can't include
the task-by-distance interaction as both a fixed effect (implicit in
'task*dist') and as a random effect ((1|task:dist)) ...
>
> 3- according to the text -book “The R book” I understand
> that I should do the following:
> rename factor levels to unique labels:
> taskdist<-task:dist
> taskdistindiv<-task:dist:indiv
> print(Model.03<-glmer(y~task+(1|taskdist)+(1|taskdistindiv)))
>
> Or, pooling 1 and 3 together myself I would end up with:
> print(Model.04<-glmer(y~dist*task+(1|indiv)+(1|task:dist)+(1|dist:indiv)))
>
I think you're tackling a moderately complicated analysis of a small
data set here, and you're going to have to think hard about which
aspects of the data are most important to you. You'll then need to
bite the bullet and decide _a priori_ to ignore the aspects that are
less important, leaving them out of the model, because otherwise you
are going to overfit your model and end up with mush.
I think you need to try to get some local expert help -- it's going
to be very hard getting enough advice from anonymous helpers via e-mail
lists. The R Book is a good introduction, but if you really need to
do this on your own I would recommend setting aside some time and sitting
down with a copy of Pinheiro and Bates 2000 for a while ...
I would even say, sacrilegiously, that if the only local statistics
experts use SAS (or Stata, or SPSS, or ...) that you should use the
same package they do. As I said, this is a moderately tricky analysis,
you're going to have to make some compromises, and the old "better an
approximate answer to the right question than the exact answer to
the wrong question" maxim applies ...
sincerely
Ben Bolker