[R-sig-ME] [R] Differences between glmmPQL and lmer and AIC calculation

Wed Jul 17 17:07:35 CEST 2013

Dear Ben,

thank you very much for your quick reply and for your good advice!

The reason why we won't use mean values is that we will lose a lot of
variance of each individual.
In general behavioural variables (e.g. activity) have been recorded
continiously over a certain time (e.g. a month). These collected data
are already mean values of an hour or a day, listed consecutively one
below the other in one column for each behavioural variable,
respectively. So, all in all, we have several repeated measurements
for each individual (different values, probably even with a time
effect within an individual) whereas the response variable remain the
same all the time.
What we want to find out is, whether the response variable (which is a
fixed individual characteristic trait) can be explained by different
behavioural traits which we have measured continiously; or in other
words: the probability for an individual of being coded 0 or 1
depending on their behaviour.

Hopefully it gets clearer now, so that you or anybody else could give
me further advices about chosing the right model for data analysis.

Anyway, as you've recommended, next step is that I will calculate mean
values of the explanatory variables for each invidivual and run a
simple logistic regression. Let's see what comes out there then.

Many thanks for your help so far, Ben!

Best wishes,
Tonio

---------- Forwarded message ----------
From: Ben Bolker <bbolker at gmail.com>
Date: 2013/7/17
Subject: Re: [R] Differences between glmmPQL and lmer and AIC calculation
To: Tonio Pieterek <t.pieterek at googlemail.com>

   It would be best to continue this on r-sig-mixed-models.  However, as
I think I may have said before, if your response variable doesn't vary
within individuals, then it doesn't make any sense (that I can see) to
use the within-individual variations to try to fit the model.  It would
be best to simply aggregate the within-individual variation in the
predictor variables and get a single set of predictors for each
individual -- typically I would use the mean, but you could use other
summary statistics (min, max, range, standard deviation ...) to the
extent that they made biological sense.

  good luck
    Ben Bolker

On 13-07-17 08:42 AM, Tonio Pieterek wrote:
> Dear Ben,
>
> sorry for this late response but I just wanted to tryo out your
> suggestions at first. And before I forget, hank you for your answer!
>
> Unfortunately, I wasn't able to run the glmmadmb so far due to several
> error messages, e.g. before removing the NA's of the dataset:
>
> "Error in II[, ii] = II[, ii] + REmat$codes[[i]] :
>    number of items to replace is not a multiple of replacement length
> In addition: Warning messages:
> 1: In glmmadmb(sexActs ~ (1 | id), sexpartner, debug = TRUE) :
>    NAs removed in constructing fixed-effect model frame: you should
> probably remove them manually, e.g.
> with na.omit()
> 2: In II[, ii] + REmat$codes[[i]] :
>    longer object length is not a multiple of shorter object length"
>
> or after removing the NA's of the dataset:
>
>
> "Convergence failed:log-likelihood of gradient= xx.xx"
>
> and I couldn't fix what the problem is even after reading some other
> postings with similar problems/error messages of the glmmadmb.
>
> However, just to give you further information about my data: Yes, the
> fish are grouped individually either belonging to category 0 or to
> category 1, therefore the observation of the response variable differs
> only between individuals but not within an individual.
>
> You've mentioned the development version of lme4. What is the
> difference in using this version compared to the "normal" one which I
> have tried out already?
>
> And do you recommend to post my whole problem to the
> r-sig-mixed-models again? Sorry for asking this but I haven't much
> experience with R and with the help-mailing list either.
>
>
> I would really appreciate it if you could give me further advices how
> to solve my problem.
> Thank you again for your response!
>
>
> Best wishes,
> Tonio
>
> 2013/7/12 Ben Bolker <bbolker at gmail.com>:
>> Tonio Pieterek <t.pieterek <at> googlemail.com> writes:
>>
>>>
>>
>> [snip]
>>
>> This is really more appropriate for r-sig-mixed-models at r-project.org :
>> please refer any followup questions there.
>>
>>> For my Master thesis I collected some behavioral data of fish using
>>> acoustic telemetry. The aim of the study is to compare two different
>>> groups of fish (coded as 0 and 1 which should be the dependent
>>> variable) based on their swimming activity, habitat choice, etc.
>>> (independent variables).
>>
>>   I don't quite understand this part.  You're trying to figure out
>> whether a particular observation falls into one category or
>> another (0/1)?  Do individual fish (id in the formula below)
>> always fall into one category or the other?
>>
>>> Each fish has several observations over time
>>> (repeated measurements) which I included as random factor in my models
>>> using library glmmPQL (package MASS). Because I have a binary data
>>> structure, I am using generalized linear mixed models.
>>
>>   For what it's worth, penalized quasi-likelihood (used by glmmPQL)
>> is generally considered to be a bit dicey with binary responses
>> (see e.g. Bolker et al 2008 TREE paper).
>>
>>> Using library glmmPQL the results reflect my descriptive analyses and
>>> the results are sound. However, we also want to rank several candidate
>>> models using AIC. And this is where the problems start. Because
>>> glmmPQL does not provide AIC values or comparable measures, I also
>>> tried to calculate the same models using function lmer. Against
>>> expectations, I got completely different results from these two
>>> libraries (glmmPQL = highly significant; lmer = far away from being
>>> significant with p = 0.9xx).
>>>
>>> I used the following codes:
>>>
>>> cal1=glmmPQL(y ~ activity, random=~1|id, data=data, family=binomial,
>>> na.action=na.omit)
>>>
>>>> WORKS FINE
>>>
>>> cal1 = lmer(y ∼ activity + (1 | id ), family = binomial, data=data,
>>> na.action=na.omit)
>>>
>>>> PRODUCED misleading and totally different results compared to glmmPQL
>>>  (e.g. sometimes error message
>>> occurs: In mer_finalize(ans) : false convergence (8); even
>>> for very simple models)
>>
>>   Do you possibly have complete separation, i.e. some individuals
>> with all-zero responses?
>>
>>   Have you tried the development version of lme4?
>>
>>>
>>> A glmmML did not work since we got the following failure message, for
>>> which we were not able to find out the reason and therefore could not
>>> go on with this model:
>>>
>>> “[glmmml] fail = 1
>>
>>  [snip]
>>
>>> The questions are:
>>>
>>> 1) Why did glmmPQL and lmer produce completely different results and
>>> how can I solve this problem? Following Zuur et al. 2009* the models
>>> should provide very similar results, but they didn`t.
>>
>>   I strongly suspect that there's something wrong with your setup.
>> In particular, if the response variable cal1 (0/1) only varies at
>> the level of individuals (id), and not within id, then you should
>> probably just calculate the mean activity per individual id and
>> run a simple logistic regression.  My guess would be that glmmPQL
>> may have papered over some cracks for you ...
>>
>>> 2) Can I calculate AIC values (or something comparable)
>>> using library glmmPQL?
>>
>>   No, not without a great deal of difficulty.
>>>
>>>  3) Is there any other option (library) to analyze my data including an AIC?
>>
>>   You can use JAGS/WinBUGS with data cloning (the dclone package), or
>> glmmADMB.
>>
>>>
>>> If something remained unclear or if you have any question about
>>> details, please let me know.
>>>
>>> I would really appreciate any kind of help referring to my problem(s).
>>>
>>>
>>> *Alain F. Zuur,  Elena N. Ieno,  Neil J. Walker, Anatoly A. Saveliev,
>>> Graham M. Smith. (2009). Mixed Effects Models and Extensions in
>>> Ecology with R. Springer Science+Business Media, New York, USA.
>>>
>>> ISSN 1431-8776
>>> ISBN 978-0-387-87457-9
>>> DOI 10.1007/978-0-387-87458-6
>>
>>
>>    I suspect
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.