[R-sig-ME] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...

Franssens, Samuel Samuel.Franssens at econ.kuleuven.be
Tue Mar 22 09:24:51 CET 2011


Thanks a lot for taking the time to help. At least now I know I am not doing something silly. I re-analyzed data from an earlier experiment, where we did a wrong analysis (treating proportions of correct responses per subject as continuous variable, and then doing a repeated measures anova, without correction; http://ppw.kuleuven.be/reason/wim/data/COGNIT_2010_correctedproofs.pdf , exp1). In this experiment, we used the same design, but there was no power manipulation (so you have the same 8 reasoning problems per person, two types of problem, conflict & no conflict, but no between subjects variable). Here, glmer gives very nice results. 

I look forward to reading a chapter in your book on GLMMs, as this is a very common design in (social) psychology (and also very often the wrong analysis is done, cf. above :-) )

Sam.

-----Original Message-----
From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas Bates
Sent: Monday 21 March 2011 6:36 PM
To: Franssens, Samuel
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...

On Sat, Mar 19, 2011 at 6:17 AM, Franssens, Samuel <Samuel.Franssens at econ.kuleuven.be> wrote:
> Thanks for the insightful reply! I've studied linear mixed models at undergraduate level, but I was unfamiliar with GLMMs. It was kind of hard at first to find out how to analyze these data, but I'm glad I found this group. I'm also trying to read the book on ecology by Zuur (2009).

> My data come from an experiment in which people have to solve two types of reasoning problems, 4 of each kind. The noconflict problems are filler problems which are very easy, the conflict problems are harder, some people will get all 4 wrong, some will get all 4 right. I could leave the no conflict problems out of the analysis, but I want to conclude that the power manipulation had no effect on performance on noconflict problems, but it did on the conflict problems (to rule out alternative explanations such as decreased motivation/attention). I think an analysis on the full data set is preferred over 2 separate analyses per problem type.

Unfortunately the results don't indicate an interaction between the power manipulation and the conflict/noconflict nature of the question.

I would agree that in general we prefer to do an "omnibus" analysis incorporating all the observed data.  Unfortunately with a binary response there can be circumstances where you can't really get that much out of the data, precisely because it is so homogeneous.
Paradoxically the difficult cases to fit with generalized linear models are those where the "signal" is very strong - the so-called complete separation cases.  In this case having almost all of the noconflict questions answered correctly means that you get very little information from those responses.

> I made the histograms and this confirms your expectations: for the no conflict problems, in all  three groups nobody scores less than 3/4, which is also evident from the means in these cells. For the conflict problems, the data have a similar distribution in all three groups, most mass is on 4/4, with 0/4, 1/4, 2/4, 3/4 getting almost equal mass. So indeed, variability is large.

> I tried to fit some models with effect of type also varying over subjects, because I thought this would make sense intuitively (some people will be very good at noconflict, but very bad at conflict; others will be good at both), but then I consistently get correlations between random effects of -1, which I thought was an indication of convergence problems, so I left this term out.

> I have read about the different estimation approaches used in GLMM's and I've noticed experts recommend not using PQL. So I will now focus on Laplace and GHQ (I knew this was possible in glmer, but did not know how much nAGQ should be, therefore I also used glmmML). nAGQ 7 and 9 gives almost completely the same results, but they differ a bit from laplace:

> Generalized linear mixed model fit by the adaptive Gaussian Hermite 
> approximation
> Formula: accuracy ~ f_power * f_type + (1 | f_subject)
>   Data: syllogisms
>   AIC   BIC logLik deviance
>  441.5 473.2 -213.7    427.5
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  f_subject (Intercept) 4.7286   2.1745 Number of obs: 688, groups: 
> f_subject, 86
>
> Fixed effects:
>                             Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  1.470526   0.493530   2.980  0.00289 ** 
> f_powerhp                    0.124439   0.690869   0.180  0.85706 
> f_powerlow                   2.020807   0.833203   2.425  0.01529 * 
> f_typenoconflict             3.267835   0.643001   5.082 3.73e-07 *** 
> f_powerhp:f_typenoconflict   0.219783   0.927490   0.237  0.81268 
> f_powerlow:f_typenoconflict -0.005484   1.455748  -0.004  0.99699

> and

> Generalized linear mixed model fit by the Laplace approximation
> Formula: accuracy ~ f_power * f_type + (1 | f_subject)
>   Data: syllogisms
>  AIC   BIC logLik deviance
>  406 437.7   -196      392
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  f_subject (Intercept) 4.9968   2.2353 Number of obs: 688, groups: 
> f_subject, 86

> Fixed effects:
>                            Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  1.50745    0.50507   2.985  0.00284 ** 
> f_powerhp                    0.13083    0.70719   0.185  0.85323 
> f_powerlow                   2.04121    0.85308   2.393  0.01672 * 
> f_typenoconflict             3.28715    0.64673   5.083 3.72e-07 *** 
> f_powerhp:f_typenoconflict   0.21680    0.93165   0.233  0.81599 
> f_powerlow:f_typenoconflict -0.01199    1.45807  -0.008  0.99344

In these cases the fits using adaptive Gauss-Hermite quadrature are considered more reliable.  It's actually a good thing that the results from aGHQ and the Laplace approximation are similar.  They should be.

> I don't really know what criterion I should use to choose between models. Are any of the two actually useful, or are my data not fit for this kind of analysis? Is there another kind of analysis I could try?

I would be concerned about the very high estimated standard error of the random effects for subject.  You may be able to draw conclusions about the power treatment and the type of question but I wouldn't be terribly confident of the results.  Others reading the list may have better suggestions than I can provide.

> Thanks a lot for your time. I also think I understand the comment on 
> the random effects :)
>
> -----Original Message-----
> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of 
> Douglas Bates
> Sent: Saturday 19 March 2011 10:02 AM
> To: David Duffy
> Cc: Franssens, Samuel; r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...
>
> On Fri, Mar 18, 2011 at 9:54 PM, David Duffy <davidD at qimr.edu.au> wrote:
>> On Fri, 18 Mar 2011, Franssens, Samuel wrote:
>>
>>> Hi,
>>>
>>> I have the following type of data: 86 subjects in three independent 
>>> groups (high power vs low power vs control). Each subject solves 8 
>>> reasoning problems of two kinds: conflict problems and noconflict 
>>> problems. I measure accuracy in solving the reasoning problems. To
>>> summarize: binary response, 1 within subject var (TYPE), 1 between subject var (POWER).
>>>
>>> I wanted to fit the following model: for problem i, person j:
>>> logodds ( Y_ij ) = b_0j + b_1j TYPE_ij with b_0j = b_00 + b_01 
>>> POWER_j + u_0j and b_1j = b_10 + b_11 POWER_j
>>>
>>> I think it makes sense, but I'm not sure.
>>> Here are the observed cell means:
>>>            conflict       noconflict control     0.6896552      
>>> 0.9568966 high        0.6935484      0.9677419 low         0.8846154      
>>> 0.9903846
>>>
>>> GLMER gives me:
>>> Formula: accuracy ~ f_power * f_type + (1 | subject)
>>>  Data: syllogisms
>>> Random effects:
>>> Groups  Name        Variance Std.Dev.
>>> subject (Intercept) 4.9968   2.2353
>>> Number of obs: 688, groups: subject, 86
>>>
>>> Fixed effects:
>>>                           Estimate Std. Error z value Pr(>|z|)
>>> (Intercept)                  1.50745    0.50507   2.985  0.00284 ** 
>>> f_powerhp                    0.13083    0.70719   0.185  0.85323 
>>> f_powerlow                   2.04121    0.85308   2.393  0.01672 * 
>>> f_typenoconflict             3.28715    0.64673   5.083 3.72e-07 *** 
>>> f_powerhp:f_typenoconflict   0.21680    0.93165   0.233  0.81599 
>>> f_powerlow:f_typenoconflict -0.01199    1.45807  -0.008  0.99344
>>> ---
>>>
>>> Strange thing is that when you convert the estimates to 
>>> probabilities, they are quite far off. For control, no conflict 
>>> (intercept), the estimation from glmer is 1.5 -> 81% and for glmmPQL 
>>> is 1.14 -> 75%, whereas the observed is: 68%.
>>>
>>> Am I doing something wrong?
>>
>> You are forgetting that your model includes a random intercept for 
>> each subject.
>
> To follow up on David's comment a bit, you have a very large estimated standard deviation for the random effects for subject.  An estimated standard deviation of 2.23 in the glmer fit is huge.  On the scale of the linear predictor the random effects would swamp the contribution of the fixed effects - different subjects' random effects could easily differ by 8 or more (+/ 2 s.d.).  Transformed to the probability scale that takes you from virtually impossible to virtually certain.  To get that large an estimated standard deviation you would need different subjects exposed to the same experimental conditions and who get completely different results (all wrong vs all right).  The estimate of the fixed effect for type is also very large, considering that this is a binary covariate.
>
> I would suggest plotting the data, say by first grouping the subjects into groups according to power then by subject then by type.  You should see a lot of variability in  responses according to subject within power.
>
> Regarding the difference between the glmer and the glmmPQL results, bear in mind that glmmPQL is based on successive approximation of the GLMM by an LMM.  This case is rather extreme and the approximation is likely to break down.  In fact, for the glmer fit, I would recommend using nAGQ=7 or nAGQ=9 to see if that changes the estimates substantially.  The difference between the default evaluation of the deviance by the Laplace approximation and the more accurate adaptive Gauss-Hermite quadrature may be important in this case.
>
> I just saw your other message about the properties of the random 
> effects.  The values returned are the conditional modes of the random 
> effects given the observed data.  The mean 0 and standard deviation of
> 2.3 apply to the marginal or unconditional distribution of the random effects, not the conditional distribution.
>




More information about the R-sig-mixed-models mailing list