[R-sig-ME] Nonsensical results in glmer (bglmer)

Sun Jul 30 18:52:36 CEST 2023

Dear Francesco,

Don't use a factor response variable. Use either FALSE/TRUE or 0/1. Note
that as.factor() would use "correct" as the first level and "incorrect" as
the second level. Maybe the model uses the first levels as FALSE and
the second as TRUE. Setting TRUE and FALSE yourselves eliminates such
ambiguity.
I prefer to do any transformations like as.factor() or relevel() prior to
fitting the model. Then you have the same variables available in the
dataset (e.g. for plotting).

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

Op zo 30 jul 2023 om 17:46 schreef Francesco Romano <fbromano77 using gmail.com>:

> Dear all,
>
>
> I wonder if anyone can account for a counterintuitive result in my analyses
> of a logistic regression run with glmer. The data, where I only use the
> columns RESP as a binomial outcome coded as character (correct vs.
> incorrect), Type as a factor with 2 levels where 'islands' constitutes the
> reference level, Group as a factor with two levels where 'L1' constitutes
> the reference level, and SUBJ and ITEM as random effects, is attached. In
> this particular analysis I use bglmer but the same result ensues from glmer
> or even a bernoulli brm analysis in the Bayesian framework (i.e. brms
> package).
>
> The following output shows that the group L1, the reference group, has a
> lower probability of a correct score compared to L2. If the output is
> garbled in your email, feel free to run the code yourself to get a clearer
> picture. Under the fixed-effects of the output, we interpret the
> coefficient -1.38 of r*elevel(masterPT$Group, ref = "L2")L1        -1.3802
>     0.6907  -1.998  0.04567 * *to mean that the probability of scoring a
> correct answer is approximately 34% lower in the L1 than the L2 group
> (following Gelman and Hill, 2007, p.93) we divide the coeffecient -1.38
> expressed in log odds by 4 to obtain an approximate corresponding
> probability). This is counterintuitive because the L1 group is basically a
> group of native speakers of Spanish, the language being tested, while the
> L2 is a bilingual group being tested in Spanish as a foreign language which
> they learned later in life. Even playing devil's advocate, a quick look at
> a prop table or even the figures also attached shows the L1 group exhibit
> lower counts of incorrect responses. In the figures, this can be easily
> seen by comparing the amount of light blue splash for either of the Type
> levels between the L1 and L2 groups. There is far less of a splash in the
> L1 data which suggests they should statisticall have a higher chance of
> selecting a correct response.
>
> > freqmodel2<-bglmer(as.factor(RESP)~Type*relevel(masterPT$Group, ref =
> "L2")+(1+Type|SUBJ)+(1+Group|ITEM), family = binomial(link="logit"),
> data=masterPT, control=glmerControl(optimizer = "bobyqa"), nAGQ=1)
> > summary(freqmodel2)
> Cov prior  : SUBJ ~ wishart(df = 4.5, scale = Inf, posterior.scale = cov,
> common.scale = TRUE)
>            : ITEM ~ wishart(df = 4.5, scale = Inf, posterior.scale = cov,
> common.scale = TRUE)
> Prior dev  : -5.3929
>
> Generalized linear mixed model fit by maximum likelihood (Laplace
> Approximation) ['bglmerMod']
>  Family: binomial  ( logit )
> Formula: as.factor(RESP) ~ Type * relevel(masterPT$Group, ref = "L2") +
>  (1 + Type | SUBJ) + (1 + Group | ITEM)
>    Data: masterPT
> Control: glmerControl(optimizer = "bobyqa")
>
>      AIC      BIC   logLik deviance df.resid
>   1676.8   1734.1   -828.4   1656.8     2265
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -4.1192 -0.3447 -0.1315  0.1106  4.8595
>
> Random effects:
>  Groups Name        Variance Std.Dev. Corr
>  SUBJ   (Intercept) 1.915    1.384
>         Type2       1.642    1.281    -0.74
>  ITEM   (Intercept) 8.970    2.995
>         Group2      6.479    2.545    -0.74
> Number of obs: 2275, groups:  SUBJ, 65; ITEM, 35
>
> Fixed effects:
>                                             Estimate Std. Error z value
> Pr(>|z|)
> (Intercept)                                  -1.3577     0.4947  -2.745
>  0.00606 **
> Type2                                         1.4683     0.8340   1.760
>  0.07834 .
> relevel(masterPT$Group, ref = "L2")L1        -1.3802     0.6907  -1.998
>  0.04567 *
> Type2:relevel(masterPT$Group, ref = "L2")L1  -4.0366     1.2383  -3.260
>  0.00112 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
>             (Intr) Type2  r(Pr="
> Type2       -0.546
> r(PT$G,r="L -0.325  0.157
> T2:(PT$Gr="  0.151 -0.221 -0.442
>
>
> What am I missing here?
> Am I interpreting something wrong?
>
> Many thanks in advance for any help,
>
> Best,
>
> Francesco Romano PhD
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]