[R-sig-ME] Random effects of logistic regression: bias towards the mean?

Tom Lentz t.o.lentz at uu.nl
Thu Mar 27 15:33:56 CET 2014


Dear Tibor,

Thanks for your answer again.

Here are replies to your remarks:
 > w.r.t. the fixed values the model is obviously degenerate, since none of
 > them is significant. I would suggest that you simulate without fixed
 > effects. (Your intercept includes non-significant condition A.)
So, I did this (overnight) and the result is the same: the random
effects are on average zero only when the probability of the dependent
variable is 0.5 (i.e., the logit is 0). With a lower probability the
random effects are on average positive; with a higher probability they
are negative.
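
For reference, one run of the simulation looks roughly like this (a sketch;
the random-effect standard deviation of 0.5 and the object names are
illustrative, not necessarily the exact values I used):

library(lme4)

n.p <- 40; n.i <- 40                    # 40 participants, 40 items
p.hit <- 0.2                            # generating hit probability
dat <- expand.grid(p = factor(1:n.p), i = factor(1:n.i))
## participant and item random intercepts on the logit scale
re.p <- rnorm(n.p, 0, 0.5)
re.i <- rnorm(n.i, 0, 0.5)
eta <- qlogis(p.hit) + re.p[as.integer(dat$p)] + re.i[as.integer(dat$i)]
dat$outcome <- rbinom(nrow(dat), 1, plogis(eta))

m <- glmer(outcome ~ 1 + (1 | i) + (1 | p), data = dat, family = binomial)
colMeans(ranef(m)$p)                    # average participant random effect
colMeans(ranef(m)$i)                    # average item random effect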

 > I must also admit that I still do not understand the purpose of your
 > simulation. In an actual experiment, you would want to determine the
 > influence of items and subjects (or something else; I, for example, test
 > for the influence of individual words). With regard to logistic
 > regression (and random intercepts), the subjects or items would show
 > that the logistic curve is shifted on the x-axis.
I'm sorry that my question is not clear. What I was trying to test is 
whether the random effects are zero on average, which I think (1) they 
should be but (2) might not be. This matters (to me) because in other 
models I made of real data, the estimates for the fixed effects were 
systematically too low; the fitted values were still right because the 
added random effects for item and participant were more often positive 
than negative. So, if for a certain level of the factors the fixed-effect 
estimate is e.g. 20%, the actual data are at 23%, but most participants 
are slightly better than 20% (the average participant random effect is 
above 0), and the same happens for items. I would rather see the fixed 
effect be similar to the mean, and I was not sure whether my 
expectations/hopes are invalid or whether the models are not estimated 
correctly (and if so, how that could be solved).
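
For concreteness, the comparison I make on the real data is of this kind
(a sketch; m5 is only a placeholder name for a fit of the condition model
quoted further down, outcome ~ cond + (1 | i) + (1 | p) on dataset5, with
treatment coding so that condition A sits in the intercept):

## observed hit proportion per condition level
with(dataset5, tapply(outcome, cond, mean))
## probability implied by the fixed effects alone, per condition
b <- fixef(m5)
plogis(b["(Intercept)"])                                    # condition A
plogis(b["(Intercept)"] + b[c("condB", "condC", "condD")])  # conditions B, C, D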

 > Also, in linguistics, you would
 > normally try to identify which subset of the random effects plays an
 > actual role (by determining their individual standard deviations).
Yes, this is an extra check on model validity, but not the main goal of 
my analysis. I hoped all items (in my real experiments, words to be 
recognised) would be comparable, and hypothesis testing of the condition 
fixed effects should then show whether those effects actually exist, 
taking the random effects into account. I could look at which words are 
recognised better than average, but if I take a positive random effect 
to indicate this, most of the words are recognised better than average 
(when I assume the fixed effects represent mean values).
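
Concretely, the check I mean would be something like this (a sketch, for
some fitted model m with the items as grouping factor i):

## proportion of items with a positive random intercept, i.e. items the
## model says are recognised better than the fixed-effect "average"
re.i <- ranef(m)$i[, "(Intercept)"]
mean(re.i > 0)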

I can illustrate my problem with one of the models, generated with a hit 
probability of 0.2, which resulted in 20.19% of the observations being 
hits.

The model is the following:
print(summary(m), corr=F)
Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation) ['glmerMod']
  Family: binomial ( logit )
Formula: outcome ~ 1 + (1 | i) + (1 | p)
    Data: dataset

      AIC      BIC   logLik deviance df.resid
   1615.5   1631.7   -804.8   1609.5     1597

Scaled residuals:
     Min      1Q  Median      3Q     Max
-0.5106 -0.5038 -0.5011 -0.4971  2.0172

Random effects:
  Groups Name        Variance Std.Dev.
  i      (Intercept) 0.000000 0.00000
  p      (Intercept) 0.005559 0.07456
Number of obs: 1600, groups: i, 40; p, 40

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37628    0.06434  -21.39   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


The average random effect for item is now zero, as hoped, but for 
participant it is not:
colMeans(ranef(m)$p)
  (Intercept)
5.537502e-05

The fixed-effect intercept in the model corresponds to a probability of:
inv.logit(-1.37628)
[1] 0.2016071
which is close to 0.2019, but when the average random effect is added:
inv.logit(-1.37628 + colMeans(ranef(m)$p))
(Intercept)
    0.201616
the estimate is closer to the mean probability.

The same holds in logit space:
logit(0.2019)
-1.374461
while the fixed effect is -1.37628 (too low); with the positive average 
random effect for p added, it is closer: -1.376225.
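
For completeness, the whole comparison in one block, using base R's
plogis()/qlogis() (which do the same as inv.logit()/logit()):

obs    <- mean(dataset$outcome)      # 0.2019 for this data set
b0     <- fixef(m)["(Intercept)"]    # -1.37628
re.bar <- colMeans(ranef(m)$p)       # 5.5e-05, small but positive
qlogis(obs)                          # -1.374461, logit of the observed proportion
b0                                   # -1.37628, fixed intercept alone (too low)
plogis(b0)                           # 0.2016071
plogis(b0 + re.bar)                  # 0.201616, slightly closer to 0.2019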

Although this effect is small, it is consistent (which is why I fitted 
such models to 2000 simulated data sets, to make sure it is not a fluke 
of one particular model).
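
The replication itself is just the simulation sketch above wrapped in a
function, roughly (again, the random-effect SD is illustrative):

one.run <- function(p.hit = 0.2, n.p = 40, n.i = 40, sd.re = 0.5) {
  dat <- expand.grid(p = factor(1:n.p), i = factor(1:n.i))
  eta <- qlogis(p.hit) +
         rnorm(n.p, 0, sd.re)[as.integer(dat$p)] +
         rnorm(n.i, 0, sd.re)[as.integer(dat$i)]
  dat$outcome <- rbinom(nrow(dat), 1, plogis(eta))
  m <- glmer(outcome ~ 1 + (1 | i) + (1 | p), data = dat, family = binomial)
  c(p = colMeans(ranef(m)$p), i = colMeans(ranef(m)$i))
}
res <- t(replicate(2000, one.run()))   # slow; this is the overnight part
colMeans(res)                          # what I see: above zero when p.hit < 0.5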

Thanks again for your help and your time; I look forward to your response!

Kind regards,

Tom


On 26-03-14 07:45, Tibor Kiss wrote:
> Hi Tom,
>

>
>
>  From this background, I do not understand what you wanted to gain by
> your simulation.
>
> Perhaps my answer is still of use.
>
> With kind regards
>
> Tibor
>
> *Prof. Dr. Tibor Kiss* <tibor op linguistics.rub.de>,
> Sprachwissenschaftliches Institut
> Ruhr-Universität Bochum <http://www.linguistics.rub.de>, D-44780 Bochum
> Office: +49-234-322-5114
>
>
>
> On 25.03.2014, at 18:39, Tom Lentz <t.o.lentz op uu.nl> wrote:
>
>> Sure!
>>
>> Generalized linear mixed model fit by maximum likelihood (Laplace
>> Approximation) ['glmerMod']
>> Family: binomial ( logit )
>> Formula: outcome ~ cond + (1 | i) + (1 | p)
>>   Data: dataset5
>>
>>     AIC      BIC   logLik deviance df.resid
>>  1005.2   1037.5   -496.6    993.2     1594
>>
>> Scaled residuals:
>>    Min      1Q  Median      3Q     Max
>> -0.3882 -0.3306 -0.3131 -0.2919  3.7239
>>
>> Random effects:
>> Groups Name        Variance Std.Dev.
>> i      (Intercept) 0.03539  0.1881
>> p      (Intercept) 0.04775  0.2185
>> Number of obs: 1600, groups: i, 40; p, 40
>>
>> Fixed effects:
>>            Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  -2.3476     0.1861 -12.617   <2e-16 ***
>> condB        -0.1291     0.2541  -0.508    0.611
>> condC         0.1723     0.2393   0.720    0.471
>> condD         0.1172     0.2417   0.485    0.628
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> On 25-03-14 16:51, Tibor Kiss wrote:
>>> Hi,
>>>
>>> could you kindly output print(summary(m), corr = F).
>>>
>>> Best
>>>
>>> Tibor
>>>
>>>
>>> *Prof. Dr. Tibor Kiss* <tibor op linguistics.rub.de>,
>>> Sprachwissenschaftliches Institut
>>> Ruhr-Universität Bochum <http://www.linguistics.rub.de>, D-44780 Bochum
>>> Office: +49-234-322-5114
>>>
>>>
>>>
>>> On 25.03.2014, at 16:32, Tom Lentz <t.o.lentz op uu.nl> wrote:
>>>
>>>> > m
>>>> Generalized linear mixed model fit by maximum likelihood (Laplace
>>>> Approximation) ['glmerMod']
>>>> Family: binomial ( logit )
>>>> Formula: outcome ~ cond + (1 | i) + (1 | p)
>>>>  Data: dataset
>>>>     AIC       BIC    logLik  deviance  df.resid
>>>> 1005.1915 1037.4581 -496.5958  993.1915      1594
>>>> Random effects:
>>>> Groups Name        Std.Dev.
>>>> i      (Intercept) 0.1881
>>>> p      (Intercept) 0.2185
>>>> Number of obs: 1600, groups: i, 40; p, 40
>>>> Fixed Effects:
>>>> (Intercept)        condB        condC        condD
>>>>   -2.3476      -0.1291       0.1723       0.1172
>>>
>


