[R-sig-ME] Random effects of logistic regression: bias towards the mean?
Tom Lentz
t.o.lentz at uu.nl
Thu Mar 27 15:33:56 CET 2014
Dear Tibor,
Thanks for your answer again.
Here are replies to your remarks:
> w.r.t. the fixed values the model is obviously degenerate, since none of
> them is significant. I would suggest that you simulate without fixed
> effects. (Your intercept includes non-significant condition A.)
So I did this (overnight) and the result is the same: the random
effects average zero only when the probability of the dependent
variable is 0.5 (i.e., the logit is 0). With a lower probability the
random effects are on average positive; with a higher probability they
are on average negative.
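To make the setup concrete, one iteration of the (intercept-only) simulation looks roughly like this; the group sizes and the participant SD of 0.3 here are illustrative, not necessarily the values I used:

```r
## One iteration: no fixed effects beyond the intercept, true hit
## probability 0.2, and a random intercept for participant only
library(lme4)

set.seed(1)
n.p <- 20; n.i <- 20                        # participants, items
d <- expand.grid(p = factor(1:n.p), i = factor(1:n.i))
u <- rnorm(n.p, sd = 0.3)                   # true participant effects
eta <- qlogis(0.2) + u[d$p]                 # linear predictor, logit scale
d$outcome <- rbinom(nrow(d), 1, plogis(eta))

m <- glmer(outcome ~ 1 + (1 | i) + (1 | p), data = d, family = binomial)
colMeans(ranef(m)$p)                        # mean of the conditional modes
```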
> I must also admit that I still do not understand the purpose of your
> simulation. In an actual experiment, you want to determine the
> influence of items and subjects (or something else; I, e.g., test for
> the influence of individual words). With regard to logistic regression
> (and random intercepts), the subjects or items would show that the
> logistic curve is shifted on the x-axis.
I'm sorry that my question is not clear. What I was trying to test is
whether the random effects are zero on average, as I think (1) they
should be and (2) they might not be. This matters (to me) because in
other models, fitted to real data, the fixed-effect estimates were
systematically too low; the fitted values were still right because the
added random effects for item and participant were more often positive
than negative. So if at a certain factor level the fixed-effect
estimate is, say, 20% while the actual data are at 23%, most
participants are slightly above 20% (the average participant random
effect is above 0), and the same happens for items. I would rather see
the fixed effect be similar to the mean, and I was not sure whether my
expectations/hopes are invalid or the models are not estimated
correctly (and, if so, how that could be solved).
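One partial explanation I am still weighing: the fixed intercept is a conditional (subject-specific) logit, while the observed proportion is the marginal, population-averaged probability, and under a nonlinear link the two need not coincide. A small base-R illustration (the random-effect SD of 1 is made up for the example):

```r
## Marginal vs. conditional probability under a logit link. Because
## plogis (= inv.logit) is convex for negative logits, averaging
## plogis(beta0 + u) over the random effects u gives a value above
## plogis(beta0) whenever beta0 < 0 and sd(u) > 0.
beta0 <- qlogis(0.2)                        # conditional logit for p = 0.2
marginal <- integrate(function(u) plogis(beta0 + u) * dnorm(u, sd = 1),
                      -Inf, Inf)$value      # population-averaged probability
c(conditional = plogis(beta0), marginal = marginal)
```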
> Also, in linguistics, you would
> normally try to identify which subset of the random effects plays an
> actual role (by determining their individual standard deviations).
Yes, this is an extra check on model validity, but not the main goal of
my analysis. I hoped all items (in my real experiments, words to be
recognised) would be comparable, and hypothesis testing of the condition
fixed effects should then show whether those effects actually exist,
taking the random effects into account. I could look at which words are
recognised better than average, but if I take a positive random effect
to indicate this, most of the words are recognised better than average
(when I assume the fixed effects represent mean values).
I can illustrate my problem with one of the models, generated with a
hit probability of 0.2, which resulted in 20.19% of the observations
being hits.
The model is the following:
print(summary(m), corr=F)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: outcome ~ 1 + (1 | i) + (1 | p)
Data: dataset
AIC BIC logLik deviance df.resid
1615.5 1631.7 -804.8 1609.5 1597
Scaled residuals:
Min 1Q Median 3Q Max
-0.5106 -0.5038 -0.5011 -0.4971 2.0172
Random effects:
Groups Name Variance Std.Dev.
i (Intercept) 0.000000 0.00000
p (Intercept) 0.005559 0.07456
Number of obs: 1600, groups: i, 40; p, 40
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37628 0.06434 -21.39 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The random effect for item is now zero, as hoped, but for participant it
is not:
colMeans(ranef(m)$p)
(Intercept)
5.537502e-05
The intercept fixed effect in the model is:
inv.logit(-1.37628)
[1] 0.2016071
which is close to 0.2019, but when the random effect is added:
inv.logit(-1.37628 + colMeans(ranef(m)$p))
(Intercept)
0.201616
the estimate is closer to the mean probability.
The same would also hold in logit space:
logit(0.2019)
-1.374461
whereas the fixed effect is -1.37628 (too low); with the positive
random effect for p added it is closer: -1.376225.
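(I am using logit/inv.logit from an add-on package; base R's qlogis and plogis reproduce the same numbers without it:)

```r
## Base-R equivalents of inv.logit() and logit()
plogis(-1.37628)                 # fixed effect on probability scale: 0.2016071
qlogis(0.2019)                   # observed proportion in logit space: -1.374461
plogis(-1.37628 + 5.537502e-05)  # with the mean participant effect: 0.201616
```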
Although this effect is small, it is consistent (which is why I ran the
simulation, fitting such models 2000 times, to make sure it is not a
fluke of one particular model).
Thanks again for your help and your time, I look forward to your response!
Kind regards,
Tom
On 26-03-14 07:45, Tibor Kiss wrote:
> Hi Tom,
>
>
>
> From this background, I do not understand what you wanted to gain by
> your simulation.
>
> Perhaps my answer is still of use.
>
> With kind regards
>
> Tibor
>
> *Prof. Dr. Tibor Kiss <mailto:tibor op linguistics.rub.de >*,
> Sprachwissenschaftliches Institut
> Ruhr-Universität Bochum<http://www.linguistics.rub.de>D-44780 Bochum
> Office: +49-234-322-5114 <tel:+49-234-322-5114>
>
>
>
> On 25.03.2014, at 18:39, Tom Lentz <t.o.lentz op uu.nl> wrote:
>
>> Sure!
>>
>> Generalized linear mixed model fit by maximum likelihood (Laplace
>> Approximation) ['glmerMod']
>> Family: binomial ( logit )
>> Formula: outcome ~ cond + (1 | i) + (1 | p)
>> Data: dataset5
>>
>> AIC BIC logLik deviance df.resid
>> 1005.2 1037.5 -496.6 993.2 1594
>>
>> Scaled residuals:
>> Min 1Q Median 3Q Max
>> -0.3882 -0.3306 -0.3131 -0.2919 3.7239
>>
>> Random effects:
>> Groups Name Variance Std.Dev.
>> i (Intercept) 0.03539 0.1881
>> p (Intercept) 0.04775 0.2185
>> Number of obs: 1600, groups: i, 40; p, 40
>>
>> Fixed effects:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -2.3476 0.1861 -12.617 <2e-16 ***
>> condB -0.1291 0.2541 -0.508 0.611
>> condC 0.1723 0.2393 0.720 0.471
>> condD 0.1172 0.2417 0.485 0.628
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> On 25-03-14 16:51, Tibor Kiss wrote:
>>> Hi,
>>>
>>> could you kindly output print(summary(m), corr = F).
>>>
>>> Best
>>>
>>> Tibor
>>>
>>>
>>>
>>>
>>>
>>> On 25.03.2014, at 16:32, Tom Lentz <t.o.lentz op uu.nl> wrote:
>>>
>>>> > m
>>>> Generalized linear mixed model fit by maximum likelihood (Laplace
>>>> Approximation) ['glmerMod']
>>>> Family: binomial ( logit )
>>>> Formula: outcome ~ cond + (1 | i) + (1 | p)
>>>> Data: dataset
>>>> AIC BIC logLik deviance df.resid
>>>> 1005.1915 1037.4581 -496.5958 993.1915 1594
>>>> Random effects:
>>>> Groups Name Std.Dev.
>>>> i (Intercept) 0.1881
>>>> p (Intercept) 0.2185
>>>> Number of obs: 1600, groups: i, 40; p, 40
>>>> Fixed Effects:
>>>> (Intercept) condB condC condD
>>>> -2.3476 -0.1291 0.1723 0.1172
>>>
>