[R-sig-ME] Modeling truncated counts with glmer

João C P Santiago joao.santiago at uni-tuebingen.de
Thu Feb 2 14:10:41 CET 2017


Dear Thierry,

Thank you, that makes sense now! I have been reading more on this and  
playing with the data to understand it better. Here are some final  
questions:

I've reduced the model to only include the abruf term to simplify things:

              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.0865     0.1909 -0.4532   0.6504
I(abruf - 1)   1.3241     0.0505 26.2030   0.0000

the log-odds of answering correctly is given by the linear predictor
-0.0865 + 1.3241*(abruf - 1); plogis() converts this log-odds into a
probability.


* When abruf - 1 is zero (i.e. the first trial, which I'll call trial
0), the intercept is the log-odds of an average person answering
correctly.

So in this case that means odds of exp(-0.0865)=0.92 and a probability  
of plogis(-0.0865)=0.48 (which means on average 0.48*40= 19 correct  
pairs)
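
In R this works out to (a minimal sketch; `fit` is just my placeholder
name for the fitted glmer object):

b0 <- fixef(fit)["(Intercept)"]   # -0.0865, on the logit scale
exp(b0)                           # odds of a correct answer, ~0.92
plogis(b0)                        # probability of a correct answer, ~0.48
40 * plogis(b0)                   # expected correct pairs out of 40, ~19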


* for each subsequent trial the odds of answering correctly are
multiplied by exp(1.3241) = 3.76 (almost 4x higher odds relative to the
previous trial). Note that plogis(1.3241) = 0.79 is not a "79%
increase": the coefficient acts multiplicatively on the odds, and the
resulting change in probability depends on where you start.
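
A quick check of the per-trial odds ratio (again with `fit` as my
placeholder; parm = "beta_" restricts confint() to the fixed effects):

b1 <- fixef(fit)["I(abruf - 1)"]                    # 1.3241
exp(b1)                                             # odds ratio per trial, ~3.76
exp(confint(fit, parm = "beta_", method = "Wald"))  # Wald CIs on the odds scale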

This means our average joe, for example on the last trial (trial = 2,
i.e. abruf = 3), has log-odds of -0.0865 + 1.3241*2 = 2.5617, i.e. odds
of exp(2.5617) = 13 in favour of a correct answer (the odds ratio
relative to trial 0 is exp(2*1.3241) = 14). That translates to
plogis(2.5617) = 0.93, a 93% probability of success, or about 37
correct pairs.
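
Or, computed from the coefficients (same placeholder `fit`):

eta <- fixef(fit)["(Intercept)"] + 2 * fixef(fit)["I(abruf - 1)"]  # logit at trial 2
exp(eta)          # odds, ~13
plogis(eta)       # probability, ~0.93
40 * plogis(eta)  # expected correct pairs, ~37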


* if I want to predict how well a person will do in this test after a
first trial, I simply need to change the intercept? Let's imagine
someone is not very good at this and only gets 25% of the pairs correct
on the first go. His or her intercept is log(0.25/0.75) = -1.0986, so
on trial 2 the predicted log-odds is

log(0.25/0.75) + 1.3241*2 = 1.5496

which gives plogis(1.5496) = 0.82, an 82% probability of success, or
about 33 correct pairs.
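
In R, qlogis() gives the logit directly (a sketch; note this plugs the
observed first-trial proportion in as that person's intercept and keeps
the fixed-effect slope, ignoring the shrinkage toward the population
mean that a proper random-intercept prediction would apply):

eta <- qlogis(0.25) + 1.3241 * 2  # person-specific intercept + 2 trials of learning
plogis(eta)                       # probability, ~0.82
40 * plogis(eta)                  # expected correct pairs, ~33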



Again, thank you so much for your replies. If you ever come to this
neck of the woods, a free beer is in order!

Best,
J Santiago




Quoting Thierry Onkelinx <thierry.onkelinx at inbo.be>:

> Dear João,
>
> The intercept is -0.07376 on the **logit** scale. That is 0.48 on the
> original scale. Use plogis(-0.07376) to transform from logit to original
> scale. Your interpretation of the intercept is correct.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2017-02-01 14:22 GMT+01:00 João C P Santiago
> <joao.santiago at uni-tuebingen.de>:
>
>> Thank you for your input! Only now did I go back to this model.
>>
>> I'm having some doubts about the meaning of the intercept from my binomial
>> model. Here's the complete output:
>>
>> Generalized linear mixed model fit by maximum likelihood (Laplace
>> Approximation) ['glmerMod']
>>  Family: binomial  ( logit )
>> Formula: cbind(correctPair, incorrectPair) ~ I(abruf - 1) * treatment +
>>     version + (1 | subjectNumber)
>>    Data: .
>>
>>      AIC      BIC   logLik deviance df.resid
>>    691.4    708.4   -339.7    679.4      119
>>
>> Scaled residuals:
>>     Min      1Q  Median      3Q     Max
>> -3.2676 -0.7861 -0.0428  0.9417  2.7483
>>
>> Random effects:
>>  Groups        Name        Variance Std.Dev.
>>  subjectNumber (Intercept) 0.7135   0.8447
>> Number of obs: 125, groups:  subjectNumber, 21
>>
>> Fixed effects:
>>                                   Estimate Std. Error z value Pr(>|z|)
>> (Intercept)                       -0.07376    0.20096  -0.367    0.714
>> I(abruf - 1)                       1.30891    0.06904  18.958   <2e-16 ***
>> treatmentStimulation               0.06116    0.09961   0.614    0.539
>> versionB                          -0.08709    0.07222  -1.206    0.228
>> I(abruf - 1):treatmentStimulation  0.03342    0.09727   0.344    0.731
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Correlation of Fixed Effects:
>>             (Intr) I(b-1) trtmnS versnB
>> I(abruf-1)  -0.235
>> trtmntStmlt -0.254  0.482
>> versionB    -0.189 -0.029  0.037
>> I(-1):trtmS  0.164 -0.681 -0.689  0.030
>>
>>
>>
>> abruf has values c(1, 2, 3), so subtracting 1 makes it start at a more
>> meaningful point (zero).
>>
>> My question is: is the intercept the odds of success versus failure at abruf
>> 0, treatment Control and version A? If so, why is it, statistically speaking,
>> 0 on the log scale (odds of 1)? The number of successes clearly increases
>> from abruf 1 to 3 (as seen in the model estimate and in plots).
>>
>> It's the first time I'm dealing with such complex models. Thank you for
>> your patience and time.
>>
>> Best
>> J Santiago
>>
>>
>>
>> Quoting Thierry Onkelinx <thierry.onkelinx at inbo.be>:
>>
>>> It looks like your participants performed a known number of trials, each of
>>> which resulted in either success or failure. The binomial distribution
>>> models exactly that. The model fit would then be the probability of success.
>>>
>>> Once you have the relevant distribution, you can choose the relevant
>>> covariates. Which ones, and in which form (linear, polynomial, factor),
>>> depends on the hypotheses that are relevant for your experiment.
>>>
>>> Best regards,
>>>
>>> ir. Thierry Onkelinx
>>>
>>> 2017-01-23 10:01 GMT+01:00 João C P Santiago
>>> <joao.santiago at uni-tuebingen.de>:
>>>
>>>> Thank you! Could you be a bit more specific as to why? I will most likely
>>>> encounter similar data in the future and I want to know how to think about
>>>> it.
>>>>
>>>> Fitting the model with abruf as a factor resulted in a better fit, but
>>>> that answers a different question, right? Namely, how different is the
>>>> intercept at a given timepoint compared with the reference level (abruf 0
>>>> in my code)?
>>>>
>>>> Best
>>>>
>>>>
>>>> Quoting Thierry Onkelinx <thierry.onkelinx at inbo.be>:
>>>>
>>>>> Dear João,
>>>>>
>>>>> A binomial distribution seems more relevant to me.
>>>>>
>>>>> glmer(cbind(correctPair, incorrectPair) ~ I((abruf - 1)^2) * treatment +
>>>>> (1|subjectNumber), data=data, family = binomial)
>>>>>
>>>>> Best regards,
>>>>>
>>>>> ir. Thierry Onkelinx
>>>>>
>>>>> 2017-01-23 8:46 GMT+01:00 João C P Santiago
>>>>> <joao.santiago at uni-tuebingen.de>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In my experiment, 20 participants did a word-pair learning task in two
>>>>>> conditions (repeated measures): 40 pairs of nouns were presented on a
>>>>>> monitor, each for 4 s, with an interval of 1 s. The words of each pair
>>>>>> were moderately semantically related (e.g., brain/consciousness,
>>>>>> solution/problem). Two different word lists were used for the subject's
>>>>>> two experimental conditions, with the order of word lists balanced
>>>>>> across subjects and conditions. The subject had unlimited time to recall
>>>>>> the appropriate response word, and did three trials in succession for
>>>>>> each list:
>>>>>>
>>>>>> Condition 1, List A > T1, T2, T3
>>>>>> Condition 2, List B > T1, T2, T3
>>>>>>
>>>>>> No feedback was given as to whether the remembered word was correct or
>>>>>> not.
>>>>>>
>>>>>> I've seen some people go at this with an ANOVA; others subtract the total
>>>>>> number of correct pairs in one condition from the other per subject and
>>>>>> run a t-test. Since this is count data, a generalized linear model should
>>>>>> be more appropriate, right?
>>>>>>
>>>>>> head(data)
>>>>>>   subjectNumber expDay      bmi treatment tones       hour abruf correctPair incorrectPair
>>>>>>           <dbl>  <chr>    <dbl>    <fctr> <dbl>     <time> <dbl>       <dbl>         <dbl>
>>>>>> 1             1     N2 22.53086   Control     0 27900 secs     1          26            14
>>>>>> 2             1     N2 22.53086   Control     0 27900 secs     2          40             0
>>>>>> 3             1     N2 22.53086   Control     0 27900 secs     3          40             0
>>>>>> 4             2     N1 22.53086   Control     0 27900 secs     1          22            18
>>>>>> 5             2     N1 22.53086   Control     0 27900 secs     2          33             7
>>>>>> 6             2     N1 22.53086   Control     0 27900 secs     3          36             4
>>>>>>
>>>>>>
>>>>>>
>>>>>> I fitted a model with glmer.nb(correctPair ~ I((abruf - 1)^2) * treatment
>>>>>> + (1 | subjectNumber), data = data). The residuals don't look so good to
>>>>>> me (http://imgur.com/a/AJXGq), and the model fits values above 40, which
>>>>>> can never happen in real life (not sure if this matters).
>>>>>>
>>>>>> I'm interested in knowing whether there is any difference between
>>>>>> conditions: are the values at timepoint (abruf) 1 different? Do people
>>>>>> remember less in one condition than in the other, i.e. end up with a
>>>>>> different number of pairs at timepoint 3?
>>>>>>
>>>>>>
>>>>>> If the direction I'm taking is completely wrong please let me know.
>>>>>>
>>>>>> Best,
>>>>>> Santiago
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>



-- 
João C. P. Santiago
Institute for Medical Psychology & Behavioral Neurobiology
Center of Integrative Neuroscience
University of Tuebingen
Otfried-Mueller-Str. 25
72076 Tuebingen, Germany

Phone: +49 7071 29 88981
Fax: +49 7071 29 25016


