[R-sig-ME] Bernoulli glmm question.

Thu Mar 13 14:16:36 CET 2014

  Hi,

for the case given by Rolf Turner, I would add word as a random factor. 
"word" appears as a grouping factor for the observation  "type"

  I am " more nervous ( :)  )" about  the relevance of  more complex 
random effect designs,
accounting for correlations between type and student , and/or  type and 
(word and student)
(To tell the truth, I have similar  questions about  a model I'm 
currently  working on),  as:

fit2<- glmer(y ~ sex + type + (type | student) +(1|word), family = binomial, data = X)
where there are 7  (possibly correlated), random effects for each student

or
fit3<- glmer(y ~ sex + type + (type | student) +(type|word), family = binomial, data = X)
where there are 7  (possibly correlated), random effects for each student,
and 7  (possibly correlated), random effects for for each word

D. Bates suggested , at least as a starting point:

fit3a<- glmer(y ~ sex + type + (1|student ) +(1+type:student) +(1|word)+ (1+ type:word), family = binomial, data = X)

which is a simpler model to estimate (without any correlations), still accounting for the interactions between type and (student, word)

Which would be the "right " model ?
Is the data-driven (forward) selection of random terms still accepted  (or not, according to D. Barr ) ?
Recent discussions on this list showed at least some controverse about it.

I would be glad to be "corrected" , enlighted or advised on these points.
Thanks, in advanced, for your attention

Best regards

Robert Espesser
CNRS UMR  7309 - Université Aix-Marseille
5 Avenue Pasteur
13100 AIX-EN-PROVENCE

Tel: +33 (0)413 55 36 26

Le 13/03/2014 08:40, Tibor Kiss a écrit :
> Hi,
>
> I cannot say much about your question concerning the variance, but I would probably include "word" as a random factor as well. It's not easy to understand from your email, but I assume that each word is 'cut' into phonemes, so that your 10314 observations are actually 10314 slices of the 50 words fed to the 54 students. So "y" will be 0 or 1 for a phoneme. It might be the case that the whole word influences the pronunciation, and the words have been chosen at random, I assume, so I would include them as a random factor.
>
> Also, I would use glmer directly, but that might be cosmetics.
>
> With kind regards
>
> Tibor
>
> ----------------------------------------------
> Prof. Dr. Tibor Kiss
> Sprachwissenschaftliches Institut
> Ruhr-Universität Bochum
> www.linguistics.rub.de/~kiss
>
>
> Am 13.03.2014 um 02:55 schrieb Rolf Turner<r.turner at auckland.ac.nz>:
>
>> I am trying to help a graduate student in linguistics analyse her data.  Very much a case of the blind leading the blind, but I gotta try!
>>
>> Summary of the structure of the data:
>>
>> A number of (Mandarin speaking) students are assessed on their pronunciation of a suite of "test items" --- English language words.
>> (E.g. umbrella, helicopter, knife.)  They are assessed phoneme by phoneme in each word.  The response, at least in the context of my question, is whether they got the pronunciation right (y = 1) or wrong (y = 0).
>>
>> The phonemes are classified into 7 types:
>>
>> * Initial consonant
>> * Medial consonant
>> * Final consonant
>> * Initial consonant cluster
>> * Medial consonant cluster
>> * Final consonant cluster
>> * vowel
>>
>> The students are classified by sex ("gender" to those wimps who are too embarrassed to say the word "sex").
>>
>> I thought to fit a Bernoulli model with "type" (of phoneme) and sex (of the student) as predictors, with "student" being a random effect.
>>
>> The syntax that I tried was:
>>
>> fit<- lmer(y ~ sex + type + (1 | student), family = binomial, data = X)
>>
>> where "X" is a data frame containing the relevant variables.
>>
>> Main effects only, no interactions, so as to keep things simple --- at least initially.
>>
>> First impressions from the fit:  Girls do significantly better than boys, and vowels are significantly easier than final consonant clusters (which form the baseline) and initial and medial clusters are significantly harder for the kids to pronounce than are the final clusters.  Single consonants (initial, medial, and final) do not differ significantly from the baseline in their difficulty level.
>>
>> The bit about vowels being easier conforms to the graduate student's expectations and is kind of obvious from a rough inspection of the data.
>>
>> There are 50 "test items" (words).  In the data set that I am initially looking at there are 54 students.  There are a total of 10314 observations.
>>
>> (I am just looking at the oldest group of students to start with.  There are 6 other groups and eventually I will put all 7 groups together and investigate an age (or "level") effect as well.)
>>
>> Would anyone be kind enough to comment on my efforts so far? Please try not to be too rude! :-)  Am I on the right track? Am I overlooking any glaring traps for young players? Have I got the syntax of my call to lmer() correct?
>>
>> One thing that I am nervous about:
>>
>> If I fit the "trivial model"
>>
>> fit0<- lmer(y ~ 1 + (1 | student), family = binomial, data = X)
>>
>> the resulting coefficients are just the estimates (BLUPs?) of the "random intercepts, is it not so?  If I calculate the variance of these coefficients:
>>
>> 	var(coef(fit0)$student[,1])
>>
>> I get 0.0226.  I thought that this value would be "pretty similar to" (though not exactly the same as) the estimated random effect variance. But the latter is 0.0502 --- which seems to me to be quite different.
>>
>> A 95% confidence interval for sigma^2 on the basis of my "var(coef ...)" calculation, assuming that (n-1)*s^2/sigma^2 ~ chi-squared_{n-1},
>> is [0.0160, 0.0345] (to 4 decimal places) so the estimated random effect variance from fit0 is "significantly different" from my naive estimate.
>>
>> My thinking must be out to lunch here.  Can someone put me back on the rails.  (My humblest apologies for the mixed metaphors. :-) )
>>
>> Thanks for any words of wisdom.
>>
>> cheers,
>>
>> Rolf Turner
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models