[R-sig-ME] Bernoulli glmm question.

Tibor Kiss tibor at linguistics.rub.de
Thu Mar 13 08:40:07 CET 2014


Hi,

I cannot say much about your question concerning the variance, but I would probably include "word" as a random factor as well. It is not entirely clear from your email, but I assume that each word is 'cut' into phonemes, so that your 10314 observations are 10314 phoneme-level slices of the 50 words given to the 54 students, and "y" is 0 or 1 for each phoneme. The word as a whole may well influence the pronunciation of its phonemes, and I assume the words have been chosen at random, so I would include them as a random factor.

Also, I would call glmer directly rather than lmer with a family argument, though that may be purely cosmetic.
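
In case it helps, here is a minimal sketch of the model I have in mind, fitted with glmer and with crossed random intercepts for student and word. Since I do not have your data, the data frame below is simulated: the column name "word", the number of phonemes per word, the type labels, and all effect sizes are my assumptions. Only "y", "sex", "type", and "student" come from your description. As far as I know, current lme4 versions pass an lmer() call with a family argument on to glmer() with a deprecation warning, which is why I would call glmer directly.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for the real data frame X (sizes partly assumed):
# 54 students, 50 words, and a hypothetical 4 phonemes per word.
n_student <- 54
n_word    <- 50
n_phon    <- 4

X <- expand.grid(student = factor(seq_len(n_student)),
                 word    = factor(seq_len(n_word)),
                 slot    = seq_len(n_phon))

# Hypothetical phoneme types; the first level is the baseline.
types  <- c("final_cluster", "initial_cluster", "medial_cluster", "vowel")
X$type <- factor(types[X$slot], levels = types)

# Assign a sex to each student (alternating, purely for illustration).
sex_by_student <- rep(c("F", "M"), length.out = n_student)
X$sex <- factor(sex_by_student[as.integer(X$student)])

# Assumed random intercepts and fixed effects on the logit scale.
u_student <- rnorm(n_student, sd = 0.5)
u_word    <- rnorm(n_word, sd = 0.5)
eta <- 1 + 0.3 * (X$sex == "F") + c(0, -0.5, -0.5, 1)[X$slot] +
  u_student[as.integer(X$student)] + u_word[as.integer(X$word)]
X$y <- rbinom(nrow(X), size = 1, prob = plogis(eta))

# Bernoulli GLMM with crossed random intercepts for student and word.
fit <- glmer(y ~ sex + type + (1 | student) + (1 | word),
             family = binomial, data = X)

# Estimated variance components for the two grouping factors.
print(VarCorr(fit))
```

With your real data only the glmer call should be needed, provided X contains a factor identifying the word each phoneme belongs to.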

With kind regards

Tibor

----------------------------------------------
Prof. Dr. Tibor Kiss
Sprachwissenschaftliches Institut
Ruhr-Universität Bochum
www.linguistics.rub.de/~kiss


On 13.03.2014 at 02:55, Rolf Turner <r.turner at auckland.ac.nz> wrote:

> 
> I am trying to help a graduate student in linguistics analyse her data.  Very much a case of the blind leading the blind, but I gotta try!
> 
> Summary of the structure of the data:
> 
> A number of (Mandarin speaking) students are assessed on their pronunciation of a suite of "test items" --- English language words.
> (E.g. umbrella, helicopter, knife.)  They are assessed phoneme by phoneme in each word.  The response, at least in the context of my question, is whether they got the pronunciation right (y = 1) or wrong (y = 0).
> 
> The phonemes are classified into 7 types:
> 
> * Initial consonant
> * Medial consonant
> * Final consonant
> * Initial consonant cluster
> * Medial consonant cluster
> * Final consonant cluster
> * vowel
> 
> The students are classified by sex ("gender" to those wimps who are too embarrassed to say the word "sex").
> 
> I thought to fit a Bernoulli model with "type" (of phoneme) and sex (of the student) as predictors, with "student" being a random effect.
> 
> The syntax that I tried was:
> 
> fit <- lmer(y ~ sex + type + (1 | student), family = binomial, data = X)
> 
> where "X" is a data frame containing the relevant variables.
> 
> Main effects only, no interactions, so as to keep things simple --- at least initially.
> 
> First impressions from the fit:  Girls do significantly better than boys, and vowels are significantly easier than final consonant clusters (which form the baseline) and initial and medial clusters are significantly harder for the kids to pronounce than are the final clusters.  Single consonants (initial, medial, and final) do not differ significantly from the baseline in their difficulty level.
> 
> The bit about vowels being easier conforms to the graduate student's expectations and is kind of obvious from a rough inspection of the data.
> 
> There are 50 "test items" (words).  In the data set that I am initially looking at there are 54 students.  There are a total of 10314 observations.
> 
> (I am just looking at the oldest group of students to start with.  There are 6 other groups and eventually I will put all 7 groups together and investigate an age (or "level") effect as well.)
> 
> Would anyone be kind enough to comment on my efforts so far? Please try not to be too rude! :-)  Am I on the right track? Am I overlooking any glaring traps for young players? Have I got the syntax of my call to lmer() correct?
> 
> One thing that I am nervous about:
> 
> If I fit the "trivial model"
> 
> fit0 <- lmer(y ~ 1 + (1 | student), family = binomial, data = X)
> 
> the resulting coefficients are just the estimates (BLUPs?) of the "random intercepts", is it not so?  If I calculate the variance of these coefficients:
> 
> 	var(coef(fit0)$student[,1])
> 
> I get 0.0226.  I thought that this value would be "pretty similar to" (though not exactly the same as) the estimated random effect variance. But the latter is 0.0502 --- which seems to me to be quite different.
> 
> A 95% confidence interval for sigma^2 on the basis of my "var(coef ...)" calculation, assuming that (n-1)*s^2/sigma^2 ~ chi-squared_{n-1},
> is [0.0160, 0.0345] (to 4 decimal places) so the estimated random effect variance from fit0 is "significantly different" from my naive estimate.
> 
> My thinking must be out to lunch here.  Can someone put me back on the rails?  (My humblest apologies for the mixed metaphors. :-) )
> 
> Thanks for any words of wisdom.
> 
> cheers,
> 
> Rolf Turner
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


