[R] Syntax for a three-level logistic model
Ben Bolker
bbolker at gmail.com
Fri Aug 26 14:36:19 CEST 2011
Diaz-Escamilla, Rafael E <rdiaz <at> saclink.csus.edu> writes:
>
This question is really best for the r-sig-mixed-models at r-project.org
mailing list: please direct any follow-ups there.
> Data Collected My data consist of three levels: level 1 is four
> setting for each student (setting nested within student), and each
> student is registered in one of 14 universities (students nested
> within university). More detailed:
> A. 2,479 students who have a dichotomous outcome, engaged in risk
> behavior (yes or no) in EACH of 4 different settings. Then I am
> using these settings as the level-1 units nested within the
> students, and the students as level-2 units. I also have three
> dichotomous covariates for each student: age (less than 21 and 21
> and older), and gender (male and female), and weather the student
> got drunk in each of the four instances (yes and no). However, I am
> considering this last covariate “drunk” as a level 1 covariate since
> the values for each individual student may change in each setting.
> B. The students, however, are also nested within 14 universities
> (Level-3 units)
> Data File
> My data file consists of the following columns: V1 is student ID
> (each student id appears four times to account for the outcome of
> each of the four settings).
> Student ID’s range from 1001 to 3,479 (total of 2,479)
> V2 and V3 contain the age and gender of each student (each of these
> values repeats 4 times, same as the student
> ID)
> V4, V5, and V6 contain the dummy coding of the 4 settings, say, as:
> V1 V2 V3 V4 V5 V6 …
> 1001 0 1 0 0 0
> 1001 0 1 0 0 1
> 1001 0 1 0 1 0
> 1001 0 1 1 0 0
> . . . .
> Etc.
You don't need the dummy codings. It's probably better just
to use a factor with the values of the four settings. R will
do the work of translating the factor (with appropriate contrasts,
treatment contrasts by default -- although with the first level
rather than the last as the baseline unless you use contr.sas().
It will be easier to switch contrasts if you need to and easier
to work with the results.
> V7 contains the values of the covariate “drunk” (whether the student
> was or not drunk during in a setting)
> V8 contains the ID of the university where the student is enrolled.
> Universities are coded with the numbers 11 – 24 (14 universities
> total).
> V9 contains the dichotomous response variable: the outcome in each
> setting for each student.
>
> Model (random intercept only):
> Logit(p|G00) = P0 + P1(V4) + P2(V5) + P3(V6) + P4(V7)
> (Level 1, setting and whether “drunk” in setting)
> P0 = B00 + B01(V2) + B02(V3)
> (Level 2, age and gender of student )
> B00 = G00
> (Level 3, university – random effect)
>
I believe you just need
glmer(riskbehavior ~ age+gender+setting+drunk+(1|student)+(1|university),
family="binomial",data=data)
The nesting of the fixed factors (age, gender, setting) doesn't
have to be done explicitly. If your nesting is implicit (i.e.
all students have unique IDs, rather than being listed 1,...,N_j
within each university) then you don't have to worry about nesting
either. (If your nesting is explicit, i.e. students have non-unique
IDs, then you need (1|university/student) instead.)
glmmPQL is dubious (there are only 4 binomial responses per
"cluster" (student)) although it might be worth a try just
to see. There you need to specify the random effect as something
~1|university/student .
If the simple model works I would have further questions like:
* should there be an age:gender interaction in the model?
* how about variation in (say) 'settings' across universities,
(settings|university) [or (1|settings:university) for a slightly
more constrained model] ?
More information about the R-help
mailing list