[R-sig-ME] Random Slope for Dummy Variables
Douglas Bates
bates at stat.wisc.edu
Thu Apr 7 22:31:18 CEST 2011
On Thu, Apr 7, 2011 at 3:18 PM, Junyan Luo <jzl106 at gmail.com> wrote:
> Dear All,
> I have recently run into a puzzle with a problem which needs to
> include random slopes for some dummy variables. I have searched the
> archive list and although I found several threads related to random
> slope dummies, they did not solve my problems. To make the question
> simple, consider the following scenario:
> Suppose we want to study the effects on students' Math performance
> (DV) from both individual and school-level covariates. One of the
> individual covariates is ethnicity, and to keep it simple, let's say a
> dummy variable where 1 represents Asian and 0 otherwise. Now the
> simple random slope model can be written as:
> 1) Math ~ IV1 + IV2 +...+ Asian + (Asian | School)
> where IVs are other covariates. The random slope is included because
> we want to model the interaction between ethnicity (or "being Asian"
> in this simplified case) and schools. By now everything seems to be
> fine, until I discovered that the distribution of ethnic groups among
> different schools was highly uneven. In other words, there are a lot of
> schools with very few Asian students, and for individuals from those
> schools, the value on the dummy is ALWAYS 0. On the other hand, there
> are a few schools where almost all students are Asians, and the dummy
> becomes always 1. So my questions are:
> A. First, is model 1) still valid in this case? Technically there's no
> problem in running it in R, but is it violating any statistical rules?
> B. If model 1) is OK, what does the random slope of Asian mean? A random
> slope suggests that there will be a different slope value for different
> schools, but what does this mean for schools with no Asian students,
> since the dummy term will always be 0?
> C. If model 1) is NOT OK, what are the alternative solutions?
> Thank you very much!
You are making your life more complicated by defining the dummy
variables. In R you can use the natural formulation, which is a
factor variable, say "Race", which, if you follow the U.S. Census
categories, has 5 levels: "White", "Black", "Hispanic", "Asian",
"American Indian". The random effects term (Race|School), or perhaps
more conveniently (0+Race|School) (the difference is in the
parameterization), provides a vector-valued random effect for each
school and estimates a 5 by 5 variance-covariance matrix. An
alternative, which may be easier to estimate, is two simple, scalar
random effects of the form
(1|School) + (1|Race:School)
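
A minimal sketch of the two formulations in R, on simulated data. The
data frame d, the covariate IV1, the effect sizes and the choice to keep
Race as a fixed effect are all assumptions made for illustration; none of
them come from the original question. The lme4 fits run only if that
package is installed.

```r
# Hypothetical simulated data: 30 schools, 5 race categories.
set.seed(1)
races    <- c("White", "Black", "Hispanic", "Asian", "American Indian")
n_school <- 30
n_per    <- 40
d <- data.frame(
  School = factor(rep(seq_len(n_school), each = n_per)),
  Race   = factor(sample(races, n_school * n_per, replace = TRUE),
                  levels = races),
  IV1    = rnorm(n_school * n_per)
)

# Simulated response: school effect + race-within-school effect + noise.
school_eff <- rnorm(n_school, sd = 2)
cell_eff   <- matrix(rnorm(n_school * length(races), sd = 1),
                     nrow = n_school, ncol = length(races))
d$Math <- 50 + 1.5 * d$IV1 +
  school_eff[as.integer(d$School)] +
  cell_eff[cbind(as.integer(d$School), as.integer(d$Race))] +
  rnorm(nrow(d), sd = 3)

# The difference in parameterization, seen through the design columns
# that each random-effects term generates within a school:
X_contr <- model.matrix(~ Race, d)      # intercept + 4 contrasts vs "White"
X_ind   <- model.matrix(~ 0 + Race, d)  # one indicator column per level
print(colnames(X_contr))
print(colnames(X_ind))

# Either way the per-school random vector has 5 components, so the
# variance-covariance matrix has 5 * 6 / 2 free parameters.
n_cov_par <- ncol(X_ind) * (ncol(X_ind) + 1) / 2

if (requireNamespace("lme4", quietly = TRUE)) {
  # Vector-valued random effect per school; with few schools this fit
  # may be slow or report a singular covariance estimate.
  m_vec <- lme4::lmer(Math ~ IV1 + Race + (0 + Race | School), data = d)

  # Two scalar random effects: one per school, one per race-by-school cell.
  m_sc <- lme4::lmer(Math ~ IV1 + Race + (1 | School) + (1 | Race:School),
                     data = d)
  print(lme4::VarCorr(m_sc))
}
```

The scalar version estimates only two variance components instead of a
full 5 by 5 matrix, which is why it tends to be easier to fit. A school
with no students of a given race simply contributes no Race:School cell
for that level.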