[R] Can lmer() fit a multilevel model embedded in a regression?

Andrew Gelman gelman at stat.columbia.edu
Mon May 22 17:12:34 CEST 2006


Harold,

I think we use slightly different notation (I like to use variance 
parameters rather than covariance matrices).  Let me try to write it in 
model form:

Data points y_i, i=1,...,800

800 x 84 matrix of predictors, X:  for columns j=1,...,82, X_{i,j} is 
the amount of food j consumed by person i.  X_{i,83} is an indicator (1 
if male, 0 if female), and X_{i,84} is the age of person i.

Data-level model:  Pr (y_i=1) = inverse.logit (X_i*beta), for 
i=1,...,800, with independent outcomes.

beta is a (column) vector of length 84.

Group-level model:  for j=1,...,82:  beta_j ~ Normal (gamma_0 + gamma_1 
* u_j, sigma^2_{beta}).

u is a vector of length 82, where u_j = folate concentration in food j

gamma_0 and gamma_1 are scalar coefficients (for the group-level model), 
and sigma_{beta} is the sd of the group-level errors.

It would be hopeless to estimate all the betas using maximum 
likelihood:  that's 800 data points and 84 predictors, the results will 
just be too noisy.  But it should be ok using the 2-level model above.  
The question is:  can I fit in lmer()?

Thanks again.
Andrew

Doran, Harold wrote:

> So, in the hierarchical notation, does the model look like this (for 
> the linear predictor):
>
> DV = constant + food_1(B_1) + food_2(B_2) + ... + food_82(B_82) + 
> sex(B_83) + age(B_84)
> food_1 = gamma_00 + gamma_01(folic) + r_01
> food_2 = gamma_10 + gamma_11(folic) + r_02
> ...
> food_82 = gamma_20 + gamma_21(folic) + r_82
>
> where r_qq ~ N(0, Psi) and Psi is an 82-dimensional covariance matrix.
>
> I usually need to see this in model form as it helps me translate this 
> into lmer syntax if it can be estimated. From what I see, this would 
> be estimating 82(82+1)/2 = 3403 parameters in the covariance matrix.
>
> What I'm stuck on is below you say it would be hopeless to estimate 
> the 82 predictors using ML. But, if I understand the model correctly, 
> the multilevel regression still resolves the predictors (fixed 
> effects) using ML once estimates of the variances are obtained. So, I 
> feel I might still be missing something.
>
>
>
> -----Original Message-----
> From:   Andrew Gelman [mailto:gelman at stat.columbia.edu]
> Sent:   Sun 5/21/2006 7:35 PM
> To:     Doran, Harold
> Cc:     r-help at stat.math.ethz.ch; reg26 at columbia.edu
> Subject:        Re: [R] Can lmer() fit a multilevel model embedded in 
> a regression?
>
> Harold,
>
> I get confused by the terms "fixed" and "random".  Our first-level model
> (in the simplified version we're discussing here) has 800 data points
> (the persons in the study) and 84 predictors:  sex, age, and 82
> coefficients for foods.  The second-level model has 82 data points (the
> foods) and two predictors:  a constant term and folic acid concentration.
>
> It would be hopeless to estimate the 82 food coefficients via maximum
> likelihood, so the idea is to do a multilevel model, with a regression
> of these coefficients on the constant term and folic acid.  The
> group-level model has a residual variance.  If the group-level residual
> variance is 0, it's equivalent to ignoring food, and just using total
> folic acid as an individual predictor.  If the group-level residual
> variance is infinity, it's equivalent to estimating the original
> regression (with 84 predictors) using least squares.
>
> The difficulty is that the foods aren't "groups" in the usual sense,
> since persons are not nested within foods; rather, each person eats many
> foods, and this is reflected in the X matrix.
>
> Andrew
>
> Doran, Harold wrote:
>
> > OK, I'm piecing this together a bit, sorry I'm not familiar with the
> > article you cite. Let me try and fully understand the issue if you
> > don't mind. Are you estimating each of the 82 foods as fixed effects?
> > If so, in the example below this implies 84 total fixed effects (1 for
> > each food type in the X matrix and then sex and age).
> >
> > I'm assuming that food type is nested within one of the 82 folic acid
> > concentrations and then folic acid is treated as a random effect.
> >
> > Is this accurate?
> >
> >
> > -----Original Message-----
> > From:   Andrew Gelman [mailto:gelman at stat.columbia.edu]
> > Sent:   Sun 5/21/2006 9:17 AM
> > To:     Doran, Harold
> > Cc:     r-help at stat.math.ethz.ch; reg26 at columbia.edu
> > Subject:        Re: [R] Can lmer() fit a multilevel model embedded in
> > a regression?
> >
> > Harold,
> >
> > I'm confused now.  Just for concretness, suppose we have 800 people, 82
> > food items, and one predictor ("folic", the folic acid concentration) at
> > the food-item level.  Then DV will be a vector of length 800, foods is
> > an 800 x 82 matrix, sex is a vector of length 800, age is a vector of
> > length 800, and folic is a vector of length 82.  The vector of folic
> > acid concentrations in individual diets is then just foods%*%folic,
> > which I can call folic_indiv.
> >
> > How would I fit the model in lmer(), then?  There's some bit of
> > understading that I'm still missing.
> >
> > Thanks.
> > Andrew
> >
> >
> > Doran, Harold wrote:
> >
> > > Prof Gelman:
> > >
> > > I believe the answer is yes. It sounds as though persons are partially
> > > crossed within food items?
> > >
> > > Assuming a logit link, the syntax might follow along the lines of
> > >
> > > fm1 <- lmer(DV ~ foods + sex + age + (1|food_item), data, family =
> > > binomial(link='logit'), method = "Laplace", control = list(usePQL=
> > > FALSE) )
> > >
> > > Maybe this gets you partly there.
> > >
> > > Harold
> > >
> > >
> > >
> > > -----Original Message-----
> > > From:   r-help-bounces at stat.math.ethz.ch on behalf of Andrew Gelman
> > > Sent:   Sat 5/20/2006 5:49 AM
> > > To:     r-help at stat.math.ethz.ch
> > > Cc:     reg26 at columbia.edu
> > > Subject:        [R] Can lmer() fit a multilevel model embedded in a
> > > regression?
> > >
> > > I would like to fit a hierarchical regression model from Witte et al.
> > > (1994; see reference below).  It's a logistic regression of a health
> > > outcome on quntities of food intake; the linear predictor has the 
> form,
> > > X*beta + W*gamma,
> > > where X is a matrix of consumption of 82 foods (i.e., the rows of X
> > > represent people in the study, the columns represent different foods,
> > > and X_ij is the amount of food j eaten by person i); and W is a matrix
> > > of some other predictors (sex, age, ...).
> > >
> > > The second stage of the model is a regression of X on some food-level
> > > predictors.
> > >
> > > Is it possible to fit this model in (the current version of) lmer()?
> > > The challenge is that the persons are _not_ nested within food 
> items, so
> > > it is not a simple multilevel structure.
> > >
> > > We're planning to write a Gibbs sampler and fit the model 
> directly, but
> > > it would be convenient to be able to flt in lmer() as well to check.
> > >
> > > Andrew
> > >
> > > ---
> > >
> > > Reference:
> > >
> > > Witte, J. S., Greenland, S., Hale, R. W., and Bird, C. L. (1994).
> > > Hierarchical regression analysis applied to a
> > > study of multiple dietary exposures and breast cancer.  
> Epidemiology 5,
> > > 612-621.
> > >
> > > --
> > > Andrew Gelman
> > > Professor, Department of Statistics
> > > Professor, Department of Political Science
> > > gelman at stat.columbia.edu
> > > www.stat.columbia.edu/~gelman
> > >
> > > Statistics department office:
> > >   Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
> > >   212-851-2142
> > > Political Science department office:
> > >   International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
> > >   212-854-7075
> > >
> > > Mailing address:
> > >   1255 Amsterdam Ave, Room 1016
> > >   Columbia University
> > >   New York, NY 10027-5904
> > >   212-851-2142
> > >   (fax) 212-851-2164
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
> > >
> > >
> >
> > --
> > Andrew Gelman
> > Professor, Department of Statistics
> > Professor, Department of Political Science
> > gelman at stat.columbia.edu
> > www.stat.columbia.edu/~gelman
> >
> > Statistics department office:
> >   Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
> >   212-851-2142
> > Political Science department office:
> >   International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
> >   212-854-7075
> >
> > Mailing address:
> >   1255 Amsterdam Ave, Room 1016
> >   Columbia University
> >   New York, NY 10027-5904
> >   212-851-2142
> >   (fax) 212-851-2164
> >
> >
> >
>
> --
> Andrew Gelman
> Professor, Department of Statistics
> Professor, Department of Political Science
> gelman at stat.columbia.edu
> www.stat.columbia.edu/~gelman
>
> Statistics department office:
>   Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
>   212-851-2142
> Political Science department office:
>   International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
>   212-854-7075
>
> Mailing address:
>   1255 Amsterdam Ave, Room 1016
>   Columbia University
>   New York, NY 10027-5904
>   212-851-2142
>   (fax) 212-851-2164
>
>
>

-- 
Andrew Gelman
Professor, Department of Statistics
Professor, Department of Political Science
gelman at stat.columbia.edu
www.stat.columbia.edu/~gelman

Statistics department office:
  Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
  212-851-2142
Political Science department office:
  International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
  212-854-7075

Mailing address:
  1255 Amsterdam Ave, Room 1016
  Columbia University
  New York, NY 10027-5904
  212-851-2142
  (fax) 212-851-2164



More information about the R-help mailing list