[R-sig-ME] [R] lmer syntax, matrix of (grouped) covariates?
Douglas Bates
bates at stat.wisc.edu
Tue Aug 19 14:04:01 CEST 2008
On Tue, Aug 19, 2008 at 1:59 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> "AM" == Aaron Mackey <ajmackey at gmail.com>
>>>>>> on Mon, 18 Aug 2008 15:41:20 -0400 writes:
>
> AM> Dear Dr. Bates, Thanks for the info about the SIG on
> AM> mixed models.
>
> AM> A followup question: if I did not want a generalized
> AM> variance-covariance matrix, and instead wanted to assume
> AM> independence between grouped covariates, then instead of
> AM> (3n-n^2)/2 variance-covariance parameters per grouping
> AM> (of size n), I'd have only n variance parameters, yes?
> AM> Would that be specified (x1|fac) + (x2|fac) instead of
> AM> (x1 + x2 | fac)?
>
> yes.
For lmer I think you would want to write that as (0+x1|fac) + (0+x2|fac) + ...
By default a linear model formula includes the intercept term so that
(x1|fac) is equivalent to (1+x1|fac). If you do not suppress the
implicit intercept you end up with multiple terms including it.
> AM> And either way I formulate it, my syntax question
> AM> remains: is there a way to write the formula that is
> AM> generic for a given covariate matrix of unspecified
> AM> dimension? Said another way, how would you write a lmer
> AM> "wrapper" function that was given a matrix of
> AM> covariates? How would you "unpack" the columns of that
> AM> matrix to generate the formula required for lmer()?
> AM> Would you build up a formula string programmatically,
> AM> and then "eval" it?
>
> yes / "kind of" :
>
> I would build the formula programmatically
> but then use as.formula(.) or possibly in some cases parse().
>
> Package 'sfsmisc' has a utility wrapFormula() that I had need
> for at some time:
>
> ----------------------------------------
>
> install.packages("sfsmisc")
> library(sfsmisc)
>
> myF <- wrapFormula(Fertility ~ . , data = swiss)
> myF
> ## Fertility ~ s(Agriculture) + s(Examination) + s(Education) +
> ## s(Catholic) + s(Infant.Mortality)
>
> ------------------------------------------
>
> and you can look at the short definition of wrapFormula
> to see how I use gsub(.) on character strings to construct the
> new formula.
>
> Martin Maechler, ETH Zurich
>
>
> AM> Thanks again,
>
> AM> -Aaron
>
> AM> On Mon, Aug 18, 2008 at 3:21 PM, Douglas Bates
> AM> <bates at stat.wisc.edu> wrote:
>
> >> On Mon, Aug 18, 2008 at 11:20 AM, Aaron Mackey
> >> <ajmackey at gmail.com> wrote: > I have a fairly large
> >> model:
> >>
> >> >> length(Y) > [1] 3051 >> dim(covariates) > [1] 3051 211
> >>
> >> > All of these 211 covariates need to be nested
> >> hierarchically within a > grouping "class", of which
> >> there are 8. I have an accessory vector, " > cov2class"
> >> that specifies the mapping between covariates and the 8
> >> classes.
> >>
> >> > Now, I understand I can break all this information up
> >> into individual > vectors (cov1, cov2, ..., cov211,
> >> class1, class2, ..., class8), and do > something like
> >> this:
> >>
> >> > model <- lmer(Y ~ 1 + cov1 + cov2 + ... + cov211 + >
> >> (cov1 + cov2 + ... | class1) + > (...) + > (... + cov210
> >> + cov211 | class8)
> >>
> >> > But I'd like keep things syntactically simpler, and use
> >> the covariates > and cov2class > variables directly. I
> >> haven't been able to find the right syntactic sugar > to
> >> get this done.
> >>
> >> Before considering the syntax you should consider the
> >> complexity of the model. Even with 3000+ observations
> >> estimating fixed effects for over 200 covariates is a
> >> risky business. To follow that up by incorporating
> >> dozens of covariates in a vector-valued random effects
> >> term with a general variance covariance matrix is not
> >> likely to be productive. When estimating variances and
> >> covariances of random effects the complexity of the
> >> estimation problem increases according to the square of
> >> the dimension of each random-effects vector. For a term
> >> of the form (1|fac) there is one variance-covariance
> >> parameter to estimate. For a term of the form (1+x|fac)
> >> there are 3 such parameters to estimate, (1+x1+x2|fac)
> >> requires 6, etc. Assuming that your 211 covariates are
> >> more-or-less equally distributed among the 8 classes you
> >> would have about 27 per class which means that you have
> >> 378 variance-covariance parameters to estimate for each
> >> of your 8 classes. Even with 3000+ observations it would
> >> be optimistic to expect to estimate that many
> >> variance-covariance parameters.
> >>
> >> You may want to move your question to the
> >> R-SIG-Mixed-Models mailing list and follow up on the
> >> model specification there.
> >>
>
> [[alternative HTML version deleted]]
>
> AM> _______________________________________________
> AM> R-sig-mixed-models at r-project.org mailing list
> AM> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
More information about the R-sig-mixed-models
mailing list