[R-sig-ME] [R] lmer syntax, matrix of (grouped) covariates?

Tue Aug 19 14:04:01 CEST 2008

On Tue, Aug 19, 2008 at 1:59 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> "AM" == Aaron Mackey <ajmackey at gmail.com>
>>>>>>     on Mon, 18 Aug 2008 15:41:20 -0400 writes:
>
>    AM> Dear Dr. Bates, Thanks for the info about the SIG on
>    AM> mixed models.
>
>    AM> A followup question: if I did not want a generalized
>    AM> variance-covariance matrix, and instead wanted to assume
>    AM> independence between grouped covariates, then instead of
>    AM> (3n-n^2)/2 variance-covariance parameters per grouping
>    AM> (of size n), I'd have only n variance parameters, yes?
>    AM> Would that be specified (x1|fac) + (x2|fac) instead of
>    AM> (x1 + x2 | fac)?
>
> yes.

For lmer I think you would want to write that as (0+x1|fac) + (0+x2|fac) + ...

By default a linear model formula includes the intercept term so that
(x1|fac) is equivalent to (1+x1|fac).  If you do not suppress the
implicit intercept you end up with multiple terms including it.

>    AM> And either way I formulate it, my syntax question
>    AM> remains: is there a way to write the formula that is
>    AM> generic for a given covariate matrix of unspecified
>    AM> dimension?  Said another way, how would you write a lmer
>    AM> "wrapper" function that was given a matrix of
>    AM> covariates?  How would you "unpack" the columns of that
>    AM> matrix to generate the formula required for lmer()?
>    AM> Would you build up a formula string programmatically,
>    AM> and then "eval" it?
>
> yes / "kind of" :
>
> I would build the formula programmatically
> but then use  as.formula(.)  or possibly in some cases  parse().
>
> Package 'sfsmisc' has a utility  wrapFormula() that I had need
> for at some time:
>
> ----------------------------------------
>
> install.packages("sfsmisc")
> library(sfsmisc)
>
> myF <- wrapFormula(Fertility ~ . , data = swiss)
> myF
> ## Fertility ~ s(Agriculture) + s(Examination) + s(Education) +
> ##             s(Catholic) + s(Infant.Mortality)
>
> ------------------------------------------
>
> and you can look at the short definition of  wrapFormula
> to see how I use gsub(.) on character strings to construct the
> new formula.
>
> Martin Maechler, ETH Zurich
>
>
>    AM> Thanks again,
>
>    AM> -Aaron
>
>    AM> On Mon, Aug 18, 2008 at 3:21 PM, Douglas Bates
>    AM> <bates at stat.wisc.edu> wrote:
>
>    >> On Mon, Aug 18, 2008 at 11:20 AM, Aaron Mackey
>    >> <ajmackey at gmail.com> wrote: > I have a fairly large
>    >> model:
>    >>
>    >> >> length(Y) > [1] 3051 >> dim(covariates) > [1] 3051 211
>    >>
>    >> > All of these 211 covariates need to be nested
>    >> hierarchically within a > grouping "class", of which
>    >> there are 8.  I have an accessory vector, " > cov2class"
>    >> that specifies the mapping between covariates and the 8
>    >> classes.
>    >>
>    >> > Now, I understand I can break all this information up
>    >> into individual > vectors (cov1, cov2, ..., cov211,
>    >> class1, class2, ..., class8), and do > something like
>    >> this:
>    >>
>    >> > model <- lmer(Y ~ 1 + cov1 + cov2 + ... + cov211 + >
>    >> (cov1 + cov2 + ... | class1) + > (...) + > (... + cov210
>    >> + cov211 | class8)
>    >>
>    >> > But I'd like keep things syntactically simpler, and use
>    >> the covariates > and cov2class > variables directly.  I
>    >> haven't been able to find the right syntactic sugar > to
>    >> get this done.
>    >>
>    >> Before considering the syntax you should consider the
>    >> complexity of the model.  Even with 3000+ observations
>    >> estimating fixed effects for over 200 covariates is a
>    >> risky business.  To follow that up by incorporating
>    >> dozens of covariates in a vector-valued random effects
>    >> term with a general variance covariance matrix is not
>    >> likely to be productive.  When estimating variances and
>    >> covariances of random effects the complexity of the
>    >> estimation problem increases according to the square of
>    >> the dimension of each random-effects vector.  For a term
>    >> of the form (1|fac) there is one variance-covariance
>    >> parameter to estimate.  For a term of the form (1+x|fac)
>    >> there are 3 such parameters to estimate, (1+x1+x2|fac)
>    >> requires 6, etc.  Assuming that your 211 covariates are
>    >> more-or-less equally distributed among the 8 classes you
>    >> would have about 27 per class which means that you have
>    >> 378 variance-covariance parameters to estimate for each
>    >> of your 8 classes.  Even with 3000+ observations it would
>    >> be optimistic to expect to estimate that many
>    >> variance-covariance parameters.
>    >>
>    >> You may want to move your question to the
>    >> R-SIG-Mixed-Models mailing list and follow up on the
>    >> model specification there.
>    >>
>
>        [[alternative HTML version deleted]]
>
>    AM> _______________________________________________
>    AM> R-sig-mixed-models at r-project.org mailing list
>    AM> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>