[R-sig-ME] Random effects with custom design matrices

Fri Feb 18 21:13:16 CET 2011

On Fri, Feb 18, 2011 at 10:15 AM, Arnaud Le Rouzic
<Arnaud.Le-Rouzic at legs.cnrs-gif.fr> wrote:
> Hi the list,

> Sorry if this is a FAQ, I could not find anything corresponding in the
> archive of the list.

> Along with some colleagues, we were wondering if there was a simple way to
> customize the design matrix of random effects in lmer. The particular
> context was that of a "diallel" experiment in quantitative genetics, which
> consists in crossing N populations each other to measure the hybrid
> phenotype. The model is

> y_ijk =  mu + g_i + g_j + d_ij + e_ijk

> and the "funny" thing is that g_i and g_j are 1/2 the effect of the two
> parental populations, i.e. two occurrences of the same random effect. d_ij
> is the interaction effect, and it is non-0 only if i and j are different.

> It is quite straightforward to write the design matrices. Let's consider a
> simple data set, like:
>
>  father mother        phen
> 1      1      1  1.56
> 2      1      1 -0.09
> 3      1      2  2.18
> 4      1      2  0.73
> 5      1      3  0.87
> 6      1      3 -0.06
>
> If the model is written Y = X1 G + X2 D, matrix X1 is
>
> 1    1    0    0
> 2    1    0    0
> 3    0.5    0.5    0
> 4    0.5    0.5    0
> 5    0.5    0    0.5
> 6    0.5    0    0.5
> etc.,
>
> and matrix X2 is
>
> 1    0    0    0
> 2    0    0    0
> 3    1    0    0
> 4    1    0    0
> 5    0    1    0
> 6    0    1    0
> etc
>
> However, lmer crashes when providing the matrices as such:
> lmer(phen~1+(1|X1) + (1|X2), data=dd)

A random-effects term in lmer is assumed to be of the form (1|F) where
F is a factor or, more generally, (form|F) where form is a linear
model formula and F is a factor.

When you say "lmer crashes" do you mean that it produces an error
message and stops or do you mean that it causes R to crash.  It should
do the former.  It is a mistake if it does the latter.

We intend to allow lmer to be able to use more flexible model matrices
for the random effects although, at present, that requires a certain
amount of tweaking on the part of the user and is best accomplished in
the lme4a package.  To be able to write down the model, however, it
would be necessary to specify the unconditional variance-covariance
matrix of the random effects - your G and D - and incorporate that in
the form of the model.

> There seems to be a workaround in both cases, for X1 each line can be
> duplicated in the data and the parent coded as a regular factor, and for X2
> the 0 lines can be removed both from the data vector and from the design
> matrix, so that, again, it can be coded as a vector of factors. However, I
> cannot find a way to fit both effects at the same time, which is quite
> embarrassing. An alternative would be to fit an animal model with MCMCglmm,
> but it looks like killing a fly with a bazooka (and on the top of it, I
> can't find any automatic way to include dominance in the animal model).
>
> I am quite sure I am missing something, but I don't know if this is a pure R
> issue or if there is something in the underlying statistics that makes it
> impossible to handle "non standard" design matrices.

It's not a theoretical problem but rather the practical problem of how
to get from a model specification in some form to the model matrices
and methods of defining the variance-covariance matrix of the random
effects.  Manipulating the formulas to get that is not
straightforward.  The actual optimization once you have the numeric
structures set up is comparatively simple but even that requires a lot
of attention to detail.