I have some linear mixed models that I cannot specify using the lme4 model formula.
I have made some extensions of the formula language so that I can specify these models,
and I would like some feedback on them.
My motivation is that I am building a primitive R interface to some other software for mixed models,
whose typical applications come from animal breeding. The two examples below
cannot be specified using lme4 model formulas.
## Example 1: Direct and maternal genetic effects
The background is that the genetic effect on a trait (for example growth) can come
both from the animal's own genes and from the genes of its mother.
Let i be the animal number. For the animals with observations, the model is
Y_i = \mu + X^D_{i} + X^M_{mother(i)}
Note that all animals have a direct genetic effect X^D_i
(including animals without observations),
and all animals have a maternal genetic effect X^M_i (including animals
with observations but without offspring, and including males!).
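To make the indexing concrete, here is a minimal sketch (in Python with NumPy rather than R) of the two incidence matrices the direct/maternal model needs, on a hypothetical five-animal pedigree. The names `mother`, `observed` and `incidence` are illustrative only. It shows that both effect vectors are indexed by the full set of animals, so X^D and X^M exist even for animals that have no observations or no offspring.

```python
import numpy as np

# Hypothetical toy pedigree: animals 0..4, mother[i] is the dam of animal i
# (None if unknown). Animal 1 is a male with no offspring.
mother = [None, None, 0, 0, 2]
# Animals with observations; each of them has a known mother here.
observed = [2, 3, 4]
n_animals = len(mother)


def incidence(levels, n_levels):
    """0/1 matrix with one row per entry of `levels` and a single 1 in column levels[k]."""
    Z = np.zeros((len(levels), n_levels))
    for k, level in enumerate(levels):
        Z[k, level] = 1.0
    return Z


# Direct part: observation k picks X^D of the animal itself.
Z_D = incidence(observed, n_animals)
# Maternal part: observation k picks X^M of the animal's mother.
Z_M = incidence([mother[i] for i in observed], n_animals)

# Y = mu + Z_D @ x_D + Z_M @ x_M, where x_D and x_M each have one entry per animal.
Z = np.hstack([Z_D, Z_M])
print(Z.shape)
```

Animal 1 gets columns in both blocks even though no observation uses them; its effects are still defined and predicted through the pedigree.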
There are two levels of correlation. First, the animals are related, and
information about this is provided in a pedigree.
This is handled using a "corlist" argument and is NOT the issue here.
The second level of correlation is that (X^D_i, X^M_i) has a general 2x2
variance matrix.
Knowing the correlation between X^D_i and X^M_i is relevant in breeding,
since selecting animals with a high direct effect
implies selecting for animals with high or low (depending on the sign
of the correlation) maternal effects, and vice versa.
Imagine linked genes, where some influence X^D and others influence X^M.
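A small numerical sketch of why the off-diagonal element matters. Stack x = (x^D, x^M) over animals and, purely for illustration, replace the pedigree relationship matrix by the identity (in the real model the corlist information would take its place). Then Var(x) = G0 ⊗ I and the genetic part of Var(Y) is Z (G0 ⊗ I) Z'. Under these assumptions the dam/offspring covariance equals the direct-maternal covariance sigma_DM, while two offspring of the same dam share sigma_M^2. The pedigree and the values in `G0` are hypothetical.

```python
import numpy as np

# Hypothetical pedigree: animal 0 is a grand-dam (unobserved), animal 1 is her
# daughter, animals 2 and 3 are offspring of animal 1. Animals 1, 2, 3 are observed.
mother = [None, 0, 1, 1]
observed = [1, 2, 3]
n = len(mother)

# Hypothetical 2x2 variance matrix of (X^D_i, X^M_i): [[sigma_D^2, sigma_DM], [sigma_DM, sigma_M^2]].
G0 = np.array([[1.0, -0.3],
               [-0.3, 0.5]])


def incidence(levels, n_levels):
    """0/1 matrix with a single 1 per row, in column levels[k]."""
    Z = np.zeros((len(levels), n_levels))
    for k, level in enumerate(levels):
        Z[k, level] = 1.0
    return Z


Z = np.hstack([incidence(observed, n),
               incidence([mother[i] for i in observed], n)])

# Simplifying assumption: animals unrelated apart from the indexing, i.e. the
# pedigree relationship matrix is replaced by the identity.
var_x = np.kron(G0, np.eye(n))   # ordering: x^D_0..x^D_3, then x^M_0..x^M_3
V = Z @ var_x @ Z.T              # genetic covariance of the observations of animals 1, 2, 3

print(V[0, 1])  # dam (animal 1) vs offspring (animal 2): sigma_DM
print(V[1, 2])  # two offspring of the same dam: sigma_M^2
```

The sign of sigma_DM shows up directly in the dam/offspring covariance, which is why it matters when selecting on either component.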
## Example 2: QTL
Assuming no dominance effects, only additive effects of the two
haplotypes (centred around a given locus), we have
Y_i = \mu + X_{hap1(i)} + X_{hap2(i)}
where hap1(i) is the maternally inherited haplotype, hap2(i) is the
paternally inherited haplotype,
and X_h is the effect of haplotype h.
The X_h are correlated because haplotypes are more or less identical by
descent; this is handled using a "corlist" argument,
but that correlation is NOT the issue here.
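In matrix form the haplotype model is a multiple-membership random effect: each observation row gets a 1 in the column of each of its two haplotypes, so the incidence matrix is the sum of two ordinary incidence matrices, and a homozygous individual (hap1(i) = hap2(i)) gets a 2. A minimal NumPy sketch with hypothetical haplotype codes:

```python
import numpy as np

# Hypothetical data: 4 individuals, haplotypes coded 0, 1, 2 at the locus.
hap1 = [0, 1, 2, 0]   # maternally inherited haplotype of each individual
hap2 = [1, 1, 0, 2]   # paternally inherited haplotype of each individual
n_hap = 3


def incidence(levels, n_levels):
    """0/1 matrix with a single 1 per row, in column levels[k]."""
    Z = np.zeros((len(levels), n_levels))
    for k, level in enumerate(levels):
        Z[k, level] = 1.0
    return Z


# Y = mu + Z @ x, with x the vector of haplotype effects X_h.
Z = incidence(hap1, n_hap) + incidence(hap2, n_hap)
print(Z)
```

Both haplotype slots share one effect vector and one variance. That is what distinguishes this from lme4's (1|hap1) + (1|hap2), where each factor gets its own effect vector and its own variance.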
##
My extensions of the formula language are as follows.
For Example 2 I have used the formula "Y ~ (1|hap1+hap2)", whereas for
Example 1 I intend to use "Y ~ (c(1,1) | D + M)".
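One way to read the two proposed notations, sketched in Python with hypothetical helper names (`summed_term`, `stacked_term`): both put all listed factors on a common set of levels. (1|A+B) adds the incidence matrices, giving a single effect vector with one variance, while (c(1,1)|A+B) places them side by side, giving one effect vector per factor with a general 2x2 covariance between the components at the same level. The `levels` argument is an assumption standing in for the level set that would come from the pedigree, since animals without records must still have effects.

```python
import numpy as np


def incidence(values, levels):
    """0/1 matrix: row k has a 1 in the column of level values[k]."""
    index = {lev: j for j, lev in enumerate(levels)}
    Z = np.zeros((len(values), len(levels)))
    for k, v in enumerate(values):
        Z[k, index[v]] = 1.0
    return Z


def shared_levels(factors, levels=None):
    """Common level set: the given one, else the union of observed levels in first-seen order."""
    if levels is not None:
        return list(levels)
    seen = []
    for f in factors:
        for v in f:
            if v not in seen:
                seen.append(v)
    return seen


def summed_term(*factors, levels=None):
    """Hypothetical reading of (1 | A + B): one effect per level, contributions added."""
    levs = shared_levels(factors, levels)
    return sum(incidence(f, levs) for f in factors), levs


def stacked_term(*factors, levels=None):
    """Hypothetical reading of (c(1,1) | A + B): one effect vector per factor,
    all on the same levels, blocks placed side by side."""
    levs = shared_levels(factors, levels)
    return np.hstack([incidence(f, levs) for f in factors]), levs


# Example 2 style: Y ~ (1 | hap1 + hap2)
Z2, hap_levels = summed_term(["h1", "h2"], ["h2", "h3"])
# Example 1 style: Y ~ (c(1,1) | D + M), levels taken from a hypothetical pedigree
Z1, animals = stacked_term(["a3", "a4"], ["a1", "a1"], levels=["a1", "a2", "a3", "a4"])
```

Animal "a2" has no record and no offspring in this toy data, yet it still gets a column in both blocks of `Z1` because the level set is supplied explicitly.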
Some disadvantages I can see are:
1) If effects (1|A+B) are allowed with the meaning above,
then it would be natural for (1|A/B) to mean (1|A + A:B).
However, that conflicts with lme4, where (1|A/B) is shorthand for
(1|A) + (1|A:B).
My solution has been to disallow terms like (1|A/B).
2) The (1|A+B) and (c(1,1)|A+B) notation is not entirely general.
With three factors, effects such as
(1|A+B+C) and (c(1,1,1)|A+B+C) can be defined, but a model like
Y_i = \mu + X^D_{hap1(i)} + X^D_{hap2(i)} + X^M_{hap1(mother(i))} + X^M_{hap2(mother(i))}
with (X^D_h, X^M_h) having a general 2x2 variance matrix cannot be described.
This is not a practical issue so far, only an issue of completeness.
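For comparison, the design matrix the model in point 2) would need combines both readings: within each component the two haplotypes are summed, as in (1|hap1+hap2), and the direct and maternal components are then placed side by side on a common set of haplotype levels, as in (c(1,1)|D+M). A small NumPy sketch with a hypothetical pedigree, hypothetical haplotype codes and a hypothetical `G0`, again replacing the corlist correlation by the identity for illustration:

```python
import numpy as np

# Hypothetical data: animals 0..3, haplotypes coded 0..2.
mother = [None, None, 0, 1]
hap1 = [0, 1, 2, 0]        # maternally inherited haplotype of each animal
hap2 = [1, 2, 2, 1]        # paternally inherited haplotype of each animal
observed = [2, 3]
n_hap = 3

# Hypothetical 2x2 variance matrix of (X^D_h, X^M_h).
G0 = np.array([[1.0, 0.2],
               [0.2, 0.4]])


def incidence(levels, n_levels):
    """0/1 matrix with a single 1 per row, in column levels[k]."""
    Z = np.zeros((len(levels), n_levels))
    for k, level in enumerate(levels):
        Z[k, level] = 1.0
    return Z


def two_haplotypes(animals):
    """Summed incidence of both haplotypes of each listed animal."""
    return (incidence([hap1[a] for a in animals], n_hap)
            + incidence([hap2[a] for a in animals], n_hap))


dams = [mother[i] for i in observed]
Z = np.hstack([two_haplotypes(observed),   # X^D_{hap1(i)} + X^D_{hap2(i)}
               two_haplotypes(dams)])      # X^M_{hap1(mother(i))} + X^M_{hap2(mother(i))}

# Genetic covariance of the observations; identity in place of the IBD correlation.
V = Z @ np.kron(G0, np.eye(n_hap)) @ Z.T
```

Each block is summed internally and the blocks are stacked, so neither (1|A+B) nor (c(1,1)|A+B) on its own yields this structure.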
Comments and suggestions are very welcome.
Yours
Ole Christensen