[R-sig-ME] [R] understanding I() in lmer formula
don-r-help at isis.cs3-inc.com
Thu Jun 15 16:49:50 CEST 2017
$$
\Sigma = \begin{pmatrix} V(A) & \operatorname{cov}(A, B) \\ \operatorname{cov}(A, B) & V(B) \end{pmatrix} = \begin{pmatrix} f & g \\ g & h \end{pmatrix}
$$
So the aim of lme4 (or any other software) is to find the "best"
values for all five of these parameters.
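A minimal simulation sketch may make the five parameters concrete. It assumes, following the last note of this message, that A is a subject's slope and B its intercept, so a response is y = B + A * t + noise. The residual standard deviation `sigma` is an extra parameter that lmer also estimates; the function names and all numbers are hypothetical.

```python
import math
import random


def draw_subject_effects(uA, uB, f, g, h, rng):
    """Draw one subject's (A, B) from N((uA, uB), Sigma), Sigma = [[f, g], [g, h]]."""
    # Cholesky factor L of Sigma, so that (A, B) = (uA, uB) + L z with z ~ N(0, I)
    l11 = math.sqrt(f)
    l21 = g / l11
    l22 = math.sqrt(h - l21 ** 2)  # requires f > 0 and f * h >= g * g
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return uA + l11 * z1, uB + l21 * z1 + l22 * z2


def simulate(uA, uB, f, g, h, sigma, n_subjects, days, seed=1):
    """Return (subject, day, y) rows with y = B_i + A_i * day + N(0, sigma^2) noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        A, B = draw_subject_effects(uA, uB, f, g, h, rng)
        for t in days:
            rows.append((i, t, B + A * t + rng.gauss(0.0, sigma)))
    return rows
```

For example, `simulate(uA=10.0, uB=250.0, f=30.0, g=5.0, h=600.0, sigma=25.0, n_subjects=18, days=range(10))` produces a data set of the shape discussed here. Fitting a model to such data and comparing the estimates with the known inputs is one way to see what lmer recovers.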
So when I run lmer I should be able to recover these 5 values, right?
How do I do that?
When you say it finds the best values, that means to me that the
objective function (whatever we're trying to minimize) includes
something that depends on those parameters.
What is that function?
I guess you have to pick some particular model in order to show its
objective function, and I suggest the simplest possible model, in
this case something like
reaction ~ 1+days + (1+days|subject)
and similarly the || version, which I gather differs only in that
some g terms are left out.
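For the model above, the maximum-likelihood objective can be written down directly. Each subject's responses are marginally normal, y_i ~ N(X_i beta, V_i), with V_i = Z_i Sigma Z_i^T + sigma^2 I, and the quantity minimized is -2 log L summed over subjects. The sketch below assumes A = slope, B = intercept (so each row of Z_i is (t, 1)), beta = (mean slope, mean intercept), and plain ML; lmer's default criterion is REML, which adds a further term. lme4 itself works with a relative Cholesky factor of Sigma (its `theta` parameters) and profiles out beta and sigma, so its internal parameters are not f, g, h directly. Names here are hypothetical.

```python
import math


def cholesky(M):
    """Lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def neg2_loglik_subject(y, t, beta, f, g, h, sigma2):
    """-2 log-likelihood of one subject's responses y observed at times t.

    beta = (mean slope, mean intercept); Sigma = [[f, g], [g, h]].
    """
    n = len(y)
    # V[j][k] = z_j^T Sigma z_k + sigma2 * [j == k], with z_j = (t_j, 1)
    V = [[f * t[j] * t[k] + g * (t[j] + t[k]) + h + (sigma2 if j == k else 0.0)
          for k in range(n)] for j in range(n)]
    L = cholesky(V)
    r = [y[j] - (beta[0] * t[j] + beta[1]) for j in range(n)]
    # Forward substitution L w = r, so that r^T V^{-1} r = w^T w
    w = []
    for j in range(n):
        w.append((r[j] - sum(L[j][k] * w[k] for k in range(j))) / L[j][j])
    log_det = 2.0 * sum(math.log(L[j][j]) for j in range(n))
    return n * math.log(2.0 * math.pi) + log_det + sum(x * x for x in w)


def deviance(data, beta, f, g, h, sigma2):
    """Total -2 log L; data maps subject -> (times, responses)."""
    return sum(neg2_loglik_subject(y, t, beta, f, g, h, sigma2)
               for t, y in data.values())
```

The `||` model is the same function evaluated with g fixed at 0, e.g. `deviance(data, beta, f, 0.0, h, sigma2)`, so it has one fewer parameter to optimize over.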
This is done by trying to
find the values that allow one to say "I obtained my data because they
were the most likely to occur" (maximum likelihood). This leads to a complex
function that often reduces to least squares, but not always; in Gaussian
mixed-effects models it is not linear least squares, because of the f, g
and h parameters.
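Written out for subject i with response vector y_i, fixed-effects design X_i, random-effects design Z_i and residual variance sigma^2 (notation assumed here, not taken from the thread), the maximum-likelihood criterion is the following. For fixed f, g, h and sigma^2 the minimization over beta is weighted least squares, but the variance parameters enter through log det V_i and V_i^{-1}, which is why the whole problem is not linear least squares:

$$
-2 \log L(\beta, f, g, h, \sigma^2) = \sum_i \Big[ n_i \log(2\pi) + \log\det V_i + (y_i - X_i \beta)^\top V_i^{-1} (y_i - X_i \beta) \Big],
\qquad V_i = Z_i \Sigma Z_i^\top + \sigma^2 I_{n_i}
$$

$$
\hat\beta(f, g, h, \sigma^2) = \Big( \sum_i X_i^\top V_i^{-1} X_i \Big)^{-1} \sum_i X_i^\top V_i^{-1} y_i
$$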
Traditionally, you fix uA = uB = 0 because, if they had other
values, they could not be distinguished from the fixed part of the
model (but you could also say that there is no separate fixed part and
that you are trying to find their values; it amounts to the same thing).
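The equivalence can be seen by splitting each subject's coefficients into mean plus deviation. Assuming, as in the last note of this message, that A is the slope and B the intercept, and writing t_ij for the j-th time of subject i (notation assumed here), the means uA and uB land in the fixed part while the zero-mean deviations carry the covariance Sigma:

$$
A_i = u_A + a_i, \qquad B_i = u_B + b_i, \qquad (a_i, b_i)^\top \sim N(0, \Sigma)
$$

$$
y_{ij} = \underbrace{u_B + u_A t_{ij}}_{\text{fixed part}} + \underbrace{b_i + a_i t_{ij}}_{\text{random part, mean } 0} + \varepsilon_{ij}
$$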
When you use (1 + Day | Subject), you allow lmer to fit a model with f, g and h.
When you use (1 + Day || Subject), you constrain lmer to fit a model with g = 0,
so only f and h can be fitted.
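One consequence of the g = 0 constraint, sketched below under the same assumption (A = slope, B = intercept, time t), is that it depends on where time zero is. Measuring time from day c instead of day 0 turns the intercept into B' = B + c A, so a model with g = 0 under one origin has cov(A, B') = c f under another. The function name is hypothetical; the algebra is just the variance of a linear combination.

```python
def sigma_after_shift(f, g, h, c):
    """Return (V(A), cov(A, B'), V(B')) after re-measuring time as t' = t - c.

    Since y = B + A*t = (B + c*A) + A*(t - c), the new intercept is B' = B + c*A.
    """
    return f, g + c * f, h + 2.0 * c * g + c * c * f


# With g = 0 at origin 0 (the || model), shifting the origin by 3 days
# gives cov(A, B') = 3 * f, which is no longer zero.
print(sigma_after_shift(2.0, 0.0, 5.0, 3.0))  # (2.0, 6.0, 23.0)
```

So fitting the `||` model to centred and to uncentred times generally gives two different models, whereas the full model with f, g and h can represent either.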
All of this makes sense to me, but in order to really understand what's
going on I want to know the objective function.
Note that with lmer, f, h (and g if allowed) are not obtained by first
computing the slope and intercept for each subject and then doing the usual
descriptive statistics ...
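For contrast, here is a sketch of the "per-subject fits, then descriptive statistics" approach that lmer does not use. Each fitted per-subject slope also carries its own estimation error, so when all subjects share the same times its expected variance across subjects is f + sigma^2 / sum((t - mean(t))^2), which overstates f; likelihood-based fitting separates those two sources. The function names are hypothetical and, as before, A = slope and B = intercept.

```python
def ols_line(t, y):
    """Least-squares (slope, intercept) for one subject."""
    n = len(t)
    tbar, ybar = sum(t) / n, sum(y) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    sty = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    slope = sty / stt
    return slope, ybar - slope * tbar


def two_stage(data):
    """Naive (f, g, h): sample (co)variances of per-subject OLS fits.

    data maps subject -> (times, responses). This is NOT what lmer does.
    """
    fits = [ols_line(t, y) for t, y in data.values()]
    m = len(fits)
    ma = sum(a for a, _ in fits) / m
    mb = sum(b for _, b in fits) / m
    f = sum((a - ma) ** 2 for a, _ in fits) / (m - 1)
    g = sum((a - ma) * (b - mb) for a, b in fits) / (m - 1)
    h = sum((b - mb) ** 2 for _, b in fits) / (m - 1)
    return f, g, h
```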
I understand, but however the software actually works, I should be able
to see that the objective function is minimized by simply computing it
on the output and then also on various perturbations of the output.
That would tell me WHAT the software is doing. I could worry later about
HOW it does that.
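The check described above can be written generically: evaluate the objective at the reported parameters and at small one-coordinate perturbations, and confirm none is lower. In R, lme4's `lmer()` accepts `devFunOnly = TRUE`, which returns its objective as a function of the internal covariance parameters `theta` (compare `getME(fit, "theta")`), so the same idea could be applied to real output. The helper below is a hypothetical sketch; passing it is consistent with, not proof of, a local minimum, and for f, g, h the perturbed values must still give a valid (positive definite) Sigma.

```python
def perturbation_check(objective, params, rel_step=0.05):
    """Return True if no one-coordinate perturbation lowers the objective.

    Each parameter is moved by +/- rel_step * |value| (or +/- rel_step if it is 0).
    """
    base = objective(params)
    for k, p in enumerate(params):
        step = abs(p) * rel_step if p != 0 else rel_step
        for sign in (1.0, -1.0):
            trial = list(params)
            trial[k] = p + sign * step
            if objective(trial) < base:
                return False
    return True


# Toy objective with its minimum at (1, -2)
quad = lambda p: (p[0] - 1.0) ** 2 + 3.0 * (p[1] + 2.0) ** 2
print(perturbation_check(quad, [1.0, -2.0]))  # True
print(perturbation_check(quad, [1.5, -2.0]))  # False
```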
Last note: do not confuse the correlation between A and B, the random
effects in the population, which is given by g (the correlation itself is
g / sqrt(f h)), with the correlation between M_A and M_B, the estimators of
the (mean) slope [uA] and of the (mean) intercept [uB], which may be what
you had in mind when saying that A and B values were correlated "before"
(it exists in usual linear regression). They correspond to different ideas:
g means that there is, in the population, a systematic link between A and B:
subjects with a larger A tend to have a larger (g > 0) or smaller (g < 0) B;
the correlation between the _estimators_ means that any error in
the estimation of the (mean) slope will lead to an error in the
estimation of the (mean) intercept; it is a property of the
method used to find them, not of the data/underlying world.
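In ordinary linear regression y = b + a t + e, the estimator covariance is sigma^2 (X^T X)^{-1}, so the correlation between the intercept and slope estimators depends only on the observation times t, not on any population link between subjects. The sketch below (hypothetical function name) computes it; in a mixed model the analogous matrix is (sum_i X_i^T V_i^{-1} X_i)^{-1}, which is what is summarized as the correlation of fixed effects.

```python
import math


def estimator_correlation(t):
    """Correlation of the OLS intercept and slope estimators for y = b + a*t + e.

    X^T X = [[n, s1], [s1, s2]]; its inverse is [[s2, -s1], [-s1, n]] / det,
    and sigma^2 and det cancel in the correlation.
    """
    n = len(t)
    s1 = sum(t)
    s2 = sum(ti * ti for ti in t)
    return -s1 / math.sqrt(n * s2)


days = list(range(10))
print(estimator_correlation(days))                     # strongly negative
print(estimator_correlation([d - 4.5 for d in days]))  # zero after centring
```

Centring the times removes this correlation entirely, which shows it is a feature of how the model is parameterized and estimated rather than of the subjects.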
I had not even thought about the second of those.
But I think that is similar to one of the outputs of summary(lmer(...))
where it says correlation of fixed effects.
Hope this clarifies,
So far I don't think I've learned anything new, but I may be getting
close to describing to you what I'm missing.