[R-sig-ME] Yet another question about model specification and interpretation

Douglas Bates bates at stat.wisc.edu
Sat Mar 15 18:38:06 CET 2008


On Sat, Mar 15, 2008 at 8:42 AM, Martin Eklund
<martin.eklund at farmbio.uu.se> wrote:
> Hi,
>
>  The pasted-in email below was intended for this mailing list, but I
>  accidentally sent it only to Douglas Bates (very sorry!). Please see
>  below the pasted-in email for some more details about my
>  question.
>
>  >> I'm sorry if the following question is a wee bit stupid, but I
>  >> have to ask
>  >> to get this properly figured out.
>  >>
>  >> Continuing with the Machines data from the MEMSS package (see
>  >> below) as an
>  >> example. Suppose the models:
>  >>
>  >> (i) lmer(score ~ Machine + (Machine | Worker))
>  >> (ii) lmer(score ~ Machine + (1 | Worker) + (0 + Machine | Worker))
>  >
>  > Model (ii) cannot be estimated (or if it can, it shouldn't be).  The
>  > model that makes the most sense is
>  >
>  > lmer(score ~ Machine + (0+Machine | Worker))
>  >
>  > May I suggest that you look at the model matrices for the different
>  > models to see why?  I am really pushed for time right now and I just
>  > don't have the time to write long explanations.  Alternatively, if you
>  > were to copy this discussion to the R-SIG-Mixed-Models at R-project.org
>  > list I think there are others reading the list who could explain why.
>  >
>  >> In (i), score is modeled by the Machine fixed factor and the random
>  >> interaction between Machine and the Worker factor (which is
>  >> considered to be
>  >> random). (i) uses 9 degrees of freedom. Here the Worker effect is
>  >> confounded
>  >> within the intercept of the (Machine | Worker) term.
>  >> In (ii), score is modeled by the Machine fixed factor, the random
>  >> Worker
>  >> factor, and the random interaction between Machine and Worker.
>  >> (ii) uses 10
>  >> degrees of freedom. Here we can separate the Worker effect from the
>  >> Machine-Worker interaction, which costs us one degree of freedom.
>  >>
>  >> My question is simply: Have I interpreted this correctly? If not,
>  >> any help
>  >> on how to interpret the difference between model (i) and (ii) is
>  >> greatly
>  >> appreciated!
>  >>
>  >> Thank you!
>  >>
>  >> Best regards,
>  >>
>  >> Martin.
>
>  It is indeed possible to estimate model (ii); the output is:
>
>  Linear mixed-effects model fit by REML
>  Formula: score ~ Machine + (1 | Worker) + (0 + Machine | Worker)
>     Data: Machines
>     AIC   BIC logLik MLdeviance REMLdeviance
>   232.3 252.2 -106.2      221.4        212.3
>  Random effects:
>   Groups   Name        Variance Std.Dev. Corr
>   Worker   (Intercept) 18.40914 4.29059
>   Worker   MachineA    10.77633 3.28273
>            MachineB    34.51234 5.87472   0.299
>            MachineC     0.10678 0.32678  -0.937 -0.614
>   Residual              0.92999 0.96436
>  number of obs: 54, groups: Worker, 6; Worker, 6
>
>  Fixed effects:
>              Estimate Std. Error t value
>  (Intercept)   52.356      2.217  23.614
>  MachineB       7.967      2.393   3.329
>  MachineC      13.917      1.501   9.273
>
>  Correlation of Fixed Effects:
>           (Intr) MachnB
>  MachineB -0.167
>  MachineC -0.606  0.238
>
>  I'd like to get the Worker effect estimated as well as the
>  interaction effects between Machine and Worker. The output above
>  seems, to my eyes, to provide exactly this. I understand that there
>  is something here that I don't understand, and I'd very much
>  appreciate any help interpreting the difference between models (i)
>  and (ii) and seeing why model (ii) doesn't make sense.
>
>  The output from fitting model (i) is:
>
>  Linear mixed-effects model fit by REML
>  Formula: score ~ Machine + (Machine | Worker)
>     Data: Machines
>     AIC   BIC logLik MLdeviance REMLdeviance
>   226.3 244.2 -104.2      216.6        208.3
>  Random effects:
>   Groups   Name        Variance Std.Dev. Corr
>   Worker   (Intercept) 16.61570 4.0762
>            MachineB    34.47572 5.8716    0.483
>            MachineC    13.64510 3.6939   -0.365  0.298
>   Residual              0.92485 0.9617
>  number of obs: 54, groups: Worker, 6
>
>  Fixed effects:
>              Estimate Std. Error t value
>  (Intercept)   52.356      1.679  31.174
>  MachineB       7.967      2.418   3.294
>  MachineC      13.917      1.542   9.027
>
>  Correlation of Fixed Effects:
>           (Intr) MachnB
>  MachineB  0.461
>  MachineC -0.374  0.303
>
>  To me, it looks here as if we get the interactions between Machine
>  and Worker, but the Worker effect is confounded with the intercept
>  of the interaction term. If we look at the BLUPs for models (i)
>  and (ii) we get the following:
>
>   > ranef("model (ii)")
>  An object of class "ranef.lmer"
>  [[1]]
>    (Intercept)
>  1   0.8924564
>  2  -3.8789789
>  3   4.7753576
>  4  -1.3654599
>  5   4.7893765
>  6  -5.2127517
>
>  [[2]]
>      MachineA     MachineB    MachineC
>  1 -0.5889748   1.66535371  0.01455685
>  2  3.9941506   3.11719442 -0.39242270
>  3  2.3434019   2.91923878 -0.25246004
>  4  0.2583581   3.74188503 -0.09753681
>  5 -5.6182390  -0.07808906  0.46421605
>  6 -0.3886967 -11.36558288  0.26364665
>
>   > ranef("model (i)")
>  An object of class "ranef.lmer"
>  [[1]]
>    (Intercept)    MachineB   MachineC
>  1   0.3119782   2.2411827  0.6184094
>  2   0.1841904  -0.9876673 -4.4665295
>  3   6.9691957   0.8101625 -2.4958603
>  4  -1.0237402   3.3520102 -0.3907095
>  5  -0.8499415   5.5760567  6.1734022
>  6  -5.5916826 -10.9917448  0.5612878
>
>  I interpret ranef("model (ii)")[[1]] as the BLUPs for the
>  Worker factor and ranef("model (ii)")[[2]] as the BLUPs for the
>  interactions. as.matrix(ranef("model (ii)")[[1]]) + as.matrix(ranef
>  ("model (ii)")[[2]][1]) reproduces as.matrix(ranef("model
>  (i)")[[1]][1]) almost exactly, which is why I thought that the
>  Worker effect in model (i) is confounded with the intercept of the
>  interaction effects.
>
>  Further, considering the model
>
>  (iii)   lmer(score ~ Machine + (0 + Machine | Worker))
>
>  we get the following output:
>
>  Linear mixed-effects model fit by REML
>  Formula: score ~ Machine + (0 + Machine | Worker)
>     Data: Machines
>     AIC   BIC logLik MLdeviance REMLdeviance
>   226.3 244.2 -104.2      216.6        208.3
>  Random effects:
>   Groups   Name     Variance Std.Dev. Corr
>   Worker   MachineA 16.64234 4.07950
>            MachineB 74.37306 8.62398  0.803
>            MachineC 19.26441 4.38912  0.623 0.771
>   Residual           0.92464 0.96158
>  number of obs: 54, groups: Worker, 6
>
>  Fixed effects:
>              Estimate Std. Error t value
>  (Intercept)   52.356      1.681  31.149
>  MachineB       7.967      2.420   3.292
>  MachineC      13.917      1.540   9.037
>
>  Correlation of Fixed Effects:
>           (Intr) MachnB
>  MachineB  0.463
>  MachineC -0.374  0.301
>
>  I (probably incorrectly) interpret this as meaning that we don't get
>  an estimate of the Worker effect, but only of the interaction
>  between Worker and Machine.
>
>  If anyone can tell me where I go wrong in my reasoning about
>  models (i), (ii), and (iii), I'd be very happy and thankful.

Take a look at the deviances of the three models.  For a safer
comparison it would be better to refit all three models with method =
"ML" before comparing values like AIC and BIC, but even as they stand
you can notice that the deviance (both ML and REML) and the AIC and
BIC of models (i) and (iii) are essentially identical, which is as it
should be because the only difference between them is the
parameterization of the random effects.  Model (ii), however, which
has an additional variance parameter compared to models (i) and
(iii), gives higher AIC, BIC and deviances, even though it contains
model (iii) as a special case and so should never fit worse.  This is
a sign that something is going wrong in that model.
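For readers who want to reproduce the comparison, here is a sketch.
It assumes the Machines data from MEMSS and a version of lme4 in
which REML = FALSE requests a maximum-likelihood fit (older versions
used a method = "ML" argument instead):

```r
library(lme4)    # lmer()
library(MEMSS)   # Machines data

# Refit all three models by maximum likelihood so that AIC, BIC and
# deviance are comparable across models.
fm1 <- lmer(score ~ Machine + (Machine | Worker), Machines, REML = FALSE)
fm2 <- lmer(score ~ Machine + (1 | Worker) + (0 + Machine | Worker),
            Machines, REML = FALSE)
fm3 <- lmer(score ~ Machine + (0 + Machine | Worker), Machines, REML = FALSE)

# anova() tabulates AIC, BIC, logLik and deviance side by side; the
# model whose random effects are redundantly specified stands out.
anova(fm1, fm2, fm3)
```

Note that a current lme4 may also warn about a singular or degenerate
fit for the overparameterized model, which is the same diagnosis made
by comparing the deviances.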

As I said in my private reply, fitting that model should fail but
apparently it doesn't.  The software converges to a spurious result
and it takes a bit of digging to discover that the result is indeed
spurious.  It would be better if the software were to detect such
problems, and indeed it is possible, but difficult, to do so.  It is
also the case that the programming effort and the time spent checking
models for problems like this divert from other enhancements that
could be made to the software.
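One way to see why model (ii) is not well defined is to look at the
model matrices: the constant column generated by (1 | Worker) lies in
the column span of the indicator columns generated by
(0 + Machine | Worker), so the extra variance component is not
identifiable.  A minimal check, assuming the MEMSS package is
installed:

```r
library(MEMSS)  # Machines data

# Indicator columns for the three machine levels, with no intercept,
# i.e. the per-worker columns that (0 + Machine | Worker) generates.
X <- model.matrix(~ 0 + Machine, data = Machines)

# Every row has exactly one 1, so the three columns sum to the
# constant column that (1 | Worker) would add.
all(rowSums(X) == 1)  # TRUE
```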

As an editorial comment, those who would like the ability to impose
correlation structures or variance functions, in addition to the
correlations generated by the random effects, on the conditional
distribution of the response given the random effects (i.e. to make
lmer more like lme in the range of models that can be fit) may want
to bear in mind this case of getting a spurious fit to a model that
is not well defined.  Surprisingly, the majority of the effort in
allowing such specifications is not in designing and implementing the
data structures and numerical algorithms.  That part is comparatively
easy.  The difficult part is deciding how the user should specify the
model and how the specification is translated into those structures
and algorithms.  You may think, "But that's easy - just use the lme
specification", but that won't work.  The nlme package was created for
S and S-PLUS then ported to R and it does not follow the set of steps
for model-fitting functions that is native to R.  Most of the time
that is not a problem but every once in a while it breaks and it
breaks in ugly ways.  Think of the recent discussion on this list
about getting fitted values and residuals that appropriately take
account of the pattern of missingness in the original data.  How does
the information on the pattern of missingness get into the model frame
in the first place?  How are the "subset", "na.action", "offset", etc.
arguments handled by model-fitting functions in R?
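The set of steps "native to R" alluded to above is the standard
model.frame idiom used by functions like lm().  A simplified sketch
follows; the function name is illustrative, not actual lme4 code:

```r
mySimpleFit <- function(formula, data, subset, na.action, offset, ...) {
  # Rebuild the call, keeping only the arguments model.frame() knows.
  mc <- match.call()
  keep <- match(c("formula", "data", "subset", "na.action", "offset"),
                names(mc), 0L)
  mc <- mc[c(1L, keep)]
  mc[[1L]] <- quote(stats::model.frame)

  # Evaluating the rebuilt call applies 'subset' and 'na.action' and
  # records the missingness pattern in attr(mf, "na.action"); that
  # attribute is what lets fitted() and residuals() later line up
  # with the rows of the original data.
  mf <- eval(mc, parent.frame())
  y <- model.response(mf)
  X <- model.matrix(attr(mf, "terms"), mf)
  structure(list(coef = qr.coef(qr(X), y),
                 na.action = attr(mf, "na.action")),
            class = "mySimpleFit")
}
```

The point is that "subset", "na.action", "offset" and friends are not
handled by the fitting code itself; they flow through the single
model.frame() call, which is why a fitting function built around one
formula inherits all of this behaviour for free.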

These are subtle and difficult issues.  Furthermore, the definition of
the models themselves must be done with care.  It is not possible to
independently specify correlations, variance structures, conditional
distributions (in the sense of models for binomial or Poisson
responses) and arbitrary random effects structures.  Certain
combinations don't make sense.  In fact it is stronger than that:
certain combinations are impossible.  Witness the recent suggestion
that the lme4 package is somehow inferior because it doesn't allow
specification of "the" repeated measures model for a binary response.
Apparently "the" repeated measures model specifies a certain
conditional variance-covariance structure in the responses and the
fact that this structure is incompatible with the variances implied by
the binomial response should not prevent specification and fitting of
such a model.  Mathematical impossibility is apparently a rather
flimsy excuse for not providing software that fits such a model.

Later I'll write more on the issue of variance structures and
correlation structures.  I plan to provide classes for extensions to
the mixed models (in fact if you look at the AllClasses.R file on the
archive you will see they are already there) and show how they could
be used to fit certain simple extensions, like a model with stratified
conditional variances.  Then a person who wants to create software to
do a "one off" fit of a mixed model including various "mix-in"
specifications can do so.  However, I don't plan on offering general
model specifications in lmer until I, or someone else, can work out
end-to-end how to do it without painting ourselves into a corner
where impossible model specifications can be fit.  I also want it to
be done in an R-ish way so that all the other model arguments work as
they should.  (The main stumbling block there is that a call to
model.frame allows only one formula and you must be very careful how
you create that formula.  If you create an omnibus formula from a
collection of other formulas you have to decide what the environment
of the omnibus formula should be.)
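That parenthetical point can be made concrete.  If a fitting function
pastes several user formulas into one omnibus formula for
model.frame(), it must also choose that formula's environment; one
possible choice (the names here are illustrative) is the environment
of the first user-supplied formula:

```r
# Two component formulas, possibly written in different environments.
f_fixed  <- score ~ Machine
f_random <- ~ Machine | Worker

# Combine their variables into a single omnibus formula for model.frame().
omni <- reformulate(unique(c(all.vars(f_fixed[[3]]), all.vars(f_random))),
                    response = "score")

# Deliberately give the omnibus formula the environment of the first
# user formula, so symbols not found in 'data' resolve where the user
# wrote the formula.
environment(omni) <- environment(f_fixed)
omni  # score ~ Machine + Worker
```

Whichever environment is chosen, the choice determines where free
variables in the formula are looked up, which is exactly the kind of
decision that has to be made carefully rather than by accident.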



