[R-sig-ME] lsmeans

Mon Jun 25 10:42:05 CEST 2007

As a statistician, I need to be careful to fit a good model to data.
"Good" can have varied meanings from "it didn't crash" to "it
completed in a tolerable amount of time" to "the variance parameters
look like what I was expecting to see" to many other criteria.
Modeling the data well can be very time-consuming, especially for
complicated data, and almost always involves finding a balance between
writing down a theoretical model and finding a model that is useful,
meaningful, and calculable.

My clients, however, don't care a whit about models.  They care about
making decisions, and the second part of my job after fitting a model
is to provide information to facilitate decision making.  The client
might say, " I have three treatments.  I don't care about significant
differences.  I have to pick one of the three, which one is my best
choice?"  Often that means providing some type of overall estimate of
the treatment effects and standard errors.  There is a nice paper that
discusses some of the art of making predictions:

@Article{Welham2004,
  author =       {Sue Welham and Brian Cullis and Beverley Gogel and Arthur
                  Gilmour and Robin Thompson},
  title =        {Predictions in Linear Mixed Models},
  journal =      {Aust. N. Z. J. Stat.},
  year =         2004,
  pages =        {325--347}
}

I called it an art, because as Doug mentioned, you have to be very
careful about what you mean by an "lsmean" or any other type of
prediction.  For example, here are three ways (not necessarily for
identical models) to calculate one particular estimate:

In PROC MIXED:
estimate "Trt1 BLUP narrow" intercept 3 |trt 3 0 0 0 0
                    loc*trt 1 0 0 0 0 1 0 0 0 0 1 0/divisor=3;

In ASREML:
predict trt  !average loc*trt  # Narrow inference space predictions

In lme
coefs=unlist(ranef(m1=lme(y ~ trt, random=~loc))
coefs[1:5]+c(mean(coefs[c(6,11,16)]),
                mean(coefs[c(7,12,17)]),
                mean(coefs[c(8,13,18)]),
                mean(coefs[c(9,14,19)]),
                mean(coefs[c(10,15,20)])) + mean(dat$y)

These three methods have pros and cons.  Extracting the right
coefficients in MIXED and lme can be excruciatingly tedious (even for
simple models), but a good rite of passage to force you to think about
what you are doing.  After learning that, ASREML provides a cleaner
method (as does PROC GLIMMIX, I believe).

It behooves decision makers to remember that the the theoretical and
computational aspects of mixed models are quite complex.  It behooves
the software developers to help people fit models and then help them
make decisions.

K Wright

On 6/23/07, Douglas Bates <bates at stat.wisc.edu> wrote:
> On 6/23/07, Martin Henry H. Stevens <HStevens at muohio.edu> wrote:
> > Dear Frank,
> > As lmer doesn't use least squares means, I am not sure they are
> > readily obtained. Predicted values may be obtained in several ways,
> > including the following from Spencer Graves:
>
> > ### From Spencer Graves
> > ### For getting the predicted values from the fixed effects only.
> > predict.lmer <- function(object, X){
> > ## From: Spencer Graves <spencer.graves>
> > ## Date: Sun, 07 May 2006 08:07:07 -0700
> > # object has class "lmer"
> > # X = model matrix with columns
> > # matching object at X
> >     if(missing(X))
> >       X <- object @ X
> >     b <- fixef(object)
> >     X %*% b
> > }
>
> That works for class "lmer".  At the risk of confusing everyone I will
> say that it doesn't work for the current "lmer2" class but it will in
> the near future.
>
> It is a dangerous practice to reach into an object and grab the
> contents of slots (i.e. using the @ operator on S4 objects) or
> components (using $ on S3 objects).  I continue to tweak the form of
> the lmer and lmer2 objects seeking to optimize the range of
> applicability, the performance and the storage requirements of the
> algorithms.  Sometimes certain parts of the structure are represented
> in one way and sometimes another.
>
> The use of extractor functions such as fixef is much preferred.  I am
> responsible for ensuring that the code is consistent and that such
> extractor do the right thing for the current structure.
> I will ensure that the model.matrix extractor works with upcoming
> releases of the lme4 package and that it allows an optional argument
> "which" to specify "fixed", "random" or "both".  I will also allow
> such an argument for the "fitted" extractor.
>
> > On Jun 22, 2007, at 6:21 PM, Cougar wrote:
>
> > > I was wondering if someone could explain how to obtain LSmeans from
> > > lmer.  I
> > > could not find any information in the R-archives.
>
> Thanks for checking the archive before asking, Frank.  I would
> suggest, not entirely facetiously, that you also check
>
> library(fortunes); fortune("lsmeans")
>
> If you could explain what you want to calculate from a fitted model,
> without using the term "lsmeans", I, at least, would be better able to
> give advice on how to calculate it.  My big problem with lsmeans is
> that I have never been able to understand how they should be
> calculated and, more importantly, why one should want to calculate
> them.  In other words, what do lsmeans represent and why should I be
> interested in these particular values?
>
> I have tried to understand what they are.  I have read the
> explanations and looked at the formulas in, e.g. "SAS System for Mixed
> Models".  Anyone who can understand that explanation and decide how
> the corresponding quantities should be formulated in the
> representation used in lmer is welcome to explain it to me.
>
> It may seem surprising that I say "how the corresponding quantities
> are calculated".  Many people believe that the concept corresponds to
> a formula and all one needs to do is to "code up the formula".  That's
> not the case.
>
> Pardon me while I digress a bit.
>
> The statistical concept represented by parameter estimates must be
> associated with some kind of an objective function, such as "sum of
> squares of residuals" or "the log-likelihood", and there are many
> routes that the person designing the algorithm can take when
> optimizing the objective function to obtain the parameter estimates.
> This also applies to the representation of the model as data
> structures.  The data structures used in lmer are very different from
> those used by other software for fitting mixed models and they did not
> come about by chance.  They are the result of intensive study and
> extensive experimentation.  I can (and do) explain what the structures
> are and how they are used but if you say, e.g. "evaluate this summary
> quantity using G-inverse" that doesn't work for me because I don't
> have a G matrix.  I need to go back to an explanation of what does it
> represent before I can decide how to evaluate it.
>
> This doesn't mean that everyone gets to make up their own estimation
> criterion.  Consistency between implementations is a matter of
> agreeing on the objective function and checking that it is evaluated
> consistently.  In other words lmer should agree with SAS PROC MIXED
> and HLM and MLWin and ... on what the log-likelihood or the REML
> criterion is for a given model, data, parameter-value combination.
> However the different implementations do not need to agree on how they
> represent the information from the model and the data nor on how they
> optimize the objective.
>
> I'm sorry for such a long-winded response to your simple question.  As
> you might guess I have been thinking a lot about the general issues
> because I am writing the "Theory and computational methods for mixed
> models" document that explains exactly how the log-likelihood in lmer,
> and the Laplace approximation to the log-likelihood in nlmer and
> glmer, are defined and evaluated.  One of the interesting things about
> the description is that it does not deal with how the log-likelihood
> or its approximation is to be optimized.  Not so much for linear mixed
> models but definitely for nonlinear mixed models and generalized
> linear mixed models, people confuse the algorithm and the objective.
> I have been participating in a discussion (http://www.nlme.org/) of
> some draft specifications for software to fit nonlinear mixed models
> to population pharmacokinetic and pharmacodynamic data.  The draft
> specifications state that the software should implement a variety of
> methods that go by names like FO (first-order), FOCE (first-order
> conditional estimates), ...  I claim that you can't tell if you have
> such a method implemented properly because those methods (even the
> Lindstrom-Bates method, which I know well) just tell you how you go
> from one set of parameter estimates to the next.  Different
> implementations, or even the same implementation but using different
> hardware and systems software, may produce different answers and who
> can say which is the best answer.  I advocate agreeing on a "gold
> standard" evaluation of the likelihood which can be evaluated based
> solely on the model and the data and parameter values.  That is, it
> does not depend on how you got to those parameter values.  Then you
> can compare different sets of estimates.
>
> The situation for generalized linear mixed models is similar.  People
> speak of PQL or PQL1 or PQL2 estimates as if these names could
> meaningfully be applied to estimates.  These names refer to methods of
> evaluating an iterative step, not an objective.  (For generalized
> linear models without random effects you may say that people refer to
> IRLS estimates and that just describes an iterative step.  However,
> one can demonstrate that a fixed point of the IRLS iterations is the
> maximum likelihood estimate so IRLS is justified on the basis of
> optimizing an objective function, which can be, and is, evaluated at
> the converged values.)
>
> Returning to lsmeans, as best I can understand it, these are an
> attempt to characterize a "typical" fitted response for a level of a
> factor in the model, even when other terms in the model represent
> interactions with that factor.  If that is the case, and let me
> reiterate that I don't claim to understand exactly what they are
> supposed to be, then I would claim that you need to be awfully damned
> careful about what you do regarding those interactions.  There is no
> universally applicable way of defining such a beast.  You must
> carefully state how you are going to marginalize over the levels of
> other factors involved in the interactions, which, of course, implies
> that you first need to think about the marginalization and decide how
> it should be done in your particular situation.
>
> Sorry for going on at such length.  Please don't interpret the length
> of my response as a criticism of your question.  I realize that those
> applying statistical methods in their research to be published in
> their field's literature need things like p-values and lmeans and so
> on, to satisfy editors and referees in their field who believe that
> these values are perfectly well-defined.  Unfortunately, I'm a purist
> with extensive training in mathematics  and I only put things in my
> software when I am satisfied that they are meaningful and useful and
> evaluated the best way I know or can derive.  This is one of the great
> pleasures of working on Open Source software.  The R Project doesn't
> have a marketing department or senior management to tell us, "It
> doesn't matter what you think, customers claim they need Type III sums
> of squares so you have to add them whether you like it or not".  To
> paraphrase Bob Dylan,  "When you ain't paid nothing, you got nothing
> to lose."
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>