[R-sig-ME] Missing values in lmer vs. HLM

Mon Jul 6 03:24:09 CEST 2015

There has been a reasonable amount of work on comparing different methods
for random-effects meta-analysis. Using REML does produce better estimates
of the effect size, in terms of coverage than maximum likelihood. Using the
profile likelihood produces better results again, so maybe that is what
should be used. On the other hand it is likely that the distribution of the
random effects isn't normal, so it probably isn't important.

If it is important there are now more general ways of fixing the bias and
coverage, parametric bootstrap should work nicely, so it doesn't seem
useful to use a technique that only has limited application. Maximum
likelihood is applied to mixed effects logistic where it has much the same
problems, and everyone just seems to ignore them.

On 6 July 2015 at 10:31, John Maindonald <john.maindonald at anu.edu.au> wrote:

> Quoting Douglas:
> > . . .  In mathematical statistics you say, "assuming that
> > the model is correct, these are the consequences" and that is all there
> is
> > to it.  The way that the game is actually played is that, when you get to
> > the end of the proof and discover that you need some conditions to make
> it
> > work, you go back to the beginning and add those conditions.
>
> So mathematical statistics is a ‘game’.  That is surely a rather damning
> comment!
> It does however raise important points.  My perception is that the
> situation has
> improved greatly from the “that is all there is to it” typical stance in
> the published
> literature of the 1960s and 1970s.  There’s greater pressure to back up
> theoretical
> development with computations with (often, somewhat) real data.
>
> The situation is though uneven.   In some parts of the literature though
> (the literature
> on smoothing seems to me particularly rife with this problem) serious
> issues with the
> unreality of iid or at least id (independence) assumptions, for time
> series and/or
> spatial data, are just ignored!  That is just one example.  [For
> interpolation, maybe
> the iid assumption often makes reasonable sense for spatial data.]
>
> More important than whether the estimator has likelihood in its name, or
> whether it
> is misleading to call it some kind of likelihood estimator, is whether it
> serves the
> intended purpose.  Use the median for sure where it makes sense, which
> incidentally
> is neither unbiased nor ML.  I do not think that one would get away with
> quoting
> maximum likelihood estimators (eg, for wages and wage differentials in
> various
> sectors) in a set of national account figures.  Here, one might be tempted
> to make
> politically fraught comments about the malign effects of highly skew
> distributions!
> Nuf sed.
>
> In many contexts, it is thought important to have numbers that add up.
> That, after
> all is what analysis of variance breakdowns of the mean are all about.
> Medians do
> not work well in that context; they do not give a table that adds up.  But
> of course,
> just because one is presenting a table where effects add up to give a
> grand mean,
> distributions that are close to symmetric are important, for conceptual as
> well as for
> theoretical (normality of sampling distribution) reasons.  This all
> becomes a whole
> lot more fraught for GLM models.
>
> The main benefit of REML may be that it matches what comes out
> computationally
> to the theory.  Does this do serious damage to the numbers that come out,
> relative
> to the way that they will be used?  Doug, do your strictures apply also to
> the t-statistic,
> which is a REML type statistic?  (One uses an unbiased estimate of the
> variance in
> the denominator.)  Or is the issue that it is just a means to an end?
>
> On moment estimators, comments made by Tukey seem to me relevant:
> "Do not assume “that we always know what in fact we never know – the exact
> probability structure . . .
> No data set is large enough to tell us how it should be analysed.”
> [Tukey: More honest foundations for data analysis. Journal of Statistical
> Planning and Inference,
> vol. 57, no. 1, pp. 21-28, 1997]
>
> Nor, I want to add, do we commonly have all the needed background
> information.
>
> Moment estimators can be a way to get an estimator that applies to a wide
> class of distributions.
> I and many others think this more than sufficient justification for the
> dispersion estimate that is
> widely used in quasi-likelihood computations, notable to GLM models with
> quasi-Poisson or
> quasi-binomial distributions.  A key question is of course whether the
> dispersion might vary
> with the mean.  And yes, this does make a whole lot more sense if one is
> working on a scale
> where the sampling distribution(s) is(are) symmetric.  So perhaps what is
> wrong with standard
> quasi models is that they inflate the variance on a scale where
> distributions are nothing like
> symmetric! What is totally wrong is any failure to adjust for an inflated
> variance in cases (in
> most areas, e.g., ecology, the great majority) where the variance clearly
> is inflated relative to
> the Poisson or binomial.  Note that this applies also to glmer models,
> notably where Poisson
> or binomial errors are specified.  One can specify an observation level
> random effect. I suspect
> that results are often compromised because of failure to attend to this
> issue.
>
> In summary, there are some very important issues here, but I do not see
> that substituting one
>  mathematical simplification for another is an answer.  In the end, we
> want our models to be
> useful, useful I would hope for more than purposes of getting promotion!
>
> John Maindonald             email: john.maindonald at anu.edu.au
>
> > On 6/07/2015, at 01:21, Douglas Bates <bates at stat.wisc.edu> wrote:
> >
> > My apologies for making such a statement then not following up.  As has
> > been mentioned, this is a holiday weekend in the U.S.
> >
> > The section that Landon quoted does get at the point of my comment.
> >
> > The usual justification for REML is that REML estimators of variance
> > components are less biased than are the maximum likelihood estimators
> > (mle).  On the surface this seems to be a convincing argument, for who
> > would want to use a "biased" estimator?
> >
> > But why should we be concerned with the estimator of the variance?  Why
> not
> > the estimator of the standard deviation, or the logarithm of the standard
> > deviation?  The distribution of variance estimators are highly skewed in
> > most cases.  Consider the simplest case of estimating the variance from
> an
> > i.i.d. sample from a Gaussian distribution.  The distribution of the
> > estimator is a Chi-squared distribution, which is highly skewed.  The
> > distribution of the estimator of σ is less skewed.  The distribution of
> the
> > estimator of log(σ) is more-or-less symmetric.
> >
> > The important point here is that "bias" relates to the expected value of
> > the estimator.  The argument for REML is based on the expected value of a
> > quantity with a highly skewed distribution, but we know that this is a
> poor
> > measure of location for such a distribution.  That's why it is more
> > informative to consider median salaries instead of average salaries.  The
> > fact that the average wealth of members of LeBron James's high school
> > basketball team is very high doesn't make them all rich.
> >
> > Mle's have an invariance property in that the mle of σ is the square root
> > of the mle of σ²; the mle of log(σ²) is the logarithm of the mle of σ²,
> > etc.  Unbiased estimators aren't invariant under transformation.  The
> > square root of an unbiased estimator of σ² is not an unbiased estimator
> of
> > σ.
> >
> > If an unbiased estimator were so important then we should probably
> consider
> > the estimate of log(σ²), not σ² itself.  The reason for our being fixated
> > on σ² is more computational than practical.  When using hand calculations
> > it is easiest to estimate σ² then derive an estimate of σ from that.
> These
> > considerations are less convincing when using computers.
>
>
>
>
>
> > In summary, the case for REML is less convincing than it seems at first
> > glance.  It is a consequence of a certain type of mathematical
> exposition,
> > where your assumptions are never questioned.  You only care about going
> > from "if" to "then".  In mathematical statistics you say, "assuming that
> > the model is correct, these are the consequences" and that is all there
> is
> > to it.  The way that the game is actually played is that, when you get to
> > the end of the proof and discover that you need some conditions to make
> it
> > work, you go back to the beginning and add those conditions.  It helps if
> > you call this case the "regular" case or the "normal" case or some other
> > word with favorable connotations.
> >
> > So if you want to characterize the "best" estimator you do it by peeling
> > off properties related to the first moment, the second moment, etc. For
> the
> > first moment you say that the expected value of the estimator must be
> equal
> > to the parameter being estimated and you call that the "unbiased" case.
> > Technically this is just a mathematical property but the connotation of
> the
> > word gives it much more heft than the mathematical property.  In
> > mathematical statistics it is irrelevant to question why it is this
> > particular estimator or this particular scale that is of interest - the
> > only objective is to prove the theorem and publish the result.
> >
> > (The folklore in our department is that George Box's famous statement
> about
> > "all models are wrong" originated in a thesis defense where the candidate
> > began by stating that "Assuming that the model is correct" and George
> > interrupted to say "But all models are wrong".  It wasn't a good day for
> > the candidate.  I'm sorry to say that I don't know if this story is
> > accurate as I never took the opportunity to ask him.)
> >
> > On Sat, Jul 4, 2015 at 11:36 PM landon hurley <ljrhurley at gmail.com>
> wrote:
> >
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA512
> >>
> >> On 07/05/2015 12:14 AM, Phillip Alday wrote:
> >>> On Sat, 2015-07-04 at 21:21 +0200, Karl Ove Hufthammer wrote:
> >>>> Den 04. juli 2015 18:18, Douglas Bates skreiv:
> >>>>> Having said all this I will admit that the original sin, the
> >>>>> "REML" criterion, was committed by statisticians.  In retrospect
> >>>>> I wish that we had not incorporated that criterion into the nlme
> >>>>> and lme4 packages but, at the time we wrote them, our work would
> >>>>> have been dismissed as wrong if our answers did not agree with
> >>>>> SAS PROC MIXED, etc.  So we opted for bug-for-bug compatibility
> >>>>> with existing software.
> >>>>
> >>>> Hm. I find this statement surprising. I was under the impression
> >>>> REML is *preferred* to ML in many situations (e.g. in simple
> >>>> random intercept models with few observations for each random
> >>>> intercept), and that *ML estimation* may result in severe bias. Do
> >>>> you consider maximising the REML criterion as a bug?
> >>>>
> >>>
> >>> This was my question as well. My understanding was that REML, like
> >>> Bessel's correction for the sample variance, was motivated by bias in
> >>> the maximum-likelihood estimator for small numbers of observations.
> >>> The corrected estimator is in both cases no longer the MLE, so that
> >>> the ML part is bit of a misnomer, but if you take "residualized"
> >>> expansion of RE instead of "restricted", then REML seems more like a
> >>> function of ML and not a "type" of ML.
> >>>
> >>> IIRC, the default in MixedModels.jl is now ML -- have you changed
> >>> your opinion about the utility of REML? Is there some type of weird
> >>> paradoxical situation with REML like with Bessel's correction -- the
> >>> variance estimates are no longer biased, but the s.d. estimates
> >>> are?
> >>>
> >>>
> >>> Or is the original sin the use of the name REML when REML is no
> >>> longer *the* maximum likelihood?
> >>>
> >>
> >> I had assumed that he would have responded by now, but it is a holiday
> >> in the US. The position Bates is taking is explained (I think) in his
> >> 2010 report
> >> lme4: Mixed effects modelling with R in Section 5.5 `The REML
> >> Criterion', roughly page 123-124 in the pdf [0]. It's a short read, but
> >> the most relevant bit I think is:
> >>
> >>> The argument for preferring σ_R to σ_L as an estimate of σ**2 is
> >>> that the numerator in both estimates is the sum of squared
> >>> residuals at β and, although the residual vector, yobs − Xβ , is an
> >>> n-dimensional vector, the residual at θ satisfies p linearly
> >>> independent constraints, X**{T} (yobs − Xβ ) = 0. That is, the residual
> >>> at θ is the projection of the observed response vector, yobs , into
> >>> an (n − p)-dimensional linear subspace of the n-dimensional response
> >>> space. The estimate σR takes into account the fact that σ**2 is
> >>> estimated from residuals that have only n − p degrees of freedom.
> >>>
> >>> Another argument often put forward for REML estimation is that σ_R is
> >>> an unbiased estimate of σ**2 , in the sense that the expected value of
> >>> the estimator is equal to the value of the parameter. However,
> >>> determining the expected value of an estimator involves integrating
> >>> with respect to the density of the estimator and we have seen that
> >>> densities of estimators of variances will be skewed, often highly
> >>> skewed. It is not clear why we should be interested in the expected
> >>> value of a highly skewed estimator. If we were to transform to a
> >>> more symmetric scale, such as the estimator of the standard deviation
> >>> or the estimator of the logarithm of the standard deviation, the
> >>> REML estimator would no longer be unbiased. Furthermore, this
> >>> property of unbiasedness of variance estimators does not generalize
> >>> from the linear regression model to linear mixed models. This is all
> >>> to say that the distinction between REML and ML estimates of
> >>> variances and variance components is probably less important that
> >>> many people believe.
> >>
> >>
> >> best,
> >>
> >> landon
> >> [0]:
> >>
> >>
> www.researchgate.net/publictopics.PublicPostFileLoader.html?id=53326f19d5a3f206348b45af&key=6a85e53326f199010f
> >>
> >>> Best, Phillip Alday _______________________________________________
> >>> R-sig-mixed-models at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>
> >>
> >>
> >> - --
> >> Violence is the last refuge of the incompetent.
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v2.0.17 (GNU/Linux)
> >>
> >> iQIcBAEBCgAGBQJVmLQMAAoJEDeph/0fVJWs21AP/RPMNrQDdkd67eQWfb9jhbTZ
> >> VIfZDNAt088n8XednUzk6BQlVUirb8akqdVq8YqtdaCotdP5dxXjRr30hO72FeRj
> >> rvLWcZXtDVuNwOFZA44Aw2YplwD1sU+G+vMLOsPD4BrBRfByY5FkkX2lTliQjbVK
> >> eYNi57977w/AE7y48OTprwBdkNkWjTQTrnKAsglBtOlnC+x8TdD8J0WfaZCfI1CP
> >> iUN6298pdSNWm+vAWaFCeq6Wig8o2kYGONd1RBDzGidbcy5CiQebuJYZ8cU5zl4V
> >> X34alL6cX+9qDLWbi4jYz1/3lG1U4NsCKhs6fM7imxOFV9XXtsZTr16xmHbkc6B+
> >> daNAbln/SHKoCpuvGiO+IL/H8y8W1DLyJk7sRlb7tOuDHViBCOPLwPZj71aJFFab
> >> qY40JvxdV0rputOeg5OtTyqROxCwtgqKWDaGcli1Dpcrca2aE7qNpPdMjxmnJN+j
> >> RDaCkHflJuwHifkupg7qSfLa2zbBf4KirfGnOsoeicZPF4s7BmDLU28fMlhAqEly
> >> Dbg3niPaW4/X3X69yrkw5XG46uftRGlSUdl/eMWWBUVy3o5oAkAgxpQl/49Md0CX
> >> SyYyv+Iaa9NPtgI828NYfw59xsW98hTnNNtG+D0V4nN/1d29M/KM6LspxV9Nu64b
> >> XNno0S7U16mSu1KhvIwY
> >> =7uyi
> >> -----END PGP SIGNATURE-----
> >>
> >> _______________________________________________
> >> R-sig-mixed-models at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >>
> >
> >
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

-- 

*Ken Beath*
Lecturer
Statistics Department
MACQUARIE UNIVERSITY NSW 2109, Australia

Phone: +61 (0)2 9850 8516

Level 2, AHH
http://stat.mq.edu.au/our_staff/staff_-_alphabetical/staff/beath,_ken/

CRICOS Provider No 00002J
This message is intended for the addressee named and may...{{dropped:9}}