[R-sig-ME] p-values vs likelihood ratios

Ben Bolker bbolker at gmail.com
Mon Feb 21 14:24:54 CET 2011

On 11-02-21 12:09 AM, Mike Lawrence wrote:
> Hi folks,
> I've noticed numerous posts here that discuss the appropriateness of
> p-values obtained by one method or another in the context of mixed
> effects modelling. Following these discussions, I have an observation
> (mini-rant) then a question.
> First the observation:
> I am not well versed in the underlying mathematical mechanics of mixed
> effects modelling, but I would like to suggest that the apparent
> difficulty of determining appropriate p-values itself may be a sign
> that something is wrong with the whole idea of using mixed effects
> modelling as a means of implementing a null-hypothesis testing
> approach to data analysis. That is, despite the tradition-based fetish
> for p-values generally encountered in the peer-review process, null
> hypothesis significance testing itself is inappropriate for most cases
> of data analysis. p-values are for politicians; they help inform
> one-off decisions by fixing the rate at which one specific type of
> decision error will occur (notably ignoring other types of decision
> errors). Science on the other hand is a cumulative process that is
> harmed by dichotomized and incomplete representation of the data as
> null-rejected/fail-to-reject-the-null. Data analysis in science should
> be about quantifying and comparing evidence between models of the
> process that generated the data. My impression is that the likelihood
> ratio (n.b. not likelihood ratio *test*) is an easily computed
> quantity that facilitates quantitative representation of such
> comparison of evidence.

  Yes, although I don't personally think there's anything fundamentally
wrong with p values when used properly (I know Royall (1993) states that
even in the Fisherian 'strength of evidence' framework they are flawed ...)

> Now the question:
> Am I being naive in thinking that there are no nuances to the
> computation of likelihood ratios and appropriateness of their
> interpretation in the mixed effects modelling context? To provide
> fodder for criticism, here are a few ways in which I imagine computing
> then interpreting likelihood ratios:
> Evaluation of evidence for or against a fixed effect:
> m0 = lmer( dv ~ (1|rand) + 1 )
> m1 = lmer( dv ~ (1|rand) + iv )
> AIC(m0)-AIC(m1)
> Evaluation of evidence for or against an interaction between two fixed effects:
> m0 = lmer( dv ~ (1|rand) + iv1 + iv2 )
> m1 = lmer( dv ~ (1|rand) + iv1 + iv2 + iv1:iv2 )
> AIC(m0)-AIC(m1)
> Evaluation of evidence for or against a random effect:
> m0 = lmer( dv ~ (1|rand1) + 1 )
> m1 = lmer( dv ~ (1|rand1) + (1|rand2) + 1 )
> AIC(m0)-AIC(m1)
> Evaluation of evidence for or against correlation between the
> intercept and slope of a fixed effect that is allowed to vary within
> levels of the random effect:
> m0 = lmer( dv ~ (1+iv|rand) + iv )
> m1 = lmer( dv ~ (1|rand) + (0+iv|rand) + iv )
> AIC(m0)-AIC(m1)
> Certainly I've already encountered uncertainty in this approach in
> that I'm unsure whether AIC() or BIC() is more appropriate for
> correcting the likelihood estimates to account for the differential
> complexity of the models involved in these types of comparisons. I get
> the impression that both corrections were developed in the context of
> exploratory research where model selection involves many models
> involving multiple usually observed variables (vs manipulated), so I
> don't have a good understanding of how their different
> derivations/intentions apply to this simpler context of comparing two
> nested models to determine evidence for a specific effect of interest.
> I would greatly appreciate any thoughts on this AIC/BIC issue, or any
> other complexities that I've overlooked in my prescription to abandon
> p-values in favor of the likelihood ratio (at least, for all
> non-decision-making scientific applications of data analysis).

  I don't see why you're using AIC differences here.  If you want to
test hypotheses, you should use the likelihood ratio (with or without
ascribing a p-value to it)!  The AIC was designed to estimate the
expected predictive accuracy of a model on out-of-sample data (as
measured by the Kullback-Leibler distance); the BIC is designed to
approximate the probability that a model is the 'true' model.  AIC is a
shortcut that is strongly favored by ecologists (among others) because
it is easy, but it does not do what they are usually trying to do and
what I see you trying to do above, i.e. test for evidence of an effect.
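
  To make the distinction concrete, here is a minimal sketch of how the
log likelihood ratio and the AIC difference relate for a pair of nested
models. It assumes the lme4 package and uses its bundled sleepstudy data
as a stand-in (Reaction, Days and Subject are not variables from this
thread). The models are fitted by maximum likelihood (REML = FALSE),
because REML likelihoods are not comparable between models that differ
in their fixed effects.

```r
library(lme4)

# Stand-in data: sleepstudy ships with lme4 (an assumption for illustration).
# ML fits so that models with different fixed effects can be compared.
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

ll0 <- as.numeric(logLik(m0))
ll1 <- as.numeric(logLik(m1))

# Log likelihood ratio of m1 relative to m0, and the deviance difference
log_lr   <- ll1 - ll0
dev_diff <- 2 * log_lr

# Difference in the number of estimated parameters
k_diff <- attr(logLik(m1), "df") - attr(logLik(m0), "df")

# The AIC difference is the deviance difference minus a complexity penalty
aic_diff <- AIC(m0) - AIC(m1)
all.equal(aic_diff, dev_diff - 2 * k_diff)
```

  In other words, AIC(m0) - AIC(m1) is the deviance difference with
2 units subtracted per extra parameter, so it is not itself a likelihood
ratio.
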
   If one is really trying to test for "evidence of an effect" I see
nothing wrong with a p-value stated on the basis of the null
distribution of deviance differences between a full and a reduced model
-- it's figuring out that distribution that is the hard part. If I were
doing this in a Bayesian framework I would look at the credible interval
of the parameters (although doing this for multi-parameter effects is
harder, which is why some MCMC-based "p values" have been concocted on
this list and elsewhere).
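
  One way to approximate that null distribution is a parametric
bootstrap: simulate responses from the reduced model, refit both models
to each simulated response, and collect the deviance differences. The
sketch below is only an illustration under stated assumptions. It uses
lme4's simulate() and refit() on the package's sleepstudy data (a
stand-in, not data from this thread), a small number of simulations to
keep the run short, and the common (count + 1) / (n + 1) convention for
the bootstrap p-value.

```r
library(lme4)

set.seed(101)

# Stand-in models on lme4's sleepstudy data, fitted by ML
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# Observed deviance difference between the full and the reduced model
obs_dev <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))

# One bootstrap replicate: simulate a response under m0, refit both models
sim_dev <- function() {
  y_sim <- simulate(m0)[[1]]
  2 * (as.numeric(logLik(refit(m1, y_sim))) -
       as.numeric(logLik(refit(m0, y_sim))))
}

n_sim    <- 100   # small for speed; use many more in practice
null_dev <- replicate(n_sim, sim_dev())

# Proportion of simulated deviance differences at least as large as observed
p_boot <- (sum(null_dev >= obs_dev) + 1) / (n_sim + 1)
p_boot
```

  Refits to simulated data may report singular fits; those are
messages rather than errors and do not stop the loop.
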

  Ben Bolker
