[R-sig-ME] p-values vs likelihood ratios

Jarrod Hadfield j.hadfield at ed.ac.uk
Mon Feb 21 14:39:52 CET 2011


Also, comparing REML likelihoods between models where the fixed  
effects have changed is, I think, inappropriate because the data have  
essentially changed. You should be OK if you specify REML=FALSE.
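
To make that concrete, here is a minimal sketch of the first comparison quoted below, refitted by maximum likelihood. The data are simulated purely for illustration (the data frame d and the effect sizes are my assumptions, not anything from the original post); dv, iv and rand follow the names used in the quoted models.

```r
library(lme4)

# Hypothetical simulated data: 20 groups of 10 observations each
set.seed(1)
d <- data.frame(rand = factor(rep(1:20, each = 10)), iv = rnorm(200))
group_effect <- rnorm(20)
d$dv <- 0.5 * d$iv + group_effect[as.integer(d$rand)] + rnorm(200)

# Fit by maximum likelihood (REML = FALSE) so that the likelihoods are
# comparable between models with different fixed effects
m0 <- lmer(dv ~ 1 + (1 | rand), data = d, REML = FALSE)
m1 <- lmer(dv ~ iv + (1 | rand), data = d, REML = FALSE)

# Log likelihood ratio of m1 relative to m0; exp(log_lr) is the ratio itself
log_lr <- as.numeric(logLik(m1)) - as.numeric(logLik(m0))

# The AIC difference is twice the log likelihood ratio minus twice the
# number of extra parameters in m1 (one here, the slope for iv)
aic_diff <- AIC(m0) - AIC(m1)
```

With ML fits of nested models, log_lr should be non-negative, and aic_diff should equal 2 * log_lr - 2 for this pair.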



On 21 Feb 2011, at 13:24, Ben Bolker wrote:

> On 11-02-21 12:09 AM, Mike Lawrence wrote:
>> Hi folks,
>> I've noticed numerous posts here that discuss the appropriateness of
>> p-values obtained by one method or another in the context of mixed
>> effects modelling. Following these discussions, I have an observation
>> (mini-rant) then a question.
>> First the observation:
>> I am not well versed in the underlying mathematical mechanics of mixed
>> effects modelling, but I would like to suggest that the apparent
>> difficulty of determining appropriate p-values itself may be a sign
>> that something is wrong with the whole idea of using mixed effects
>> modelling as a means of implementing a null-hypothesis testing
>> approach to data analysis. That is, despite the tradition-based fetish
>> for p-values generally encountered in the peer-review process, null
>> hypothesis significance testing itself is inappropriate for most cases
>> of data analysis. p-values are for politicians; they help inform
>> one-off decisions by fixing the rate at which one specific type of
>> decision error will occur (notably ignoring other types of decision
>> errors). Science on the other hand is a cumulative process that is
>> harmed by dichotomized and incomplete representation of the data as
>> null-rejected/fail-to-reject-the-null. Data analysis in science should
>> be about quantifying and comparing evidence between models of the
>> process that generated the data. My impression is that the likelihood
>> ratio (n.b. not likelihood ratio *test*) is an easily computed
>> quantity that facilitates quantitative representation of such
>> comparison of evidence.
>  Yes, although I don't personally think there's anything fundamentally
> wrong with p values when used properly (I know Royall (1993) states that
> even in the Fisherian 'strength of evidence' framework they are flawed ...)
>> Now the question:
>> Am I being naive in thinking that there are no nuances to the
>> computation of likelihood ratios and appropriateness of their
>> interpretation in the mixed effects modelling context? To provide
>> fodder for criticism, here are a few ways in which I imagine computing
>> then interpreting likelihood ratios:
>> Evaluation of evidence for or against a fixed effect:
>> m0 = lmer( dv ~ (1|rand) + 1 )
>> m1 = lmer( dv ~ (1|rand) + iv )
>> AIC(m0)-AIC(m1)
>> Evaluation of evidence for or against an interaction between two  
>> fixed effects:
>> m0 = lmer( dv ~ (1|rand) + iv1 + iv2 )
>> m1 = lmer( dv ~ (1|rand) + iv1 + iv2 + iv1:iv2 )
>> AIC(m0)-AIC(m1)
>> Evaluation of evidence for or against a random effect:
>> m0 = lmer( dv ~ (1|rand1) + 1 )
>> m1 = lmer( dv ~ (1|rand1) + (1|rand2) + 1 )
>> AIC(m0)-AIC(m1)
>> Evaluation of evidence for or against correlation between the
>> intercept and slope of a fixed effect that is allowed to vary within
>> levels of the random effect:
>> m0 = lmer( dv ~ (1+iv|rand) + iv )
>> m1 = lmer( dv ~ (1|rand) + (0+iv|rand) + iv )
>> AIC(m0)-AIC(m1)
>> Certainly I've already encountered uncertainty in this approach in
>> that I'm unsure whether AIC() or BIC() is more appropriate for
>> correcting the likelihood estimates to account for the differential
>> complexity of the models involved in these types of comparisons. I get
>> the impression that both corrections were developed in the context of
>> exploratory research where model selection involves many models
>> involving multiple usually observed variables (vs manipulated), so I
>> don't have a good understanding of how their different
>> derivations/intentions apply to this simpler context of comparing two
>> nested models to determine evidence for a specific effect of interest.
>> I would greatly appreciate any thoughts on this AIC/BIC issue, or any
>> other complexities that I've overlooked in my prescription to abandon
>> p-values in favor of the likelihood ratio (at least, for all
>> non-decision-making scientific applications of data analysis).
>  I don't see why you're using AIC differences here.  If you want to
> test hypotheses, you should use the likelihood ratio (with or without
> ascribing a p-value to it)!  The AIC was designed to estimate the
> expected predictive accuracy of a model on out-of-sample data (as
> measured by the Kullback-Leibler distance); the BIC is designed to
> approximate the probability that a model is the 'true' model.  AIC is a
> shortcut that is strongly favored by ecologists (among others) because
> it is easy, but it does not do what they are usually trying to do and
> what I see you trying to do above, i.e. test for evidence of an effect.
>   If one is really trying to test for "evidence of an effect" I see
> nothing wrong with a p-value stated on the basis of the null
> distribution of deviance differences between a full and a reduced model
> -- it's figuring out that distribution that is the hard part. If I were
> doing this in a Bayesian framework I would look at the credible interval
> of the parameters (although doing this for multi-parameter effects is
> harder, which is why some MCMC-based "p values" have been concocted on
> this list and elsewhere).
>  Ben Bolker

