[R-sig-ME] p-values vs likelihood ratios

Mike Lawrence Mike.Lawrence at dal.ca
Mon Feb 21 06:09:27 CET 2011

```Hi folks,

I've noticed numerous posts here that discuss the appropriateness of
p-values obtained by one method or another in the context of mixed
effects modelling. Following these discussions, I have an observation
(mini-rant) then a question.

First the observation:

I am not well versed in the underlying mathematical mechanics of mixed
effects modelling, but I would like to suggest that the apparent
difficulty of determining appropriate p-values itself may be a sign
that something is wrong with the whole idea of using mixed effects
modelling as a means of implementing a null-hypothesis testing
approach to data analysis. That is, despite the tradition-based fetish
for p-values generally encountered in the peer-review process, null
hypothesis significance testing itself is inappropriate for most cases
of data analysis. p-values are for politicians; they help inform
one-off decisions by fixing the rate at which one specific type of
decision error will occur (notably ignoring other types of decision
errors). Science on the other hand is a cumulative process that is
harmed by dichotomized and incomplete representation of the data as
null-rejected/fail-to-reject-the-null. Data analysis in science should
be about quantifying and comparing evidence between models of the
process that generated the data. My impression is that the likelihood
ratio (n.b. not likelihood ratio *test*) is an easily computed
quantity that facilitates quantitative representation of such
comparison of evidence.

Now the question:

Am I being naive in thinking that there are no nuances to the
computation of likelihood ratios and appropriateness of their
interpretation in the mixed effects modelling context? To provide
fodder for criticism, here are a few ways in which I imagine computing
then interpreting likelihood ratios:

Evaluation of evidence for or against a fixed effect:
m0 = lmer( dv ~ (1|rand) + 1 )
m1 = lmer( dv ~ (1|rand) + iv )
AIC(m0)-AIC(m1)

Evaluation of evidence for or against an interaction between two fixed effects:
m0 = lmer( dv ~ (1|rand) + iv1 + iv2 )
m1 = lmer( dv ~ (1|rand) + iv1 + iv2 + iv1:iv2 )
AIC(m0)-AIC(m1)

Evaluation of evidence for or against a random effect:
m0 = lmer( dv ~ (1|rand1) + 1 )
m1 = lmer( dv ~ (1|rand1) + (1|rand2) + 1 )
AIC(m0)-AIC(m1)

Evaluation of evidence for or against correlation between the
intercept and slope of a fixed effect that is allowed to vary within
levels of the random effect:
m0 = lmer( dv ~ (1|rand) + (0+iv|rand) + iv )
m1 = lmer( dv ~ (1+iv|rand) + iv )
AIC(m0)-AIC(m1)
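
# A minimal sketch of the first comparison above, assuming the lme4
# package and its bundled sleepstudy data stand in for dv, iv and rand
# (Reaction, Days and Subject are illustrative substitutes, not part of
# the models above). It assumes AIC = -2*logLik + 2*k, so that
#   AIC(m0) - AIC(m1) = 2*(logLik(m1) - logLik(m0)) - 2*(k1 - k0)
# and exp() of half that difference is a likelihood ratio penalized for
# the extra parameters. REML = FALSE is an assumption made here because
# REML likelihoods are not comparable across different fixed effects.
library(lme4)
m0 = lmer( Reaction ~ (1|Subject) + 1 , data = sleepstudy , REML = FALSE )
m1 = lmer( Reaction ~ (1|Subject) + Days , data = sleepstudy , REML = FALSE )
log_lr = as.numeric( logLik(m1) - logLik(m0) )  # raw log likelihood ratio
delta_aic = AIC(m0) - AIC(m1)                   # 2*log_lr - 2*(k1 - k0)
exp( delta_aic / 2 )                            # penalized likelihood ratio favoring m1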

Certainly I've already encountered uncertainty in this approach in
that I'm unsure whether AIC() or BIC() is more appropriate for
correcting the likelihood estimates to account for the differential
complexity of the models involved in these types of comparisons. I get
the impression that both corrections were developed in the context of
exploratory research where model selection spans many models with
multiple, usually observed (vs manipulated), variables, so I
don't have a good understanding of how their different
derivations/intentions apply to this simpler context of comparing two
nested models to determine evidence for a specific effect of interest.
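
For reference, a sketch of how the two penalties differ, assuming the
usual textbook definitions with maximized likelihood L, k estimated
parameters and n observations (what n should be in a mixed model is
itself open to debate):

  AIC = -2 ln L + 2 k
  BIC = -2 ln L + k ln n

For two nested models m0 and m1 with likelihoods L0 and L1 and
parameter counts k0 < k1:

  AIC(m0) - AIC(m1) = 2 ln(L1/L0) - 2 (k1 - k0)
  BIC(m0) - BIC(m1) = 2 ln(L1/L0) - (k1 - k0) ln n

So the two agree on the raw log likelihood ratio and differ only in
the per-parameter penalty: 2 for AIC versus ln n for BIC, which is the
larger penalty whenever n > e^2 (about 7.4).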

I would greatly appreciate any thoughts on this AIC/BIC issue, or any
other complexities that I've overlooked in my prescription to abandon
p-values in favor of the likelihood ratio (at least, for all
non-decision-making scientific applications of data analysis).

Cheers,

Mike

--
Mike Lawrence