[R-sig-ME] p-values vs likelihood ratios

Tue Feb 22 04:20:41 CET 2011

On 11-02-21 09:45 AM, Mike Lawrence wrote:
> On Mon, Feb 21, 2011 at 9:24 AM, Ben Bolker <bbolker at gmail.com> wrote:
>>  I don't see why you're using AIC differences here.
> 
> My understanding is that taking the difference of the values resulting
> from AIC() is equivalent to computing the likelihood ratio then
> applying the AIC correction to account for the different number of
> parameters in each model (then log-transforming at the end).

 Yes, but with a considerably different interpretation.
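For concreteness, here is the arithmetic behind that equivalence. This is a toy sketch with made-up log-likelihoods, not output from any fitted model, and the helper names are hypothetical. With AIC = 2k - 2 log L, the AIC difference between a reduced and a full model equals the deviance difference 2 log(L_full/L_reduced) minus 2 for each extra parameter. On the likelihood-ratio scale, exp(dAIC/2) is the raw ratio multiplied by a penalty of exp(-dk).

```python
import math


def aic(log_lik, n_params):
    """AIC = 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik


def deviance_difference(log_lik_reduced, log_lik_full):
    """Likelihood-ratio statistic: 2 log(L_full / L_reduced)."""
    return 2 * (log_lik_full - log_lik_reduced)


# Made-up values for illustration only (not from any real fit).
ll_reduced, k_reduced = -120.0, 3
ll_full, k_full = -115.0, 4

# Positive values favour the full model.
delta_aic = aic(ll_reduced, k_reduced) - aic(ll_full, k_full)
dev = deviance_difference(ll_reduced, ll_full)
extra = k_full - k_reduced

# Same quantity, two routes: deviance difference minus 2 per extra parameter.
print(delta_aic, dev - 2 * extra)  # -> 8.0 8.0

# On the ratio scale: raw likelihood ratio times a penalty of exp(-extra).
raw_lr = math.exp(ll_full - ll_reduced)
adjusted_lr = math.exp(delta_aic / 2)
print(raw_lr, adjusted_lr, raw_lr * math.exp(-extra))
```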

> My original exposure to likelihood ratios (and the AIC/BIC correction
> thereof) comes from Glover & Dixon (2004,
> http://www.psych.ualberta.ca/~pdixon/Home/Preprints/EasyLRms.pdf), who
> describe the raw likelihood ratio as inappropriately favoring the
> model with more parameters because more complex models have the
> ability to fit noise more precisely than less complex models; hence the
> need for some form of correction to account for the differing
> complexity of the models being compared.

  For another (entertaining) take on these subjects, I recommend Lindsey
(1999), "Some Statistical Heresies" <http://www.jstor.org/stable/2680893>.

  *Some* sort of calibration needs to be done to correct for model
complexity, but I'm not wild about AIC because it was originally framed in
terms of prediction, not testing ...
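A quick simulation illustrates why the raw likelihood ratio needs calibration. It rests on stated assumptions: a toy two-group normal model rather than a mixed model, data simulated with no true group effect, and hypothetical helper names. Because the full model nests the reduced one, its maximized likelihood is never smaller, so the raw ratio favours it in essentially every replicate. With AIC's penalty of 2 per extra parameter, the full model should win only about as often as a chi-square (1 df) statistic exceeds 2, i.e. roughly 16% of the time asymptotically.

```python
import math
import random
import statistics


def max_loglik(ys, groups=None):
    """Maximized normal log-likelihood (ML sigma).

    One common mean if groups is None, otherwise one mean per group.
    """
    if groups is None:
        mu = statistics.fmean(ys)
        fitted = [mu] * len(ys)
    else:
        means = {g: statistics.fmean([y for gg, y in zip(groups, ys) if gg == g])
                 for g in set(groups)}
        fitted = [means[g] for g in groups]
    sigma2 = statistics.fmean([(y - f) ** 2 for y, f in zip(ys, fitted)])
    return -0.5 * len(ys) * (math.log(2 * math.pi * sigma2) + 1)


rng = random.Random(2011)
groups = ["a"] * 30 + ["b"] * 30
n_rep = 2000
raw_lr_favours_full = 0
aic_favours_full = 0
for _ in range(n_rep):
    ys = [rng.gauss(0.0, 1.0) for _ in groups]  # null is true: no group effect
    log_lr = max_loglik(ys, groups) - max_loglik(ys)
    if log_lr > 0:
        raw_lr_favours_full += 1
    # The full model has one extra mean parameter, so AIC subtracts 2 * 1.
    if 2 * log_lr - 2 * 1 > 0:
        aic_favours_full += 1

print(raw_lr_favours_full / n_rep, aic_favours_full / n_rep)
```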
> 
> I wonder, however, whether cross validation might be a less
> controversial approach to achieving fair comparison of two models that
> differ in parameter number. That is, fit the models to a subset of the
> data, then compute the likelihoods on another subset of the data. I'll
> play around with this idea and report back any interesting findings...

  Cross-validation seems better (although I'm sure it has its own
problems besides computational complexity; I don't know it as well, so
I don't know its flaws).
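Here is a minimal sketch of the cross-validation idea, for anyone who wants to try it. It uses a toy setup, and all names are hypothetical. The reduced model has one common mean and the full model has separate group means. Both are normal with an ML estimate of sigma, and they are compared by their summed held-out log-likelihoods over K random folds. The sketch ignores any grouping or random-effects structure, which a real comparison of mixed models would have to respect when splitting the data.

```python
import math
import random
import statistics


def normal_loglik(y, mu, sigma):
    """Log-density of y under Normal(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def fit_reduced(train):
    """Hypothetical reduced model: one common mean, ML estimate of sigma."""
    ys = [y for _, y in train]
    mu = statistics.fmean(ys)
    sigma = math.sqrt(statistics.fmean([(y - mu) ** 2 for y in ys]))
    return (lambda g: mu), sigma


def fit_full(train):
    """Hypothetical full model: a separate mean per group, pooled ML sigma."""
    groups = {g for g, _ in train}
    means = {g: statistics.fmean([y for gg, y in train if gg == g]) for g in groups}
    sigma = math.sqrt(statistics.fmean([(y - means[g]) ** 2 for g, y in train]))
    return (lambda g: means[g]), sigma


def cv_loglik(data, fit, k=5, seed=1):
    """Sum of held-out log-likelihoods over k random folds (same folds for any fit)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    total = 0.0
    for i, test in enumerate(folds):
        train = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        mean_of, sigma = fit(train)
        total += sum(normal_loglik(y, mean_of(g), sigma) for g, y in test)
    return total


# Simulated data (assumption): group "b" is shifted by one SD.
rng = random.Random(42)
data = [(g, rng.gauss(0.0 if g == "a" else 1.0, 1.0))
        for g in ("a", "b") for _ in range(100)]

cv_reduced = cv_loglik(data, fit_reduced)
cv_full = cv_loglik(data, fit_full)
print(cv_full - cv_reduced)  # positive values favour the full model
```

Because both models are scored on data they were not fitted to, no explicit parameter-count penalty is applied. Any advantage from fitting noise should show up as worse held-out likelihood.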

> 
>>   If one is really trying to test for "evidence of an effect" I see
>> nothing wrong with a p-value stated on the basis of the null
>> distribution of deviance differences between a full and a reduced model;
>> it's figuring out that distribution that is the hard part. If I were
>> doing this in a Bayesian framework I would look at the credible interval
>> of the parameters (although doing this for multi-parameter effects is
>> harder, which is why some MCMC-based "p values" have been concocted on
>> this list and elsewhere).
> 
> We'll possibly have to simply disagree on the general utility of
> p-values for cumulative science (as opposed to one-off decision
> making). I do, however, agree that Bayesian credible intervals have a
> role in cumulative science insofar as they permit a means of relative
> evaluation of models that differ not in the presence of an effect but
> in the specific magnitude of the effect, as may be encountered in more
> advanced/fleshed-out areas of inquiry. Otherwise, in the context of
> areas where the simple existence of an effect is of theoretical
> interest, computing credible intervals on effects seems like overkill
> and has (from my anti-p perspective) a dangerously easy connection to
> null-hypothesis significance testing.

  I think NHST is easy to abuse but not always wrong.
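On the earlier point that the null distribution of deviance differences is the hard part: one standard way to approximate it is a parametric bootstrap. You simulate many data sets from the fitted reduced model, refit both models to each, and count how often the simulated deviance difference is at least as large as the observed one. The sketch below shows only the mechanics. It uses a toy two-group normal model rather than a mixed model, and the names are hypothetical. With random effects, the simulation and refitting steps are where the real work lies. The p-value uses the common (exceed + 1)/(n_sim + 1) convention.

```python
import math
import random
import statistics


def max_loglik(groups, ys, separate_means):
    """Maximized normal log-likelihood with ML sigma; one mean or one per group."""
    if separate_means:
        means = {g: statistics.fmean([y for gg, y in zip(groups, ys) if gg == g])
                 for g in set(groups)}
        fitted = [means[g] for g in groups]
    else:
        mu = statistics.fmean(ys)
        fitted = [mu] * len(ys)
    sigma2 = statistics.fmean([(y - f) ** 2 for y, f in zip(ys, fitted)])
    return -0.5 * len(ys) * (math.log(2 * math.pi * sigma2) + 1)


def deviance_diff(groups, ys):
    """2 * (log L_full - log L_reduced)."""
    return 2 * (max_loglik(groups, ys, True) - max_loglik(groups, ys, False))


def parametric_bootstrap_p(groups, ys, n_sim=999, seed=1):
    """Observed deviance difference and its parametric-bootstrap p-value."""
    rng = random.Random(seed)
    observed = deviance_diff(groups, ys)
    # Fit the reduced (null) model, then simulate from it.
    mu = statistics.fmean(ys)
    sigma = math.sqrt(statistics.fmean([(y - mu) ** 2 for y in ys]))
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.gauss(mu, sigma) for _ in ys]
        if deviance_diff(groups, sim) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Simulated example (assumption): a 2-SD difference between groups.
rng = random.Random(0)
groups = ["a"] * 20 + ["b"] * 20
ys = [rng.gauss(0.0 if g == "a" else 2.0, 1.0) for g in groups]
observed, p = parametric_bootstrap_p(groups, ys)
print(observed, p)
```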

  Ben Bolker