[R-sig-ME] Model selection, LRT test

Sat Jun 18 20:44:55 CEST 2011

There are already some functions that simulate to get p-values; see, for example, the simulate.p.value argument of fisher.test. Other functions could add this for specific cases. Some of the bootstrapping packages do parametric bootstrapping, which is a form of simulation for these types of cases.
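To make the idea concrete, here is a minimal sketch (in Python, not R's actual code) of a simulated p-value for independence in an r x c table. It assumes the Pearson chi-square statistic, which is what chisq.test uses when simulate.p.value = TRUE; fisher.test instead ranks simulated tables by their probability. Random tables with the observed row and column totals are drawn by shuffling the column labels (R uses r2dtable(), which samples from the same fixed-margins distribution). The function names pearson_x2 and simulated_p_value are hypothetical.

```python
import random
from collections import Counter


def pearson_x2(rows, cols):
    """Pearson chi-square statistic of the table cross-classifying rows by cols."""
    n = len(rows)
    cells = Counter(zip(rows, cols))
    row_tot = Counter(rows)
    col_tot = Counter(cols)
    x2 = 0.0
    for r, nr in row_tot.items():
        for c, nc in col_tot.items():
            expected = nr * nc / n
            x2 += (cells[(r, c)] - expected) ** 2 / expected
    return x2


def simulated_p_value(table, B=1999, seed=1):
    """Monte Carlo p-value for independence in an r x c table of counts."""
    # Expand the table into one (row label, column label) pair per observation.
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows += [i] * count
            cols += [j] * count
    t_obs = pearson_x2(rows, cols)
    rng = random.Random(seed)
    hits = 0
    for _ in range(B):
        shuffled = cols[:]
        # Shuffling column labels destroys any association but keeps both margins fixed.
        rng.shuffle(shuffled)
        # Small tolerance so numerically tied statistics count as "at least as extreme".
        if pearson_x2(rows, shuffled) >= t_obs - 1e-9:
            hits += 1
    # Count the observed table itself as one of the draws.
    return (1 + hits) / (B + 1)


print(simulated_p_value([[20, 0], [0, 20]], B=999))
print(simulated_p_value([[10, 10], [10, 10]], B=199))
```

The (1 + hits) / (B + 1) form treats the observed data as one of the draws, so the estimated p-value is never exactly zero.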

This link from a couple of years ago shows some examples of the simulation idea for testing mixed-effects models:

http://finzi.psych.upenn.edu/R-sig-mixed-models/2009q1/001819.html

The general idea could be extended (and probably improved) to other types of models as well.
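As one example of extending the idea beyond mixed models, here is a minimal parametric-bootstrap LRT sketch for two nested Poisson models: one common mean (null) versus a separate mean for each of two groups (alternative). The steps are the ones any such simulation needs: fit the null model, simulate new responses from it, refit both models to each simulated data set, and count how often the simulated LRT statistic is at least as large as the observed one. The Poisson setting, the data and all function names are illustrative assumptions; in R the simulation step would typically be simulate() applied to the fitted null model.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's multiplication method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y with common mean mu (0*log(0) taken as 0)."""
    return sum((yi * math.log(mu) if yi > 0 else 0.0) - mu - math.lgamma(yi + 1) for yi in y)


def lrt_two_groups(y1, y2):
    """LRT statistic: separate Poisson means per group (alternative) vs one common mean (null)."""
    pooled = y1 + y2
    ll0 = poisson_loglik(pooled, sum(pooled) / len(pooled))
    ll1 = poisson_loglik(y1, sum(y1) / len(y1)) + poisson_loglik(y2, sum(y2) / len(y2))
    return 2.0 * (ll1 - ll0)


def chisq1_p_value(t):
    """Upper-tail p-value of a chi-square with 1 df (the usual asymptotic LRT reference)."""
    return math.erfc(math.sqrt(max(t, 0.0) / 2.0))


def bootstrap_lrt(y1, y2, B=999, seed=1):
    """Parametric-bootstrap p-value: simulate from the fitted null, refit both models each time."""
    t_obs = lrt_two_groups(y1, y2)
    pooled = y1 + y2
    mu0 = sum(pooled) / len(pooled)  # MLE of the mean under the null model
    rng = random.Random(seed)
    hits = 0
    for _ in range(B):
        s1 = [rpois(mu0, rng) for _ in y1]
        s2 = [rpois(mu0, rng) for _ in y2]
        if lrt_two_groups(s1, s2) >= t_obs - 1e-9:
            hits += 1
    return t_obs, (1 + hits) / (B + 1)


t, p = bootstrap_lrt([0, 1, 0, 2, 1, 0, 1, 1], [5, 7, 6, 4, 8, 6, 5, 7], B=499)
print(t, p, chisq1_p_value(t))
```

For regular nested models the bootstrap p-value should be close to the chi-square(1) reference from chisq1_p_value(); the simulation matters more when that approximation is doubtful, for example when testing whether a variance component is zero, which lies on the boundary of the parameter space.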

A function or package could do this for very specific cases, but for the general problem you need something as general as R itself. However, putting together simulations like this in R is not complicated; it is probably simpler than what you would need to specify to a do-it-all function.
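The same recipe also covers the original question of two non-nested models with the same number of parameters, which is the situation the Lewis et al. abstract quoted further down addresses by simulation. This is an assumption-laden sketch, not necessarily the paper's procedure: it treats a Poisson model as the null and a geometric model (one mean parameter each) as the alternative, uses T = log-likelihood(geometric) - log-likelihood(Poisson) as the statistic, and simulates its null distribution from the fitted Poisson. The models, data and function names are hypothetical.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's multiplication method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def loglik_poisson(y):
    """Maximised Poisson log-likelihood; the MLE of the mean is the sample mean."""
    m = sum(y) / len(y)
    return sum((yi * math.log(m) if yi > 0 else 0.0) - m - math.lgamma(yi + 1) for yi in y)


def loglik_geometric(y):
    """Maximised geometric log-likelihood, P(Y=y) = m^y / (1+m)^(y+1); MLE of m is the sample mean."""
    m = sum(y) / len(y)
    return sum((yi * math.log(m) if yi > 0 else 0.0) - (yi + 1) * math.log1p(m) for yi in y)


def simulated_nonnested_lrt(y, B=999, seed=1):
    """Simulated p-value for H0: Poisson, using T = loglik(geometric) - loglik(Poisson)."""
    t_obs = loglik_geometric(y) - loglik_poisson(y)
    m = sum(y) / len(y)  # fitted Poisson mean used to simulate under the null
    rng = random.Random(seed)
    hits = 0
    for _ in range(B):
        sim = [rpois(m, rng) for _ in y]
        # Refit both models to each simulated data set before computing T.
        if loglik_geometric(sim) - loglik_poisson(sim) >= t_obs - 1e-9:
            hits += 1
    return t_obs, (1 + hits) / (B + 1)


overdispersed = [0, 0, 0, 0, 1, 0, 0, 7, 0, 12, 0, 1, 0, 0, 9, 0, 2, 0, 15, 0]
print(simulated_nonnested_lrt(overdispersed, B=499))
```

Swapping the roles (simulating from the fitted geometric model and asking whether T is unusually small) tests the other model as the null. With non-nested models both, or neither, can be rejected, so it is usual to run the test in both directions.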

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Arnaud Mosnier
Sent: Saturday, June 18, 2011 11:44 AM
To: luca borger
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Model selection, LRT test

Thanks for this reference; I will try to find the paper.
I would also be interested in any comments from the statistics experts
on this list.
If it is correct, is there any package (or function) that implements
their suggested approach (the "simulation-based approach")?

Arnaud

2011/6/17 luca borger <luca.borger at cebc.cnrs.fr>:
>>Thanks for reminding me that LRTs are only for nested models
>
> Actually, Lewis et al. (2011, abstract below) have just published a paper in which they claim that LRTs can also be used for non-nested models ("This fact is well-established in the statistical literature, but not widely used in ecological studies."). Has anyone read this paper? I'd be interested to hear any comments (including whether you believe the authors' approach could also be used for GLMMs).
>
>
> Cheers,
>
> Luca
>
>
>
>
> http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00063.x/abstract
>
> A unified approach to model selection using the likelihood ratio test
> Fraser Lewis, Adam Butler, Lucy Gilbert
>
> Methods in Ecology and Evolution
> Volume 2, Issue 2, pages 155–162, April 2011
> DOI: 10.1111/j.2041-210X.2010.00063.x
>
> Summary
> 1. Ecological count data typically exhibit complexities such as overdispersion and zero-inflation, and are often weakly associated with a relatively large number of correlated covariates. The use of an appropriate statistical model for inference is therefore essential. A common selection criterion for choosing between nested models is the likelihood ratio test (LRT). Widely used alternatives to the LRT are based on information-theoretic metrics such as the Akaike Information Criterion.
>
> 2. It is widely believed that the LRT can only be used to compare the performance of nested models – i.e. in situations where one model is a special case of another. There are many situations in which it is important to compare non-nested models, so, if true, this would be a substantial drawback of using LRTs for model comparison. In reality, however, it is actually possible to use the LRT for comparing both nested and non-nested models. This fact is well-established in the statistical literature, but not widely used in ecological studies.
>
> 3. The main obstacle to the use of the LRT with non-nested models has, until relatively recently, been the fact that it is difficult to explicitly write down a formula for the distribution of the LRT statistic under the null hypothesis that one of the models is true. With modern computing power it is possible to overcome this difficulty by using a simulation-based approach.
>
> 4. To demonstrate the practical application of the LRT to both nested and non-nested model comparisons, a case study involving data on questing tick (Ixodes ricinus) abundance is presented. These data contain complexities typical in ecological analyses, such as zero-inflation and overdispersion, for which comparison between models of differing structure – e.g. non-nested models – is of particular importance.
>
> 5. Choosing between competing statistical models is an essential part of any applied ecological analysis. The LRT is a standard statistical test for comparing nested models. By use of simulation the LRT can also be used in an analogous fashion to compare non-nested models, thereby providing a unified approach for model comparison within the null hypothesis testing paradigm. A simple practical guide is provided on how to apply this approach to the key models required in the analyses of count data.
>
>
>
>
> -----Original Message-----
> From: Arnaud Mosnier <a.mosnier at gmail.com>
> To: Andrew Miles <rstuff.miles at gmail.com>
> Date: Fri, 17 Jun 2011 12:44:50 -0400
> Subject: Re: [R-sig-ME] Model selection, LRT test
>
> Andrew,
>
> Thanks for reminding me that LRTs are only for nested models; this is not
> the case in my situation.
>
> Arnaud
>
> 2011/6/17 Andrew Miles <rstuff.miles at gmail.com>:
>> What types of models are you running?
>>
>> There may be two issues at play.
>>
>> 1. Likelihood ratio tests are only for nested models (i.e., models where the
>> variables in one model are a subset of the variables in the other), so by
>> definition there will always be a difference in degrees of freedom.
>>
>> 2. With mixed models you can only use a likelihood ratio test when the model
>> returns a deviance, so in most cases not for generalized linear mixed models
>> (though I believe you can use LRTs if the models are estimated using some
>> form of numerical integration, such as the Laplace approximation or adaptive
>> Gauss-Hermite quadrature, but not with any sort of quasi-likelihood such as PQL).
>>
>> Andrew
>>
>> On Fri, Jun 17, 2011 at 11:12 AM, Arnaud Mosnier <a.mosnier at gmail.com>
>> wrote:
>>>
>>> Dear list,
>>>
>>> If I am not mistaken, a likelihood ratio test cannot be used
>>> when two models have the same number of degrees of freedom.
>>> Is there a way to test which one is better when both are close in
>>> AIC (difference < 5), or do I have to conclude that they are
>>> "equivalent"?
>>>
>>> Thanks
>>>
>>> Arnaud
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
>
