[R] Stepwise GLM selection by LRT?

Thu Jul 12 21:33:00 CEST 2007

On Thu, 12 Jul 2007, Lutz Ph. Breitling wrote:

> Thank you very much for the prompt reply. Seems like I had not fully
> understood what the k-parameter to stepAIC is doing.
> Your suggested approach indeed looks fine to me, but I do not
> quite understand why you say that it's only an approximation to the
> LRT?

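So this is computing AIC_k = -2L + kp, where L is the maximized 
log-likelihood and p the number of parameters.  If you compare models with 
p and p+q parameters, this is equivalent to comparing 2 log LR with kq, and 
so for q=1 the Wilks' LRT is found for k = qchisq(1-p, df=1), where p is 
now the p-value threshold rather than the parameter count (a chi-squared on 
1 df is just a squared Normal).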
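
As a small numerical check of the penalty just described, the following 
sketch computes k for two thresholds and compares it with the squared 
two-sided Normal quantile.  The name `alpha` is an illustrative stand-in for 
the p-value threshold (written p in the thread); nothing here is specific to 
stepAIC.

```r
# Significance levels for the 1-df test (the thread calls this p)
alpha <- c(0.10, 0.05)

# Penalty k that makes a one-parameter AIC_k comparison match the Wilks LRT
k <- qchisq(1 - alpha, df = 1)

# The same values obtained as squared two-sided standard Normal quantiles
k_norm <- qnorm(1 - alpha / 2)^2

print(round(k, 2))           # 2.71 3.84
print(all.equal(k, k_norm))  # TRUE
```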

However, no one said q would always be one, and stepAIC steps in terms, 
not individual coefficients.  Therein lies one of the approximations 
(another is in the asymptotic distribution theory of the test).
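
To make the point about terms with q > 1 concrete, here is a minimal sketch, 
not a recommended analysis.  The data set (birthwt from MASS), the model 
formula and the names `bw`, `fit`, `alpha` and `cutoffs` are illustrative 
assumptions, not taken from the thread.  It runs backward selection with 
the 1-df penalty and then tabulates, for each term, the cutoff that penalty 
implies for the LRT statistic (k times q) next to the chi-squared critical 
value on q df.

```r
library(MASS)  # provides stepAIC() and the birthwt data

# Logistic regression; race becomes a 3-level factor, so dropping
# that term removes q = 2 coefficients at once
bw  <- transform(birthwt, race = factor(race))
fit <- glm(low ~ age + lwt + race + smoke, family = binomial, data = bw)

alpha <- 0.05
k <- qchisq(1 - alpha, df = 1)   # penalty matching a 1-df LRT

# Backward selection on terms using the penalty k instead of the default 2
sel <- stepAIC(fit, direction = "backward", k = k, trace = FALSE)
print(formula(sel))

# Term-wise likelihood ratio tests; the Df column gives q for each term
lrt <- drop1(fit, test = "Chisq")
q   <- lrt$Df[-1]                # drop the "<none>" row

# Cutoff implied by the penalty (k * q) versus the chi-squared cutoff on q df
cutoffs <- data.frame(term           = rownames(lrt)[-1],
                      q              = q,
                      stepAIC_cutoff = k * q,
                      lrt_cutoff     = qchisq(1 - alpha, df = q))
print(cutoffs)
```

For terms with q = 1 the two cutoffs coincide; for the 2-df race term the 
penalty-based cutoff is about 7.68 against about 5.99 for the chi-squared 
test on 2 df, so the two rules can disagree.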

> Best wishes-
> Lutz
>
> On 7/11/07, Ravi Varadhan <rvaradhan at jhmi.edu> wrote:
>> Check out the stepAIC function in MASS package.  This is a nice tool, where
>> you can actually implement any penalty even though the function's name has
>> "AIC" in it because it is the default.  Although this doesn't do an LRT test
>> based variable selection, you can sort of approximate it by using a penalty
>> of k = qchisq(1-p, df=1), where p is the p-value for variable selection.
>> This penalty means that a variable enters/exits an existing model when
>> twice the absolute change in log-likelihood is greater than qchisq(1-p,
>> df=1). For p = 0.1, k = 2.71, and for p = 0.05, k = 3.84.  Is this what
>> you'd like to do?
>>
>> Ravi.
>>
>> ----------------------------------------------------------------------------
>> -------
>>
>> Ravi Varadhan, Ph.D.
>>
>> Assistant Professor, The Center on Aging and Health
>>
>> Division of Geriatric Medicine and Gerontology
>>
>> Johns Hopkins University
>>
>> Ph: (410) 502-2619
>>
>> Fax: (410) 614-9625
>>
>> Email: rvaradhan at jhmi.edu
>>
>> Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
>>
>>
>>
>> ----------------------------------------------------------------------------
>> --------
>>
>>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Lutz Ph. Breitling
>> Sent: Wednesday, July 11, 2007 3:06 PM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] Stepwise GLM selection by LRT?
>>
>> Dear List,
>>
>> having searched the help and archives, I have the impression that
>> there is no automatic model selection procedure implemented in R that
>> includes/excludes predictors in logistic regression models based on
>> LRT P-values. Is that true, or is someone aware of an appropriate
>> function somewhere in a custom package?
>>
>> Even if automatic model selection and LRT might not be the most
>> appropriate methods, I actually would like to use these in order to
>> simulate someone else's modeling approach...
>>
>> Many thanks for all comments-
>> Lutz
>> -----
>> Lutz Ph. Breitling
>> German Cancer Research Center
>> Heidelberg/Germany
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595