[R] validate (rms package) using step instead of fastbw

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Fri Feb 12 17:57:02 CET 2010


Ramon Diaz-Uriarte wrote:
> Dear Frank,
> 
> Thanks a lot for your response. And apologies for the question,
> because the answer was obviously in the help.
> 
> As for the caveats on selection: yes, thanks. I think I am actually
> closely following your book (e.g., pp. 249 to 253), and one of the
> points I am trying to make to my colleagues is that by doing variable
> selection, we are actually getting a worse model (as evidenced by the
> bias-corrected AUC, which is smaller if attempting variable
> selection).
> 
> 
> Best,
> 
> R.

Thanks Ramon.

Bias-corrected measures need to be penalized for every variable selection
step, including any univariable screening.  When the penalization is
complete, you usually see worse model performance than with the full
model fit, as you wrote.

Cheers
Frank
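
A rough sketch of the comparison discussed above: bootstrap validation of
the full model against validation in which backward elimination is repeated
inside every resample, so that the selection step is penalized. The recode
of birthwt below is a simplified stand-in for the bwt data frame from
example(birthwt) (an assumption, not the exact recode), and B = 200 and the
seed are arbitrary choices.

```r
library(MASS)  # birthwt data
library(rms)   # lrm, validate

# Simplified stand-in for the 'bwt' data frame built by example(birthwt);
# this recode is an assumption, not the exact one from the MASS help page.
bwt <- with(birthwt, data.frame(
  low   = low,
  age   = age,
  lwt   = lwt,
  race  = factor(race, labels = c("white", "black", "other")),
  smoke = smoke,
  ptd   = as.integer(ptl > 0),
  ht    = ht,
  ui    = ui,
  ftv   = factor(pmin(ftv, 2), labels = c("0", "1", "2+"))
))

# validate() needs the design matrix and response stored in the fit
full <- lrm(low ~ age + lwt + race + smoke + ptd + ht + ui + ftv,
            data = bwt, x = TRUE, y = TRUE)

# Full model: no selection, so only overfitting of the full fit is corrected
set.seed(1)
val.full <- validate(full, B = 200)

# Backward elimination (AIC rule, individual tests) redone in each resample
set.seed(1)
val.bw <- validate(full, B = 200, bw = TRUE, rule = "aic",
                   type = "individual", pr = FALSE)

# Somers' Dxy relates to the AUC (c-index) by AUC = 0.5 + Dxy / 2
auc <- function(v) 0.5 + v["Dxy", "index.corrected"] / 2
c(full = auc(val.full), selected = auc(val.bw))
```

The two numbers are only comparable if every selection step actually used
is repeated inside validate; any screening done beforehand, outside the
bootstrap, is not penalized.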

> 
> On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
> <f.harrell at vanderbilt.edu> wrote:
>> Ramon Diaz-Uriarte wrote:
>>> Dear All,
>>>
>>> For logistic regression models: is it possible to use validate (rms
>>> package) to compute bias-corrected AUC, but have variable selection
>>> with AIC use step (or stepAIC, from MASS), instead of fastbw?
>>>
>>>
>>> More details:
>>>
>>> I've been using the validate function (in the rms package, by Frank
>>> Harrell) to obtain, among other things, bootstrap bias-corrected
>>> estimates of the AUC, when variable selection is carried out (using
>>> AIC as criterion). validate calls predab.resample, which in turn calls
>>> fastbw (from the Design package, by Harrell). fastbw "Performs a
>>> slightly inefficient but numerically stable version of fast backward
>>> elimination on factors, using a method based on Lawless and Singhal
>>> (1978). This method uses the fitted complete model (...)". However, I
>>> am finding that the models returned by fastbw are much smaller than
>>> those returned by stepAIC or step (a simple example is shown below),
>>> probably because of the approximation and using the complete model.
>>>
>>> I'd like to use step instead of fastbw. I think this can be done by
>>> hacking predab.resample in a couple of places but I am wondering if
>>> this is a bad idea (why?) or if I am reinventing the wheel.
>>>
>>>
>>> Best,
>>>
>>> R.
>>>
>>>
>>> P.S. Simple example of fastbw compared to step:
>>>
>>> library(MASS) ## for stepAIC and bwt data
>>> example(birthwt)
>>> library(rms)
>>>
>>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>>
>>> step(bwt.glm)
>>> ## same as stepAIC(bwt.glm)
>>>
>>> fastbw(bwt.lrm)
>> Hi Ramon,
>>
>> By default fastbw uses type='residual' to compute test statistics on all
>> deleted variables combined.  Use type='individual' to get the behavior in
>> step.  In your example fastbw(..., type='ind') gives the same model as
>> step() and comes surprisingly close to estimating the MLEs without
>> refitting.  Of course, you would still refit the reduced model to get the
>> actual MLEs.  Both true and approximate MLEs are biased by the variable
>> selection, so beware.  The type= argument can be passed from calibrate or
>> validate to fastbw.
>>
>> Note that none of the statistics computed by step or fastbw were designed to
>> be used with more than two completely pre-specified models.  Variable
>> selection is hazardous both to inference and to prediction. There is no free
>> lunch; we are torturing data to confess its own sins.
>>
>> Frank
>>
>> --
>> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>>                     Department of Biostatistics   Vanderbilt University
>>
> 
>
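
The difference between the fastbw types described in the quoted reply can
be checked with a sketch along these lines. It assumes the same simplified
recode of birthwt as a stand-in for bwt (not the exact example(birthwt)
recode), and the object names are illustrative only.

```r
library(MASS)  # birthwt data
library(rms)   # lrm, fastbw

# Simplified stand-in for the 'bwt' data frame (an assumption)
bwt <- with(birthwt, data.frame(
  low   = low,
  age   = age,
  lwt   = lwt,
  race  = factor(race, labels = c("white", "black", "other")),
  smoke = smoke,
  ptd   = as.integer(ptl > 0),
  ht    = ht,
  ui    = ui,
  ftv   = factor(pmin(ftv, 2), labels = c("0", "1", "2+"))
))

form <- low ~ age + lwt + race + smoke + ptd + ht + ui + ftv
bwt.glm <- glm(form, family = binomial, data = bwt)
bwt.lrm <- lrm(form, data = bwt)

# step(): AIC-based backward search, refitting the model at every step
kept.step <- attr(terms(step(bwt.glm, trace = 0)), "term.labels")

# fastbw default: pooled (residual) test of all deleted variables
kept.resid <- fastbw(bwt.lrm, rule = "aic", type = "residual")$names.kept

# fastbw with individual tests, the behaviour closest to step()
kept.ind <- fastbw(bwt.lrm, rule = "aic", type = "individual")$names.kept

list(step = kept.step,
     fastbw.residual = kept.resid,
     fastbw.individual = kept.ind)
```

Comparing the three sets of retained terms shows whether
type = "individual" reproduces the step() selection on a given data set.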
