[R] validate (rms package) using step instead of fastbw

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Fri Feb 12 18:26:22 CET 2010


Ramon Diaz-Uriarte wrote:
> Frank, let me make sure I understand:
> 
> 
> 
> On Fri, Feb 12, 2010 at 5:57 PM, Frank E Harrell Jr
> <f.harrell at vanderbilt.edu> wrote:
>> Ramon Diaz-Uriarte wrote:
>>> Dear Frank,
>>>
>>> Thanks a lot for your response. And apologies for the question,
>>> because the answer was obviously in the help.
>>>
>>> As for the caveats on selection: yes, thanks. I think I am actually
>>> closely following your book (e.g., pp. 249 to 253), and one of the
>>> points I am trying to make to my colleagues is that by doing variable
>>> selection, we are actually getting a worse model (as evidenced by the
>>> bias-corrected AUC, which is smaller if attempting variable
>>> selection).
>>>
>>>
>>> Best,
>>>
>>> R.
>> Thanks Ramon.
>>
>> Bias-corrected measures need to be penalized for all variable selection
>> steps and for univariable screening.  When the penalization is complete, you
>> usually see worse model performance as compared with full model fits, as you
>> wrote.
>>
> 
> I thought that by using validate, and starting from the original
> (non-screened) model and using "bw = TRUE" in the call to validate,
> the bias-corrected statistics already include that penalization. After
> all, for each one of the bootstrap iterations, the selection process
> is carried out only with the in-bag bootstrap sample, but the "test"
> is conducted with the out-of-bag sample. So my understanding was that
> using the Dxy under the "corrected index" column I had accounted for
> the screening involved in the variable selection.
> 
> 
> Thanks,
> 
> R.

Ramon,

Yes, you have it right, assuming there was no univariable or other
screening done that bw=TRUE would not know about.  [Note, though, that
the test and training samples overlap with the ordinary bootstrap
procedure.]  I wasn't familiar with "bias correct AIC" and assumed that
came from another function.  validate() produces the proper corrected
indexes for the indexes it prints.
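
A minimal sketch of that workflow, for illustration only and not taken
from the original exchange.  The recoding of race, the explicit formula,
B = 200 and the seed are all assumptions.  Passing bw=TRUE repeats the
backward elimination inside every bootstrap resample, type= is handed
on to fastbw, and the AUC is recovered from Dxy as AUC = 0.5 + Dxy/2:

```r
library(MASS)  # birthwt data
library(rms)   # lrm, validate, fastbw

# Assumed data preparation: treat race as a factor, keep the rest numeric
d <- transform(birthwt,
               race = factor(race, labels = c("white", "black", "other")))

# x = TRUE, y = TRUE store the design matrix and response for resampling
fit <- lrm(low ~ age + lwt + race + smoke + ptl + ht + ui + ftv,
           data = d, x = TRUE, y = TRUE)

set.seed(1)  # arbitrary seed, only for reproducibility
v <- validate(fit, B = 200, bw = TRUE, rule = "aic", type = "individual")

# Optimism-corrected Dxy converted to an AUC (c-index)
auc.corrected <- 0.5 + v["Dxy", "index.corrected"] / 2
auc.apparent  <- 0.5 + v["Dxy", "index.orig"] / 2
print(c(apparent = auc.apparent, corrected = auc.corrected))
```

Running the same call with bw=FALSE on the full model gives the
corrected index to compare against when judging what selection costs.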

Frank

> 
> 
> 
> 
>> Cheers
>> Frank
>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
>>> <f.harrell at vanderbilt.edu> wrote:
>>>> Ramon Diaz-Uriarte wrote:
>>>>> Dear All,
>>>>>
>>>>> For logistic regression models: is it possible to use validate (rms
>>>>> package) to compute bias-corrected AUC, but have variable selection
>>>>> with AIC use step (or stepAIC, from MASS), instead of fastbw?
>>>>>
>>>>>
>>>>> More details:
>>>>>
>>>>> I've been using the validate function (in the rms package, by Frank
>>>>> Harrell) to obtain, among other things, bootstrap bias-corrected
>>>>> estimates of the AUC, when variable selection is carried out (using
>>>>> AIC as criterion). validate calls predab.resample, which in turn calls
>>>>> fastbw (in rms, formerly the Design package, by Harrell). fastbw "Performs a
>>>>> slightly inefficient but numerically stable version of fast backward
>>>>> elimination on factors, using a method based on Lawless and Singhal
>>>>> (1978). This method uses the fitted complete model (...)". However, I
>>>>> am finding that the models returned by fastbw are much smaller than
>>>>> those returned by stepAIC or step (a simple example is shown below),
>>>>> probably because of the approximation and using the complete model.
>>>>>
>>>>> I'd like to use step instead of fastbw. I think this can be done by
>>>>> hacking predab.resample in a couple of places but I am wondering if
>>>>> this is a bad idea (why?) or if I am reinventing the wheel.
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> R.
>>>>>
>>>>>
>>>>> P.S. Simple example of fastbw compared to step:
>>>>>
>>>>> library(MASS) ## for stepAIC and bwt data
>>>>> example(birthwt)
>>>>> library(rms)
>>>>>
>>>>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>>>>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>>>>
>>>>> step(bwt.glm)
>>>>> ## same as stepAIC(bwt.glm)
>>>>>
>>>>> fastbw(bwt.lrm)
>>>> Hi Ramon,
>>>>
>>>> By default fastbw uses type='residual' to compute test statistics on all
>>>> deleted variables combined.  Use type='individual' to get the behavior in
>>>> step.  In your example fastbw(..., type='ind') gives the same model as
>>>> step() and comes surprisingly close to estimating the MLEs without
>>>> refitting.  Of course you refit the reduced model to get MLEs.  Both true
>>>> and approximate MLEs are biased by the variable selection so beware.
>>>> type= can be passed from calibrate or validate to fastbw.
>>>>
>>>> Note that none of the statistics computed by step or fastbw were designed
>>>> to be used with more than two completely pre-specified models.  Variable
>>>> selection is hazardous both to inference and to prediction.  There is no
>>>> free lunch; we are torturing data to confess its own sins.
>>>>
>>>> Frank
>>>>
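
The comparison discussed in the quoted exchange can be sketched as
follows.  This is an illustration under assumptions, not the original
poster's code: the data preparation and the explicit formula differ
from example(birthwt), so the selected models need not match the ones
reported in the thread.  step() refits the model at each stage, while
fastbw() works from the full fit using Wald-type approximations.

```r
library(MASS)  # birthwt data
library(rms)   # lrm, fastbw

# Assumed data preparation: treat race as a factor, keep the rest numeric
d <- transform(birthwt,
               race = factor(race, labels = c("white", "black", "other")))
f <- low ~ age + lwt + race + smoke + ptl + ht + ui + ftv

fit.glm <- glm(f, family = binomial, data = d)
fit.lrm <- lrm(f, data = d)

# Backward selection by AIC, refitting at each step
sel.step <- step(fit.glm, trace = 0)

# Fast backward elimination: pooled test on all deleted variables
# (the default) versus one-variable-at-a-time tests
bw.res <- fastbw(fit.lrm, rule = "aic", type = "residual")
bw.ind <- fastbw(fit.lrm, rule = "aic", type = "individual")

kept.step <- attr(terms(sel.step), "term.labels")
print(kept.step)
print(bw.res$names.kept)
print(bw.ind$names.kept)
```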


