[R] validate (rms package) using step instead of fastbw
Frank E Harrell Jr
f.harrell at Vanderbilt.Edu
Fri Feb 12 18:26:22 CET 2010
Ramon Diaz-Uriarte wrote:
> Frank, let me make sure I understand:
>
> On Fri, Feb 12, 2010 at 5:57 PM, Frank E Harrell Jr
> <f.harrell at vanderbilt.edu> wrote:
>> Ramon Diaz-Uriarte wrote:
>>> Dear Frank,
>>>
>>> Thanks a lot for your response. And apologies for the question,
>>> because the answer was obviously in the help.
>>>
>>> As for the caveats on selection: yes, thanks. I think I am actually
>>> closely following your book (e.g., pp. 249-253), and one of the
>>> points I am trying to make to my colleagues is that by doing variable
>>> selection, we are actually getting a worse model (as evidenced by the
>>> bias-corrected AUC, which is smaller when variable selection is
>>> attempted).
>>>
>>>
>>> Best,
>>>
>>> R.
>> Thanks Ramon.
>>
>> Bias-corrected measures need to be penalized for all variable selection
>> steps and for univariable screening. When the penalization is complete, you
>> usually see worse model performance as compared with full model fits, as you
>> wrote.
>>
>
> I thought that by using validate, and starting from the original
> (non-screened) model and using "bw = TRUE" in the call to validate,
> the bias-corrected statistics already include that penalization. After
> all, for each one of the bootstrap iterations, the selection process
> is carried out only with the in-bag bootstrap sample, but the "test"
> is conducted with the out-of-bag sample. So my understanding was that,
> by using the Dxy under the "corrected index" column, I had accounted for
> the screening involved in the variable selection.
>
>
> Thanks,
>
> R.
Ramon,
Yes, you have it right, assuming there was no univariable or other
screening done that bw=TRUE would not know about. [Note, though, that
with the ordinary bootstrap the test and training samples overlap: each
bootstrap fit is tested on the original sample.]
I wasn't familiar with "bias correct AIC" and assumed that came from
another function. validate() produces the proper corrected indexes for
the indexes it prints.
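For reference, the correction behind the "corrected index" column can be written out as below. This is my reading of Efron's optimism bootstrap as validate() applies it, not text from the rms documentation; I stands for any printed index (Dxy, R2, ...), and the symbols are notation introduced here. I^orig is the index of the model (selection included) fit and evaluated on the original data, I_b^train is the index of the model refit on bootstrap sample b, with selection repeated, and evaluated on that same sample, and I_b^test is that bootstrap model evaluated on the original sample. These correspond to the index.orig, training, test, optimism and index.corrected columns.

```latex
\begin{aligned}
\hat{O} &= \frac{1}{B}\sum_{b=1}^{B}\left(I_b^{\text{train}} - I_b^{\text{test}}\right),\\
I^{\text{corrected}} &= I^{\text{orig}} - \hat{O},\\
C &= \frac{D_{xy} + 1}{2}.
\end{aligned}
```

Because the C index (AUC) is a linear function of Somers' Dxy for a binary outcome, a bias-corrected AUC can be read off the corrected Dxy as (Dxy + 1)/2.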
Frank
>
>> Cheers
>> Frank
>>
>>>
>>> On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
>>> <f.harrell at vanderbilt.edu> wrote:
>>>> Ramon Diaz-Uriarte wrote:
>>>>> Dear All,
>>>>>
>>>>> For logistic regression models: is it possible to use validate (rms
>>>>> package) to compute bias-corrected AUC, but have variable selection
>>>>> with AIC use step (or stepAIC, from MASS), instead of fastbw?
>>>>>
>>>>>
>>>>> More details:
>>>>>
>>>>> I've been using the validate function (in the rms package, by Frank
>>>>> Harrell) to obtain, among other things, bootstrap bias-corrected
>>>>> estimates of the AUC, when variable selection is carried out (using
>>>>> AIC as criterion). validate calls predab.resample, which in turn calls
>>>>> fastbw (from the Design package, by Harrell). fastbw "Performs a
>>>>> slightly inefficient but numerically stable version of fast backward
>>>>> elimination on factors, using a method based on Lawless and Singhal
>>>>> (1978). This method uses the fitted complete model (...)". However, I
>>>>> am finding that the models returned by fastbw are much smaller than
>>>>> those returned by stepAIC or step (a simple example is shown below),
>>>>> probably because of the approximation and the use of the complete model.
>>>>>
>>>>> I'd like to use step instead of fastbw. I think this can be done by
>>>>> hacking predab.resample in a couple of places but I am wondering if
>>>>> this is a bad idea (why?) or if I am reinventing the wheel.
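One way to get step() into the resampling without touching predab.resample is to write the optimism bootstrap by hand. The sketch below is a minimal illustration, not rms code: it assumes only base R plus MASS for the birthwt data, rebuilds a bwt-like frame by hand (the coding is my assumption, approximating what example(birthwt) creates), and uses two hypothetical helpers, auc() (C index via the Mann-Whitney rank formula) and select_fit() (full glm followed by AIC-based step()). As in the ordinary bootstrap used by validate(), each resample model is tested on the original sample, and the selection is repeated inside every resample so the correction accounts for it. B = 50 and the seed are arbitrary choices.

```r
library(MASS)  # birthwt data; step() itself is in the stats package

# bwt-like frame built by hand (coding assumed, similar to example(birthwt))
bwt <- with(birthwt, data.frame(
  low = low, age = age, lwt = lwt,
  race = factor(race, labels = c("white", "black", "other")),
  smoke = smoke, ptd = as.integer(ptl > 0), ht = ht, ui = ui,
  ftv = factor(pmin(ftv, 2), labels = c("0", "1", "2+"))
))

# Hypothetical helper: C index (AUC); tied predictions get half credit
auc <- function(p, y) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: full-model glm, then AIC-based step(); trace = 0 is quiet
select_fit <- function(data) {
  step(glm(low ~ ., family = binomial, data = data), trace = 0)
}

# Apparent performance: selection and evaluation on the original data
fit.orig <- select_fit(bwt)
auc.orig <- auc(predict(fit.orig), bwt$low)

set.seed(2010)
B <- 50
optimism <- numeric(B)
for (b in seq_len(B)) {
  d <- bwt[sample(nrow(bwt), replace = TRUE), ]
  fit.b <- select_fit(d)                               # selection redone in the resample
  train <- auc(predict(fit.b), d$low)                  # apparent AUC in the resample
  test <- auc(predict(fit.b, newdata = bwt), bwt$low)  # same model on the original data
  optimism[b] <- train - test
}
auc.corrected <- auc.orig - mean(optimism)
print(c(apparent = auc.orig, corrected = auc.corrected))
```

Any univariable screening would also have to go inside select_fit(); otherwise the corrected AUC is still optimistic.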
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> R.
>>>>>
>>>>>
>>>>> P.S. Simple example of fastbw compared to step:
>>>>>
>>>>> library(MASS) ## for stepAIC and the birthwt data
>>>>> example(birthwt) ## runs the MASS example, which creates the bwt data frame
>>>>> library(rms)
>>>>>
>>>>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>>>>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>>>>
>>>>> step(bwt.glm)
>>>>> ## same as stepAIC(bwt.glm)
>>>>>
>>>>> fastbw(bwt.lrm)
>>>> Hi Ramon,
>>>>
>>>> By default fastbw uses type='residual' to compute test statistics on all
>>>> deleted variables combined. Use type='individual' to get the behavior in
>>>> step. In your example fastbw(..., type='ind') gives the same model as
>>>> step() and comes surprisingly close to estimating the MLEs without
>>>> refitting. Of course, you would refit the reduced model to get the MLEs.
>>>> Both true and approximate MLEs are biased by the variable selection, so
>>>> beware. The type= argument can be passed from calibrate or validate to
>>>> fastbw.
>>>>
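A minimal sketch of the options described above, assuming rms and MASS are installed. The data frame is rebuilt by hand (coding assumed, approximating example(birthwt)), the formula is spelled out, and B = 200 and the seed are arbitrary. The fit stores x = TRUE, y = TRUE because validate() needs the design matrix and response to refit on each bootstrap sample. The last lines convert the corrected Somers' Dxy printed by validate() into a corrected C (AUC).

```r
library(MASS)  # birthwt data
library(rms)   # lrm(), fastbw(), validate()

# bwt-like frame built by hand (coding assumed, similar to example(birthwt))
bwt <- with(birthwt, data.frame(
  low = low, age = age, lwt = lwt,
  race = factor(race, labels = c("white", "black", "other")),
  smoke = smoke, ptd = as.integer(ptl > 0), ht = ht, ui = ui,
  ftv = factor(pmin(ftv, 2), labels = c("0", "1", "2+"))
))

# x = TRUE, y = TRUE so that validate() can refit on bootstrap samples
bwt.lrm <- lrm(low ~ age + lwt + race + smoke + ptd + ht + ui + ftv,
               data = bwt, x = TRUE, y = TRUE)

# Default type = "residual": deleted factors are tested jointly
f.res <- fastbw(bwt.lrm, rule = "aic")
# type = "individual": each factor tested on its own, closer to step()
f.ind <- fastbw(bwt.lrm, rule = "aic", type = "individual")
print(f.res)
print(f.ind)

# Backward elimination with type = "individual" repeated in every resample
set.seed(2010)
v <- validate(bwt.lrm, B = 200, bw = TRUE, rule = "aic", type = "individual")
print(v)

# Corrected C (AUC) from the corrected Dxy: C = (Dxy + 1) / 2
c.corrected <- (v["Dxy", "index.corrected"] + 1) / 2
print(c.corrected)
```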
>>>> Note that none of the statistics computed by step or fastbw were designed
>>>> to be used with more than two completely pre-specified models. Variable
>>>> selection is hazardous both to inference and to prediction. There is no
>>>> free lunch; we are torturing data to confess its own sins.
>>>>
>>>> Frank
>>>>