[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Tue Jul 22 15:51:57 CEST 2008

Dear all,

I don't want to argue with anybody about words or about what bootstrap 
is suitable for - I know too little for that.

All I need is help to get the *equation coefficients* optimized by 
bootstrap - either by one of the functions or by a simple median.

Please help,

--
Michal J. Figurski
HUP, Pathology & Laboratory Medicine
Xenobiotics Toxicokinetics Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

Frank E Harrell Jr wrote:
> Michal Figurski wrote:
>> Frank,
>>
>> "How does bootstrap improve on that?"
>>
>> I don't know, but I have an idea. Since the data in my set are just a 
>> small sample of a big population, then if I use my whole dataset to 
>> obtain max likelihood estimates, these estimates may be best for this 
>> dataset, but far from ideal for the whole population.
> 
> The bootstrap, being a resampling procedure from your sample, has the 
> same issues about the population as MLEs.
> 
>>
>> I used bootstrap to virtually increase the size of my dataset; it 
>> should result in estimates closer to those from the population - 
>> isn't that the purpose of bootstrap?
> 
> No
> 
>>
>> When I use such median coefficients on another dataset (another sample 
>> from the population), the predictions are better than with max 
>> likelihood estimates. I have already tested that and it worked!
> 
> Then your testing procedure is probably not valid.
> 
>>
>> I am not a statistician and I don't have a feel for what 
>> "overfitting" is, but it may be just another word for the same idea.
>>
>> Nevertheless, I would still like to know how I can get the coefficients 
>> for the model that gives the "nearly unbiased estimates". I greatly 
>> appreciate your help.
> 
> More info in my book Regression Modeling Strategies.
> 
> Frank
> 
>>
>>
>> Frank E Harrell Jr wrote:
>>> Michal Figurski wrote:
>>>> Hello all,
>>>>
>>>> I am trying to optimize my logistic regression model by using 
>>>> bootstrap. I was previously using SAS for this kind of task, but I 
>>>> am now switching to R.
>>>>
>>>> My data frame consists of 5 columns and has 109 rows. Each row is a 
>>>> single record composed of the following values: Subject_name, 
>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three 
>>>> numerics are used to predict outcome using LR.
>>>>
>>>> In SAS I wrote a macro that split the dataset, ran LR on one half 
>>>> of the data and made predictions on the second half. It then 
>>>> collected the equation coefficients from each bootstrap iteration. 
>>>> Later I just took the medians of these coefficients over all 
>>>> iterations and used them as an optimal model 
>>>> - it really worked well!
>>>
>>> Why not use maximum likelihood estimation, i.e., the coefficients 
>>> from the original fit?  How does the bootstrap improve on that?
>>>
>>>>
>>>> Now I want to do the same in R. I tried to use the 'validate' and 
>>>> 'calibrate' functions from package "Design", and I also experimented 
>>>> with function 'sm.binomial.bootstrap' from package "sm". I also 
>>>> tried the function 'boot' from package "boot", though without success 
>>>> - in my case it randomly selected _columns_ from my data frame, 
>>>> while I wanted it to select _rows_.
>>>
>>> validate and calibrate in Design do resampling on the rows.
>>>
>>> Resampling is mainly used to get a nearly unbiased estimate of the 
>>> model performance, i.e., to correct for overfitting.
>>>
>>> Frank Harrell
>>>
>>>>
>>>> Though the main point here is the optimized LR equation. I would 
>>>> appreciate any help on how to extract the LR equation coefficients 
>>>> from any of these bootstrap functions, in the same form as given by 
>>>> 'glm' or 'lrm'.
>>>>
>>>> Many thanks in advance!
>>>>
>>>
>>>
>>
> 
>
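
For the mechanics asked about in this thread (resampling _rows_ and collecting coefficients in the same form as 'glm' gives them), a minimal sketch in base R might look like the one below. The data are simulated stand-ins: the column names follow the description in the original post, but the values, the seed and the number of resamples B are assumptions made only for illustration. As noted in the replies above, the bootstrap resamples the same sample the MLE was fitted to, so the median coefficients should not be expected to improve on the original fit; the sketch only shows how to obtain them.

```r
# Simulated stand-in for the 109-row data frame described in the thread.
# Column names follow the post; the values themselves are invented.
set.seed(1)
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$numeric1 - 0.5 * dat$numeric2))

# Maximum likelihood fit on the full data set (the "original fit").
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = dat, family = binomial)

# Refit the same model on B bootstrap samples of ROWS and keep the
# coefficients of each refit. B = 200 is an arbitrary choice.
B <- 200
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)  # resample row indices
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = dat[idx, ], family = binomial))
}))

# One row per resample, one column per coefficient.
coef_median <- apply(boot_coefs, 2, median)
coef_ci     <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))

# Compare the MLE with the bootstrap medians side by side.
print(rbind(mle = coef(fit), boot_median = coef_median))
```

The same row indexing applies when using 'boot' from package "boot": its statistic function receives the data and a vector of indices, and resampling rows means subsetting as data[indices, ] inside that function.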