[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Tue Jul 22 16:43:59 CEST 2008

Hmm...

It sounds like ideology to me. I was asking for technical help. I know 
what I want to do, just don't know how to do it in R. I'll go back to 
SAS then. Thank you.

--
Michal J. Figurski

Doran, Harold wrote:
> I think the answer has been given to you. If you want to continue to
> ignore that advice and use bootstrap for point estimates rather than the
> properties of those estimates (which is what bootstrap is for) then you
> are on your own. 
> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org 
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
>> Sent: Tuesday, July 22, 2008 9:52 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] Coefficients of Logistic Regression from 
>> bootstrap - how to get them?
>>
>> Dear all,
>>
>> I don't want to argue with anybody about words or about what 
>> bootstrap is suitable for - I know too little for that.
>>
>> All I need is help to get the *equation coefficients* 
>> optimized by bootstrap - either by one of the functions or by 
>> simple median.
>>
>> Please help,
>>
>> --
>> Michal J. Figurski
>> HUP, Pathology & Laboratory Medicine
>> Xenobiotics Toxicokinetics Research Laboratory
>> 3400 Spruce St. 7 Maloney
>> Philadelphia, PA 19104
>> tel. (215) 662-3413
>>
>> Frank E Harrell Jr wrote:
>>> Michal Figurski wrote:
>>>> Frank,
>>>>
>>>> "How does bootstrap improve on that?"
>>>>
>>>> I don't know, but I have an idea. Since the data in my set are just a
>>>> small sample of a big population, if I use my whole dataset to obtain
>>>> max likelihood estimates, these estimates may be best for this
>>>> dataset, but far from ideal for the whole population.
>>> The bootstrap, being a resampling procedure from your sample, has the
>>> same issues about the population as MLEs.
>>>
>>>> I used bootstrap to virtually increase the size of my dataset; it
>>>> should result in estimates closer to those from the population -
>>>> isn't that the purpose of bootstrap?
>>> No
>>>
>>>> When I use such median coefficients on another dataset (another
>>>> sample from the population), the predictions are better than with max
>>>> likelihood estimates. I have already tested that and it worked!
>>> Then your testing procedure is probably not valid.
>>>
>>>> I am not a statistician and I don't have a feel for what "overfitting"
>>>> is, but it may be just another word for the same idea.
>>>>
>>>> Nevertheless, I would still like to know how I can get the
>>>> coefficients for the model that gives the "nearly unbiased estimates".
>>>> I greatly appreciate your help.
>>> More info in my book Regression Modeling Strategies.
>>>
>>> Frank
>>>
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Michal Figurski wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I am trying to optimize my logistic regression model by using
>>>>>> bootstrap. I was previously using SAS for this kind of task, but I
>>>>>> am now switching to R.
>>>>>>
>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>>> single record composed of the following values: Subject_name,
>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>>>>>> numerics are used to predict outcome using LR.
>>>>>>
>>>>>> In SAS I have written a macro that split the dataset, ran LR on
>>>>>> one half of the data, and made predictions on the second half. It
>>>>>> then collected the equation coefficients from each iteration of
>>>>>> bootstrap. Later I just took the medians of these coefficients
>>>>>> across all iterations and used them as an optimal model - it
>>>>>> really worked well!
>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
>>>>> from the original fit? How does the bootstrap improve on that?
>>>>>
>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>>> 'calibrate' functions from package "Design", and I also
>>>>>> experimented with function 'sm.binomial.bootstrap' from package
>>>>>> "sm". I also tried the function 'boot' from package "boot", though
>>>>>> without success - in my case it randomly selected _columns_ from my
>>>>>> data frame, while I wanted it to select _rows_.
>>>>> validate and calibrate in Design do resampling on the rows
>>>>>
>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
>>>>> model performance, i.e., to correct for overfitting.
>>>>>
>>>>> Frank Harrell
>>>>>
>>>>>> Though the main point here is the optimized LR equation. I would
>>>>>> appreciate any help on how to extract the LR equation coefficients
>>>>>> from any of these bootstrap functions, in the same form as given by
>>>>>> 'glm' or 'lrm'.
>>>>>>
>>>>>> Many thanks in advance!
>>>>>>
>>>>>
>>>
>>
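
A minimal sketch of the row resampling asked about in the thread, using
only base R. Everything in it is an assumption made for illustration: the
data frame `dat` is simulated to mimic the described layout (109 rows,
three numeric predictors, a binary outcome), and the names `fit`,
`boot_coefs`, `boot_se`, `boot_ci` and `boot_median` are hypothetical. The
full-data `glm` fit supplies the point estimates, as the replies recommend,
and the bootstrap refits are summarized to describe the variability of
those estimates. The per-coefficient medians from the original question are
computed as well, for comparison only; nothing here shows that they improve
on the maximum likelihood estimates.

```r
# Simulated stand-in for the 109-row data frame described in the thread
# (hypothetical data; replace with the real data frame).
set.seed(1)
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$numeric1 - 0.4 * dat$numeric2))

# Maximum likelihood fit on the full data: these are the point estimates.
model <- outcome ~ numeric1 + numeric2 + numeric3
fit <- glm(model, family = binomial, data = dat)

# Resample ROWS with replacement, refit, and keep each coefficient vector.
# Note the comma in dat[idx, ]: dat[idx] would select columns instead.
B <- 200
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)
  coef(glm(model, family = binomial, data = dat[idx, ]))
}))

# Summaries of the bootstrap distribution, one entry per coefficient.
boot_se     <- apply(boot_coefs, 2, sd)                               # standard errors
boot_ci     <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975)) # percentile intervals
boot_median <- apply(boot_coefs, 2, median)                           # medians, as in the question

print(cbind(mle = coef(fit), boot_median = boot_median, boot_se = boot_se))
```

Each row of `boot_coefs` is one refit and each column one coefficient, named
as in `coef(fit)`. Indexing a data frame without the comma selects columns
rather than rows, which is one possible cause of the column resampling
described in the original question.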