[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Tue Jul 22 21:42:25 CEST 2008

Dear Marc and all,

Thank you for all the due respect.

In my first email I tried to explain as explicitly as I could what I am 
trying to do. I did not invent this procedure; it was already 
published in the paper:

T. Pawinski, M. Hale, M. Korecka, W.E. Fitzsimmons, L.M. Shaw. Limited 
Sampling Strategy for the Estimation of Mycophenolic Acid Area under the 
Curve in Adult Renal Transplant Patients Treated with Concomitant 
Tacrolimus. Clinical Chemistry 2002; 48(9): 1497-1504.

I only adapted this methodology to work under SAS, and now I am trying to 
do it under R, because I like R. I need practical advice because I have a 
practical problem, and I do not understand much of the theoretical 
discussion of what the bootstrap is or is not suitable for. Apparently I 
am trying to use it for something other than what the experts are used to...

Honestly, I have not learned anything from this discussion so far; I am 
just disappointed.

Still, since the discussion has already started, I'd welcome your 
criticism of this procedure - I just ask that you express it in plain 
language.

--
Michal J. Figurski

Marc Schwartz wrote:
> Michal,
> 
> With all due respect, you have openly acknowledged that you don't know 
> enough about the subject at hand.
> 
> If that is the case, on what basis are you in a position to challenge 
> the collective wisdom of those professionals who have voluntarily 
> offered *expert* level statistical advice to you?
> 
> You have erected a wall around your thinking.
> 
> You may choose to use R or any other software application to 
> "Git-R-Done". But that does not make it correct.
> 
> There are other methods to consider that could be used during the model 
> building process itself, rather than on a post-hoc basis and I would 
> specifically refer you to Frank's book, Regression Modeling Strategies:
> 
>   http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
> 
> Marc Schwartz
> 
> on 07/22/2008 09:43 AM Michal Figurski wrote:
>> Hmm...
>>
>> It sounds like ideology to me. I was asking for technical help. I know 
>> what I want to do, just don't know how to do it in R. I'll go back to 
>> SAS then. Thank you.
>>
>> -- 
>> Michal J. Figurski
>>
>> Doran, Harold wrote:
>>> I think the answer has been given to you. If you want to continue to
>>> ignore that advice and use bootstrap for point estimates rather than the
>>> properties of those estimates (which is what bootstrap is for) then you
>>> are on your own.
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org 
>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
>>>> Sent: Tuesday, July 22, 2008 9:52 AM
>>>> To: r-help at r-project.org
>>>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap 
>>>> - how to get them?
>>>>
>>>> Dear all,
>>>>
>>>> I don't want to argue with anybody about words or about what 
>>>> bootstrap is suitable for - I know too little for that.
>>>>
>>>> All I need is help to get the *equation coefficients* optimized by 
>>>> bootstrap - either by one of the functions or by simple median.
>>>>
>>>> Please help,
>>>>
>>>> -- 
>>>> Michal J. Figurski
>>>> HUP, Pathology & Laboratory Medicine
>>>> Xenobiotics Toxicokinetics Research Laboratory
>>>> 3400 Spruce St. 7 Maloney
>>>> Philadelphia, PA 19104
>>>> tel. (215) 662-3413
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Michal Figurski wrote:
>>>>>> Frank,
>>>>>>
>>>>>> "How does bootstrap improve on that?"
>>>>>>
>>>>>> I don't know, but I have an idea. Since the data in my set are just a
>>>>>> small sample of a big population, then if I use my whole dataset to
>>>>>> obtain max likelihood estimates, these estimates may be best for this
>>>>>> dataset, but far from ideal for the whole population.
>>>>> The bootstrap, being a resampling procedure from your sample, has the
>>>>> same issues about the population as MLEs.
>>>>>
>>>>>> I used the bootstrap to virtually increase the size of my dataset; it
>>>>>> should result in estimates closer to those from the population -
>>>>>> isn't that the purpose of the bootstrap?
>>>>> No
>>>>>
>>>>>> When I use such median coefficients on another dataset (another
>>>>>> sample from the population), the predictions are better than with the
>>>>>> max likelihood estimates. I have already tested that and it worked!
>>>>> Then your testing procedure is probably not valid.
>>>>>
>>>>>> I am not a statistician and I don't have a feel for what "overfitting"
>>>>>> is, but it may be just another word for the same idea.
>>>>>>
>>>>>> Nevertheless, I would still like to know how I can get the
>>>>>> coefficients for the model that gives the "nearly unbiased estimates".
>>>>>> I greatly appreciate your help.
>>>>> More info in my book Regression Modeling Strategies.
>>>>>
>>>>> Frank
>>>>>
>>>>>> -- 
>>>>>> Michal J. Figurski
>>>>>> HUP, Pathology & Laboratory Medicine
>>>>>> Xenobiotics Toxicokinetics Research Laboratory
>>>>>> 3400 Spruce St. 7 Maloney
>>>>>> Philadelphia, PA 19104
>>>>>> tel. (215) 662-3413
>>>>>>
>>>>>> Frank E Harrell Jr wrote:
>>>>>>> Michal Figurski wrote:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am trying to optimize my logistic regression model by using
>>>>>>>> bootstrap. I was previously using SAS for this kind of task, but I
>>>>>>>> am now switching to R.
>>>>>>>>
>>>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>>>>> single record composed of the following values: Subject_name,
>>>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>>>>>>>> numerics are used to predict outcome using LR.
>>>>>>>>
>>>>>>>> In SAS I wrote a macro that split the dataset, ran LR on one half
>>>>>>>> of the data and made predictions on the second half. It then
>>>>>>>> collected the equation coefficients from each iteration of the
>>>>>>>> bootstrap. Afterwards I took the medians of these coefficients
>>>>>>>> over all iterations and used them as the optimal model
>>>>>>>> - it really worked well!
>>>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
>>>>>>> from the original fit? How does the bootstrap improve on that?
>>>>>>>
>>>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>>>>> 'calibrate' functions from package "Design", and I also
>>>>>>>> experimented with function 'sm.binomial.bootstrap' from package
>>>>>>>> "sm". I also tried the function 'boot' from package "boot", though
>>>>>>>> without success - in my case it randomly selected _columns_ from
>>>>>>>> my data frame, while I wanted it to select _rows_.
>>>>>>> validate and calibrate in Design do resampling on the rows.
>>>>>>>
>>>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
>>>>>>> model performance, i.e., to correct for overfitting.
>>>>>>>
>>>>>>> Frank Harrell
>>>>>>>
>>>>>>>> Though the main point here is the optimized LR equation. I would
>>>>>>>> appreciate any help on how to extract the LR equation coefficients
>>>>>>>> from any of these bootstrap functions, in the same form as given by
>>>>>>>> 'glm' or 'lrm'.
>>>>>>>>
>>>>>>>> Many thanks in advance!
>>>>>>>>
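
The purely mechanical part of the original question (making 'boot' resample
rows and collecting the 'glm' coefficients from each resample) can be sketched
roughly as below. This is only an illustration under stated assumptions: the
data frame is simulated as a stand-in for the 109-row data set described in
the thread, R = 500 is an arbitrary number of resamples, and the helper name
coef_fun is made up for this example. It is the ordinary nonparametric row
bootstrap, not the split-half SAS macro, and it does not settle the point
raised in the thread about whether bootstrap medians should replace the
maximum likelihood estimates. 'boot' passes the statistic function a vector
of row indices; indexing with data[idx] instead of data[idx, ] is one
possible way to end up selecting columns, though whether that happened here
is not known.

```r
library(boot)  # recommended package, shipped with standard R installations

# Hypothetical stand-in data: 109 rows, three numeric predictors, 0/1 outcome
set.seed(1)
n <- 109
dat <- data.frame(numeric1 = rnorm(n),
                  numeric2 = rnorm(n),
                  numeric3 = rnorm(n))
dat$outcome <- rbinom(n, 1,
                      plogis(-0.5 + 0.8 * dat$numeric1 - 0.4 * dat$numeric2))

# Statistic function: boot() supplies the data and resampled ROW indices.
# Note the comma in data[idx, ]: it selects rows, not columns.
coef_fun <- function(data, idx) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             family = binomial, data = data[idx, ])
  coef(fit)
}

b <- boot(dat, statistic = coef_fun, R = 500)

mle        <- b$t0   # coefficients from the fit to the original data
boot_coefs <- b$t    # 500 x 4 matrix, one row of coefficients per resample
colnames(boot_coefs) <- names(mle)

boot_median <- apply(boot_coefs, 2, median)  # per-coefficient bootstrap median
boot_se     <- apply(boot_coefs, 2, sd)      # bootstrap standard errors

print(rbind(mle = mle, boot_median = boot_median, boot_se = boot_se))
```

Printing the maximum likelihood estimates next to the bootstrap medians and
standard errors makes it easy to see how far apart they are for a given data
set; the standard errors are the kind of "property of the estimates" that the
replies above say the bootstrap is meant for.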