[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Doran, Harold HDoran at air.org
Tue Jul 22 20:29:54 CEST 2008


> install.packages('fortunes')
> library(fortunes)
> fortune(28) 


> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz at comcast.net] 
> Sent: Tuesday, July 22, 2008 1:29 PM
> To: Michal Figurski
> Cc: Doran, Harold; r-help at r-project.org; Frank E Harrell Jr; 
> Bert Gunter
> Subject: Re: [R] Coefficients of Logistic Regression from 
> bootstrap - how to get them?
> 
> Michal,
> 
> With all due respect, you have openly acknowledged that you 
> don't know enough about the subject at hand.
> 
> If that is the case, on what basis are you in a position to 
> challenge the collective wisdom of those professionals who 
> have voluntarily offered *expert* level statistical advice to you?
> 
> You have erected a wall around your thinking.
> 
> You may choose to use R or any other software application to 
> "Git-R-Done". But that does not make it correct.
> 
> There are other methods to consider that could be used during 
> the model building process itself, rather than on a post-hoc 
> basis and I would specifically refer you to Frank's book, 
> Regression Modeling Strategies:
> 
>    http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
> 
> Marc Schwartz
> 
> on 07/22/2008 09:43 AM Michal Figurski wrote:
> > Hmm...
> > 
> > It sounds like ideology to me. I was asking for technical help. I know
> > what I want to do, just don't know how to do it in R. I'll go back to
> > SAS then. Thank you.
> > 
> > --
> > Michal J. Figurski
> > 
> > Doran, Harold wrote:
> >> I think the answer has been given to you. If you want to continue to
> >> ignore that advice and use bootstrap for point estimates rather than
> >> the properties of those estimates (which is what bootstrap is for)
> >> then you are on your own.
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org
> >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
> >>> Sent: Tuesday, July 22, 2008 9:52 AM
> >>> To: r-help at r-project.org
> >>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap
> >>> - how to get them?
> >>>
> >>> Dear all,
> >>>
> >>> I don't want to argue with anybody about words or about what 
> >>> bootstrap is suitable for - I know too little for that.
> >>>
> >>> All I need is help to get the *equation coefficients* optimized by
> >>> bootstrap - either by one of the functions or by simple median.
> >>>
> >>> Please help,
> >>>
> >>> --
> >>> Michal J. Figurski
> >>> HUP, Pathology & Laboratory Medicine Xenobiotics Toxicokinetics
> >>> Research Laboratory 3400 Spruce St. 7 Maloney Philadelphia, PA 19104
> >>> tel. (215) 662-3413
> >>>
> >>> Frank E Harrell Jr wrote:
> >>>> Michal Figurski wrote:
> >>>>> Frank,
> >>>>>
> >>>>> "How does bootstrap improve on that?"
> >>>>>
> >>>>> I don't know, but I have an idea. Since the data in my set are just a
> >>>>> small sample of a big population, then if I use my whole dataset to
> >>>>> obtain max likelihood estimates, these estimates may be best for this
> >>>>> dataset, but far from ideal for the whole population.
> >>>> The bootstrap, being a resampling procedure from your sample, has the
> >>>> same issues about the population as MLEs.
> >>>>
> >>>>> I used bootstrap to virtually increase the size of my dataset, it
> >>>>> should result in estimates more close to that from the population -
> >>>>> isn't it the purpose of bootstrap?
> >>>> No
> >>>>
> >>>>> When I use such median coefficients on another dataset (another
> >>>>> sample from population), the predictions are better, than using max
> >>>>> likelihood estimates. I have already tested that and it worked!
> >>>> Then your testing procedure is probably not valid.
> >>>>
> >>>>> I am not a statistician and I don't feel what "overfitting" is, but
> >>>>> it may be just another word for the same idea.
> >>>>>
> >>>>> Nevertheless, I would still like to know how can I get the
> >>>>> coefficients for the model that gives the "nearly unbiased estimates".
> >>>>> I greatly appreciate your help.
> >>>> More info in my book Regression Modeling Strategies.
> >>>>
> >>>> Frank
> >>>>
> >>>>> --
> >>>>> Michal J. Figurski
> >>>>> HUP, Pathology & Laboratory Medicine Xenobiotics Toxicokinetics 
> >>>>> Research Laboratory 3400 Spruce St. 7 Maloney Philadelphia, PA 
> >>>>> 19104 tel. (215) 662-3413
> >>>>>
> >>>>> Frank E Harrell Jr wrote:
> >>>>>> Michal Figurski wrote:
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> I am trying to optimize my logistic regression model by using 
> >>>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I
> >>>>>>> am now switching to R.
> >>>>>>>
> >>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
> >>>>>>> single record composed of the following values: Subject_name,
> >>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
> >>>>>>> numerics are used to predict outcome using LR.
> >>>>>>>
> >>>>>>> In SAS I have written a macro, that was splitting the dataset,
> >>>>>>> running LR on one half of data and making predictions on second
> >>>>>>> half. Then it was collecting the equation coefficients from each
> >>>>>>> iteration of bootstrap. Later I was just taking medians of these
> >>>>>>> coefficients from all iterations, and used them as an optimal model
> >>>>>>> - it really worked well!
> >>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
> >>>>>> from the original fit.  How does the bootstrap improve on that?
> >>>>>>
> >>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
> >>>>>>> 'calibrate' functions from package "Design", and I also
> >>>>>>> experimented with function 'sm.binomial.bootstrap' from package
> >>>>>>> "sm". I tried also the function 'boot' from package "boot", though
> >>>>>>> without success
> >>>>>>> - in my case it randomly selected _columns_ from my data frame,
> >>>>>>> while I wanted it to select _rows_.
> >>>>>> validate and calibrate in Design do resampling on the rows
> >>>>>>
> >>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
> >>>>>> model performance, i.e., to correct for overfitting.
> >>>>>>
> >>>>>> Frank Harrell
> >>>>>>
> >>>>>>> Though the main point here is the optimized LR equation. I would
> >>>>>>> appreciate any help on how to extract the LR equation coefficients
> >>>>>>> from any of these bootstrap functions, in the same form as given by
> >>>>>>> 'glm' or 'lrm'.
> >>>>>>>
> >>>>>>> Many thanks in advance!
> >>>>>>>
> 
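
On the question quoted above about 'boot' picking columns instead of rows, and about getting coefficients in the form 'glm' reports, here is a minimal sketch rather than a definitive recipe. It uses simulated data as a stand-in for the 109-row data frame described in the thread. The object names (dat, coef_stat, boot_out), the seed, the simulated effect sizes and R = 500 are all illustrative assumptions. It takes the point estimates from the original maximum likelihood fit, as advised in the thread, and uses the bootstrap replicates only for the properties of those estimates (standard errors, percentile intervals).

```r
library(boot)  # recommended package distributed with R

set.seed(42)

# Simulated stand-in for the data described in the thread (an assumption):
# 109 rows, three numeric predictors and a yes/no outcome.
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
p_yes <- plogis(-0.5 + 0.8 * dat$numeric1 - 0.6 * dat$numeric2)
dat$outcome <- factor(ifelse(runif(n) < p_yes, "yes", "no"),
                      levels = c("no", "yes"))

# boot() calls the statistic with the data and a vector of indices.
# Indexing as data[idx, ] (note the comma) resamples rows, not columns.
coef_stat <- function(data, idx) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             family = binomial, data = data[idx, ])
  coef(fit)
}

boot_out <- boot(dat, statistic = coef_stat, R = 500)

# Point estimates: coefficients from the fit to the original data.
mle <- boot_out$t0

# Properties of those estimates, taken from the bootstrap replicates.
boot_se <- apply(boot_out$t, 2, sd)                         # bootstrap SEs
ci_numeric1 <- boot.ci(boot_out, type = "perc", index = 2)  # CI for numeric1

print(mle)
print(boot_se)
print(ci_numeric1)
```

boot_out$t0 holds the coefficients from the fit to the full data, as the same named vector that coef() returns for a 'glm' fit. Each row of boot_out$t holds the coefficients from one resample of the rows.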


