[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Michal J. Figurski Michal at mail.med.upenn.edu
Wed Jul 23 04:54:36 CEST 2008


Dear all,

Since you guys are frank, let me be frank as well. I did not ask anyone to
impose their point of view on bootstrap on me. It's my impression that this is
what you guys are trying to do - that's sad. Some of your emails in this
discussion are worth less than junk mail - particularly the ones from Mr Harold
Doran. It's even sadder that you use junior members of this forum to make fun
and intimidate.

Apparently, even with all your expertise and education in this area, many of you
- experts - do not understand what I am talking about. You seem to be so
attached to your expertise that you can't see anything beyond it.

Dear experts: if you do not wish to answer a question, why do you take time to
send useless emails?

Honestly, if you wished to help and educate me, words like those in the email
below would never have seen the light of day. Now that is expert advice. I am
impressed!


-- 
Michal J. Figurski

Quoting Frank E Harrell Jr <f.harrell at vanderbilt.edu>:

> You don't understand any of the theory and you are using techniques you 
> don't understand and have provided no motivation for.  And you are the 
> one who is frustrated with others.  Wow.
> 
> Frank
> 
> > 
> > Doran, Harold wrote:
> >> I think the answer has been given to you. If you want to continue to
> >> ignore that advice and use bootstrap for point estimates rather than the
> >> properties of those estimates (which is what bootstrap is for) then you
> >> are on your own.
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org 
> >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
> >>> Sent: Tuesday, July 22, 2008 9:52 AM
> >>> To: r-help at r-project.org
> >>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap - 
> >>> how to get them?
> >>>
> >>> Dear all,
> >>>
> >>> I don't want to argue with anybody about words or about what 
> >>> bootstrap is suitable for - I know too little for that.
> >>>
> >>> All I need is help to get the *equation coefficients* optimized by 
> >>> bootstrap - either by one of the functions or by simple median.
> >>>
> >>> Please help,
> >>>
> >>> -- 
> >>> Michal J. Figurski
> >>> HUP, Pathology & Laboratory Medicine
> >>> Xenobiotics Toxicokinetics Research Laboratory
> >>> 3400 Spruce St. 7 Maloney
> >>> Philadelphia, PA 19104
> >>> tel. (215) 662-3413
> >>>
> >>> Frank E Harrell Jr wrote:
> >>>> Michal Figurski wrote:
> >>>>> Frank,
> >>>>>
> >>>>> "How does bootstrap improve on that?"
> >>>>>
> >>>>> I don't know, but I have an idea. Since the data in my set are just a
> >>>>> small sample of a big population, then if I use my whole dataset to
> >>>>> obtain max likelihood estimates, these estimates may be best for this
> >>>>> dataset, but far from ideal for the whole population.
> >>>> The bootstrap, being a resampling procedure from your sample, has the
> >>>> same issues about the population as MLEs.
> >>>>
> >>>>> I used bootstrap to virtually increase the size of my dataset; it
> >>>>> should result in estimates closer to those from the population -
> >>>>> isn't that the purpose of bootstrap?
> >>>> No
> >>>>
> >>>>> When I use such median coefficients on another dataset (another
> >>>>> sample from the population), the predictions are better than with max
> >>>>> likelihood estimates. I have already tested that and it worked!
> >>>> Then your testing procedure is probably not valid.
> >>>>
> >>>>> I am not a statistician and I don't feel what "overfitting" is, but
> >>>>> it may be just another word for the same idea.
> >>>>>
> >>>>> Nevertheless, I would still like to know how I can get the
> >>>>> coefficients for the model that gives the "nearly unbiased estimates".
> >>>>> I greatly appreciate your help.
> >>>> More info in my book Regression Modeling Strategies.
> >>>>
> >>>> Frank
> >>>>
> >>>>> -- 
> >>>>> Michal J. Figurski
> >>>>> HUP, Pathology & Laboratory Medicine
> >>>>> Xenobiotics Toxicokinetics Research Laboratory
> >>>>> 3400 Spruce St. 7 Maloney
> >>>>> Philadelphia, PA 19104
> >>>>> tel. (215) 662-3413
> >>>>>
> >>>>> Frank E Harrell Jr wrote:
> >>>>>> Michal Figurski wrote:
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> I am trying to optimize my logistic regression model by using
> >>>>>>> bootstrap. I was previously using SAS for this kind of task, but I
> >>>>>>> am now switching to R.
> >>>>>>>
> >>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
> >>>>>>> single record composed of the following values: Subject_name,
> >>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
> >>>>>>> numerics are used to predict outcome using LR.
> >>>>>>>
> >>>>>>> In SAS I wrote a macro that split the dataset, ran LR on one half
> >>>>>>> of the data and made predictions on the second half. It then
> >>>>>>> collected the equation coefficients from each iteration of
> >>>>>>> bootstrap. Later I just took the medians of these coefficients
> >>>>>>> across all iterations and used them as an optimal model - it
> >>>>>>> really worked well!
> >>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
> >>>>>> from the original fit?  How does the bootstrap improve on that?
> >>>>>>
> >>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
> >>>>>>> 'calibrate' functions from package "Design", and I also
> >>>>>>> experimented with function 'sm.binomial.bootstrap' from package
> >>>>>>> "sm". I tried also the function 'boot' from package "boot", though
> >>>>>>> without success - in my case it randomly selected _columns_ from
> >>>>>>> my data frame, while I wanted it to select _rows_.
> >>>>>> validate and calibrate in Design do resampling on the rows.
> >>>>>>
> >>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
> >>>>>> model performance, i.e., to correct for overfitting.
> >>>>>>
> >>>>>> Frank Harrell
> >>>>>>
> >>>>>>> Though the main point here is the optimized LR equation. I would
> >>>>>>> appreciate any help on how to extract the LR equation coefficients
> >>>>>>> from any of these bootstrap functions, in the same form as given by
> >>>>>>> 'glm' or 'lrm'.
> >>>>>>>
> >>>>>>> Many thanks in advance!
> >>>>>>>
> >>>>>>
> >>>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> > 
> > 
> 
> 
> -- 
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                       Department of Biostatistics   Vanderbilt University
>
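
For the question in the original message of the quoted thread (getting
'boot' to resample rows and return coefficients in the same form as 'glm'),
the following is a minimal sketch and not anyone's posted solution. The data
frame `dat` is simulated here as a hypothetical stand-in for the 109-row data
described above, and the names `coef_fun`, `mle`, `boot_se` and `boot_median`
are illustrative only. A possible cause of the "columns instead of rows"
behaviour is indexing with data[idx] rather than data[idx, ], since the former
selects columns of a data frame.

```r
library(boot)  # recommended package distributed with R

set.seed(1)

# Hypothetical stand-in for the data in the thread: 109 rows, three numeric
# predictors and a binary outcome. These values are simulated, not real data.
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(0.5 * dat$numeric1 - 0.3 * dat$numeric2))

# Statistic passed to boot(): 'idx' is a vector of resampled ROW indices.
# Note the comma in data[idx, ]; data[idx] would pick columns instead.
coef_fun <- function(data, idx) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             data = data[idx, ], family = binomial)
  coef(fit)
}

b <- boot(dat, statistic = coef_fun, R = 200)

mle         <- b$t0                     # coefficients from the original fit
boot_se     <- apply(b$t, 2, sd)        # bootstrap standard errors
boot_median <- apply(b$t, 2, median)    # per-coefficient medians over resamples

# Percentile interval for the numeric1 coefficient (second element of coef).
ci_numeric1 <- boot.ci(b, type = "perc", index = 2)
```

Here `b$t` holds one row of coefficients per bootstrap resample, so medians
can be computed as asked. In line with the replies quoted above, the sketch
keeps the original-fit coefficients (`b$t0`) as the point estimates and uses
the resamples to describe their variability.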


