[R] Coefficients of Logistic Regression from bootstrap - how to get them?
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Tue Jul 22 23:32:56 CEST 2008
Michal Figurski wrote:
> Dear Marc and all,
>
> Thank you for all the due respect.
>
> I tried to explain as explicitly as I could what I am trying to do
> in my first email. I did not invent this procedure; it was already
> published in the paper:
>
> T. Pawinski, M. Hale, M. Korecka, W.E. Fitzsimmons, L.M. Shaw. Limited
> Sampling Strategy for the Estimation of Mycophenolic Acid Area under the
> Curve in Adult Renal Transplant Patients Treated with Concomitant
> Tacrolimus. Clinical Chemistry 2002(48:9), 1497-1504
If you send me a pdf of this paper I will be glad to take a look.
Rather than an ad hoc bootstrap procedure you might look at the
resistant/robust fit literature and use an objective function that
spells out what is being optimized.
There probably are cases where taking the median of a set of bootstrap
regression coefficient estimates works well in a certain sense, but I
would put my money on penalized maximum likelihood estimation.
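To make the comparison concrete, here is a minimal sketch in base R. It is
only an illustration under stated assumptions: the data are simulated, the
column names (x1, x2, x3, outcome) are hypothetical stand-ins for the three
numeric predictors and the yes/no outcome, and the ridge penalty with
lambda = 1 is an arbitrary choice, not a recommendation. It fits the model
by maximum likelihood, takes the median of bootstrap coefficients
(resampling rows), and minimizes an explicitly written penalized objective.

```r
# Illustrative sketch only: simulated data, hypothetical column names.
set.seed(1)
n <- 109
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$outcome <- rbinom(n, 1, plogis(-0.5 + 1.0 * d$x1 - 0.7 * d$x2 + 0.3 * d$x3))

form <- outcome ~ x1 + x2 + x3

# 1. Ordinary maximum likelihood fit on the full sample.
mle <- coef(glm(form, family = binomial, data = d))

# 2. Bootstrap: resample ROWS with replacement, refit, keep the coefficients.
B <- 200
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(d), replace = TRUE)
  coef(glm(form, family = binomial, data = d[idx, ]))
}))
boot_median <- apply(boot_coefs, 2, median)

# 3. Ridge-penalized maximum likelihood: the objective is spelled out.
#    The intercept (first coefficient) is left unpenalized.
X <- model.matrix(form, d)
y <- d$outcome
lambda <- 1  # assumed value; in practice chosen by e.g. cross-validation
pen_negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta))) + 0.5 * lambda * sum(beta[-1]^2)
}
pml <- optim(mle, pen_negloglik, method = "BFGS")$par

# Side-by-side comparison of the three sets of coefficients.
round(rbind(mle, boot_median, pml), 3)
```

With lambda = 0 the penalized objective reduces to the ordinary negative
log-likelihood, so pml would match mle up to optimizer tolerance.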
As Marc said, your attitude towards free advice is puzzling.
Frank
>
> I only adapted this methodology to work under SAS and now I try to do it
> under R, because I like R. I need practical advice because I have a
> practical problem, and I do not understand much of the theoretical
> discussion on what bootstrap is suitable for or not. Apparently I am
> trying to use it for something other than what the experts are used to...
>
> Honestly, I did not learn anything from this discussion so far, I am
> just disappointed.
>
> Though, since the discussion has already started, I'd welcome your
> criticism on this procedure - I just ask that you express it in human
> language.
>
> --
> Michal J. Figurski
>
> Marc Schwartz wrote:
>> Michal,
>>
>> With all due respect, you have openly acknowledged that you don't know
>> enough about the subject at hand.
>>
>> If that is the case, on what basis are you in a position to challenge
>> the collective wisdom of those professionals who have voluntarily
>> offered *expert* level statistical advice to you?
>>
>> You have erected a wall around your thinking.
>>
>> You may choose to use R or any other software application to
>> "Git-R-Done". But that does not make it correct.
>>
>> There are other methods to consider that could be used during the
>> model building process itself, rather than on a post-hoc basis and I
>> would specifically refer you to Frank's book, Regression Modeling
>> Strategies:
>>
>> http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
>>
>> Marc Schwartz
>>
>> on 07/22/2008 09:43 AM Michal Figurski wrote:
>>> Hmm...
>>>
>>> It sounds like ideology to me. I was asking for technical help. I
>>> know what I want to do, just don't know how to do it in R. I'll go
>>> back to SAS then. Thank you.
>>>
>>> --
>>> Michal J. Figurski
>>>
>>> Doran, Harold wrote:
>>>> I think the answer has been given to you. If you want to continue to
>>>> ignore that advice and use bootstrap for point estimates rather than
>>>> the properties of those estimates (which is what bootstrap is for) then
>>>> you are on your own.
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org
>>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
>>>>> Sent: Tuesday, July 22, 2008 9:52 AM
>>>>> To: r-help at r-project.org
>>>>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap
>>>>> - how to get them?
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I don't want to argue with anybody about words or about what
>>>>> bootstrap is suitable for - I know too little for that.
>>>>>
>>>>> All I need is help to get the *equation coefficients* optimized by
>>>>> bootstrap - either by one of the functions or by simple median.
>>>>>
>>>>> Please help,
>>>>>
>>>>> --
>>>>> Michal J. Figurski
>>>>> HUP, Pathology & Laboratory Medicine
>>>>> Xenobiotics Toxicokinetics Research Laboratory
>>>>> 3400 Spruce St. 7 Maloney
>>>>> Philadelphia, PA 19104
>>>>> tel. (215) 662-3413
>>>>>
>>>>> Frank E Harrell Jr wrote:
>>>>>> Michal Figurski wrote:
>>>>>>> Frank,
>>>>>>>
>>>>>>> "How does bootstrap improve on that?"
>>>>>>>
>>>>>>> I don't know, but I have an idea. Since the data in my set are just a
>>>>>>> small sample of a big population, then if I use my whole dataset to
>>>>>>> obtain max likelihood estimates, these estimates may be best for this
>>>>>>> dataset, but far from ideal for the whole population.
>>>>>> The bootstrap, being a resampling procedure from your sample, has the
>>>>>> same issues about the population as MLEs.
>>>>>>
>>>>>>> I used bootstrap to virtually increase the size of my dataset; it
>>>>>>> should result in estimates closer to those from the population -
>>>>>>> isn't that the purpose of bootstrap?
>>>>>> No
>>>>>>
>>>>>>> When I use such median coefficients on another dataset (another
>>>>>>> sample from the population), the predictions are better than when
>>>>>>> using max likelihood estimates. I have already tested that and it
>>>>>>> worked!
>>>>>> Then your testing procedure is probably not valid.
>>>>>>
>>>>>>> I am not a statistician and I don't have a feel for what "overfitting"
>>>>>>> is, but it may be just another word for the same idea.
>>>>>>>
>>>>>>> Nevertheless, I would still like to know how I can get the
>>>>>>> coefficients for the model that gives the "nearly unbiased estimates".
>>>>>>> I greatly appreciate your help.
>>>>>> More info in my book Regression Modeling Strategies.
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>> --
>>>>>>> Michal J. Figurski
>>>>>>> HUP, Pathology & Laboratory Medicine
>>>>>>> Xenobiotics Toxicokinetics Research Laboratory
>>>>>>> 3400 Spruce St. 7 Maloney
>>>>>>> Philadelphia, PA 19104
>>>>>>> tel. (215) 662-3413
>>>>>>>
>>>>>>> Frank E Harrell Jr wrote:
>>>>>>>> Michal Figurski wrote:
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> I am trying to optimize my logistic regression model by using
>>>>>>>>> bootstrap. I was previously using SAS for this kind of task, but I
>>>>>>>>> am now switching to R.
>>>>>>>>>
>>>>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>>>>>> single record composed of the following values: Subject_name,
>>>>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>>>>>>>>> numerics are used to predict outcome using LR.
>>>>>>>>>
>>>>>>>>> In SAS I have written a macro that split the dataset, ran LR on one
>>>>>>>>> half of the data and made predictions on the second half. Then it
>>>>>>>>> collected the equation coefficients from each iteration of
>>>>>>>>> bootstrap. Later I just took the medians of these coefficients from
>>>>>>>>> all iterations, and used them as an optimal model - it really
>>>>>>>>> worked well!
>>>>>>>> Why not use maximum likelihood estimation, i.e., the
>>>>>>>> coefficients from the original fit? How does the bootstrap
>>>>>>>> improve on that?
>>>>>>>>
>>>>>>>>> Now I want to do the same in R. I tried to use the 'validate'
>>>>>>>>> or 'calibrate' functions from package "Design", and I also
>>>>>>>>> experimented with function 'sm.binomial.bootstrap' from package
>>>>>>>>> "sm". I also tried the function 'boot' from package "boot", though
>>>>>>>>> without success - in my case it randomly selected _columns_ from my
>>>>>>>>> data frame, while I wanted it to select _rows_.
>>>>>>>> validate and calibrate in Design do resampling on the rows.
>>>>>>>>
>>>>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
>>>>>>>> model performance, i.e., to correct for overfitting.
>>>>>>>>
>>>>>>>> Frank Harrell
>>>>>>>>
>>>>>>>>> Though the main point here is the optimized LR equation. I
>>>>>>>>> would appreciate any help on how to extract the LR equation
>>>>>>>>> coefficients from any of these bootstrap functions, in the same form
>>>>>>>>> as given by 'glm' or 'lrm'.
>>>>>>>>>
>>>>>>>>> Many thanks in advance!
>>>>>>>>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University