[R] Coefficients of Logistic Regression from bootstrap - how to get them?
Michal Figurski
figurski at mail.med.upenn.edu
Tue Jul 22 21:42:25 CEST 2008
Dear Marc and all,
Thank you for all the due respect.
In my first email I tried to explain as explicitly as I could what I am
trying to do. I did not invent this procedure; it was already published
in this paper:
T. Pawinski, M. Hale, M. Korecka, W.E. Fitzsimmons, L.M. Shaw. Limited
Sampling Strategy for the Estimation of Mycophenolic Acid Area under the
Curve in Adult Renal Transplant Patients Treated with Concomitant
Tacrolimus. Clinical Chemistry 2002; 48(9): 1497-1504.
I only adapted this methodology to work in SAS, and now I am trying to
do it in R, because I like R. I need practical advice because I have a
practical problem, and I do not understand much of the theoretical
discussion of what the bootstrap is or is not suitable for. Apparently I
am trying to use it for something other than what the experts are used to...
Honestly, I have not learned anything from this discussion so far; I am
just disappointed.
Still, since the discussion has already started, I'd welcome your
criticism of this procedure - I just ask that you express it in plain
language.
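To make the procedure under discussion concrete (fit the logistic
regression on a random half of the rows, collect the coefficients from
each iteration, then take their medians), here is a rough sketch in base
R. It is only an illustration under stated assumptions, not the SAS
macro and not the published method: the data are simulated, the column
names follow the description in the first email, and the number of
iterations (200) and the half-split without replacement are guesses.
Drawing nrow(dat) rows with replace = TRUE instead would give the usual
bootstrap resample.

```r
set.seed(1)

# Simulated stand-in for the real data (an assumption): 109 rows,
# a subject label, three numeric predictors and a yes/no outcome.
n <- 109
dat <- data.frame(
  subject  = paste0("S", seq_len(n)),
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
lp <- -0.3 + 0.8 * dat$numeric1 - 0.5 * dat$numeric2 + 0.3 * dat$numeric3
dat$outcome <- factor(ifelse(runif(n) < plogis(lp), "yes", "no"),
                      levels = c("no", "yes"))

n_iter <- 200                       # number of iterations (assumption)
coefs  <- matrix(NA_real_, nrow = n_iter, ncol = 4)

for (i in seq_len(n_iter)) {
  # Draw a random half of the ROW indices; the comma in dat[rows, ]
  # is what makes this a row selection rather than a column selection.
  rows <- sample(nrow(dat), size = floor(nrow(dat) / 2))
  fit  <- glm(outcome ~ numeric1 + numeric2 + numeric3,
              data = dat[rows, ], family = binomial)
  coefs[i, ] <- coef(fit)           # one row of coefficients per iteration
}
colnames(coefs) <- names(coef(fit))

# Median of each coefficient across iterations, next to the ordinary
# maximum likelihood fit on the full data for comparison.
median_coefs <- apply(coefs, 2, median)
full_fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
                data = dat, family = binomial)
print(rbind(median = median_coefs, mle = coef(full_fit)))
```

The result has the same form as coef() of a 'glm' fit: a named vector
with the intercept and one slope per predictor. Note that indexing a
data frame as dat[rows] (without the comma) selects columns, which may
be the reason a resampling function appears to pick columns instead of
rows.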
--
Michal J. Figurski
Marc Schwartz wrote:
> Michal,
>
> With all due respect, you have openly acknowledged that you don't know
> enough about the subject at hand.
>
> If that is the case, on what basis are you in a position to challenge
> the collective wisdom of those professionals who have voluntarily
> offered *expert* level statistical advice to you?
>
> You have erected a wall around your thinking.
>
> You may choose to use R or any other software application to
> "Git-R-Done". But that does not make it correct.
>
> There are other methods to consider that could be used during the model
> building process itself, rather than on a post-hoc basis and I would
> specifically refer you to Frank's book, Regression Modeling Strategies:
>
> http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
>
> Marc Schwartz
>
> on 07/22/2008 09:43 AM Michal Figurski wrote:
>> Hmm...
>>
>> It sounds like ideology to me. I was asking for technical help. I know
>> what I want to do, just don't know how to do it in R. I'll go back to
>> SAS then. Thank you.
>>
>> --
>> Michal J. Figurski
>>
>> Doran, Harold wrote:
>>> I think the answer has been given to you. If you want to continue to
>>> ignore that advice and use bootstrap for point estimates rather than the
>>> properties of those estimates (which is what bootstrap is for) then you
>>> are on your own.
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org
>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Michal Figurski
>>>> Sent: Tuesday, July 22, 2008 9:52 AM
>>>> To: r-help at r-project.org
>>>> Subject: Re: [R] Coefficients of Logistic Regression from bootstrap
>>>> - how to get them?
>>>>
>>>> Dear all,
>>>>
>>>> I don't want to argue with anybody about words or about what
>>>> bootstrap is suitable for - I know too little for that.
>>>>
>>>> All I need is help to get the *equation coefficients* optimized by
>>>> bootstrap - either by one of the functions or by simple median.
>>>>
>>>> Please help,
>>>>
>>>> --
>>>> Michal J. Figurski
>>>> HUP, Pathology & Laboratory Medicine
>>>> Xenobiotics Toxicokinetics Research Laboratory
>>>> 3400 Spruce St. 7 Maloney
>>>> Philadelphia, PA 19104
>>>> tel. (215) 662-3413
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Michal Figurski wrote:
>>>>>> Frank,
>>>>>>
>>>>>> "How does bootstrap improve on that?"
>>>>>>
>>>>>> I don't know, but I have an idea. Since the data in my set are
>>>>>> just a small sample of a big population, then if I use my whole
>>>>>> dataset to obtain max likelihood estimates, these estimates may
>>>>>> be best for this dataset, but far from ideal for the whole
>>>>>> population.
>>>>> The bootstrap, being a resampling procedure from your sample, has
>>>>> the same issues about the population as MLEs.
>>>>>
>>>>>> I used bootstrap to virtually increase the size of my dataset, it
>>>>>> should result in estimates more close to that from the
>>>>>> population - isn't it the purpose of bootstrap?
>>>>> No
>>>>>
>>>>>> When I use such median coefficients on another dataset (another
>>>>>> sample from population), the predictions are better, than using
>>>>>> max likelihood estimates. I have already tested that and it
>>>>>> worked!
>>>>> Then your testing procedure is probably not valid.
>>>>>
>>>>>> I am not a statistician and I don't feel what "overfitting" is,
>>>>>> but it may be just another word for the same idea.
>>>>>>
>>>>>> Nevertheless, I would still like to know how can I get the
>>>>>> coefficients for the model that gives the "nearly unbiased
>>>>>> estimates".
>>>>>> I greatly appreciate your help.
>>>>> More info in my book Regression Modeling Strategies.
>>>>>
>>>>> Frank
>>>>>
>>>>>> --
>>>>>> Michal J. Figurski
>>>>>> HUP, Pathology & Laboratory Medicine
>>>>>> Xenobiotics Toxicokinetics Research Laboratory
>>>>>> 3400 Spruce St. 7 Maloney
>>>>>> Philadelphia, PA 19104
>>>>>> tel. (215) 662-3413
>>>>>>
>>>>>> Frank E Harrell Jr wrote:
>>>>>>> Michal Figurski wrote:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am trying to optimize my logistic regression model by using
>>>>>>>> bootstrap. I was previously using SAS for this kind of tasks,
>>>>>>>> but I am now switching to R.
>>>>>>>>
>>>>>>>> My data frame consists of 5 columns and has 109 rows. Each row
>>>>>>>> is a single record composed of the following values:
>>>>>>>> Subject_name, numeric1, numeric2, numeric3 and outcome (yes or
>>>>>>>> no). All three numerics are used to predict outcome using LR.
>>>>>>>>
>>>>>>>> In SAS I have written a macro, that was splitting the dataset,
>>>>>>>> running LR on one half of data and making predictions on second
>>>>>>>> half. Then it was collecting the equation coefficients from each
>>>>>>>> iteration of bootstrap. Later I was just taking medians of these
>>>>>>>> coefficients from all iterations, and used them as an optimal
>>>>>>>> model - it really worked well!
>>>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
>>>>>>> from the original fit. How does the bootstrap improve on that?
>>>>>>>
>>>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>>>>> 'calibrate' functions from package "Design", and I also
>>>>>>>> experimented with function 'sm.binomial.bootstrap' from package
>>>>>>>> "sm". I tried also the function 'boot' from package "boot",
>>>>>>>> though without success - in my case it randomly selected
>>>>>>>> _columns_ from my data frame, while I wanted it to select _rows_.
>>>>>>> validate and calibrate in Design do resampling on the rows
>>>>>>>
>>>>>>> Resampling is mainly used to get a nearly unbiased estimate of
>>>>>>> the model performance, i.e., to correct for overfitting.
>>>>>>>
>>>>>>> Frank Harrell
>>>>>>>
>>>>>>>> Though the main point here is the optimized LR equation. I would
>>>>>>>> appreciate any help on how to extract the LR equation
>>>>>>>> coefficients from any of these bootstrap functions, in the same
>>>>>>>> form as given by 'glm' or 'lrm'.
>>>>>>>>
>>>>>>>> Many thanks in advance!
>>>>>>>>