[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Mon Jul 21 20:41:10 CEST 2008


Michal Figurski wrote:
> Hello all,
> 
> I am trying to optimize my logistic regression model by using bootstrap.
> I was previously using SAS for this kind of task, but I am now
> switching to R.
> 
> My data frame consists of 5 columns and has 109 rows. Each row is a 
> single record composed of the following values: Subject_name, numeric1, 
> numeric2, numeric3 and outcome (yes or no). All three numerics are used 
> to predict outcome using LR.
> 
> In SAS I wrote a macro that split the dataset, ran LR on one half of
> the data, and made predictions on the other half. It then collected
> the equation coefficients from each bootstrap iteration. Afterwards I
> simply took the medians of these coefficients across all iterations
> and used them as the optimal model - it really worked well!

Why not use maximum likelihood estimation, i.e., the coefficients from
the original fit?  How does the bootstrap improve on that?
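
For reference, a minimal sketch of what "the coefficients from the
original fit" means in R. The data are simulated to mimic the layout
described above (109 rows, three numeric predictors, a yes/no outcome);
the column names and true coefficients are made up for illustration.

```r
# Simulated stand-in for the poster's data; names and values are assumptions.
set.seed(1)
n <- 109
dat <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
lp <- -0.5 + 1.0 * dat$numeric1 - 0.7 * dat$numeric2 + 0.3 * dat$numeric3
dat$outcome <- factor(ifelse(runif(n) < plogis(lp), "yes", "no"),
                      levels = c("no", "yes"))

# Maximum likelihood fit on all rows. glm() models P(outcome == "yes")
# because "yes" is the second factor level.
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
           family = binomial, data = dat)
mle <- coef(fit)   # intercept plus one slope per predictor
print(mle)
```

coef() returns a named vector, so the fitted equation can be read off
directly from it.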

> 
> Now I want to do the same in R. I tried to use the 'validate' or 
> 'calibrate' functions from package "Design", and I also experimented 
> with function 'sm.binomial.bootstrap' from package "sm". I tried also 
> the function 'boot' from package "boot", though without success - in my 
> case it randomly selected _columns_ from my data frame, while I wanted 
> it to select _rows_.

validate and calibrate in Design resample the rows (observations), not
the columns.
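
If per-replicate coefficients are wanted from boot() itself, row
resampling could look roughly like the sketch below. It uses simulated
data and a hypothetical helper, coef_stat. One possible cause of the
column behaviour described above (a guess, since the original call was
not shown) is indexing with data[idx] instead of data[idx, ]: on a data
frame the former indexes columns.

```r
library(boot)  # recommended package shipped with standard R installations

# Simulated data; column names follow the post, values are made up.
set.seed(2)
n <- 109
dat <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
dat$outcome <- rbinom(n, 1, plogis(-0.5 + dat$numeric1 - 0.7 * dat$numeric2))

# boot() calls the statistic with the data and a vector of row indices.
# The comma in data[idx, ] matters: data[idx] would index columns instead.
coef_stat <- function(data, idx) {
  coef(glm(outcome ~ numeric1 + numeric2 + numeric3,
           family = binomial, data = data[idx, ]))
}

b <- boot(dat, coef_stat, R = 200)
b$t0       # coefficients from the fit to the original rows
dim(b$t)   # one row per bootstrap replicate, one column per coefficient
```

The columns of b$t are in the same order as coef() of a glm fit, so
they can be summarized column by column (e.g. with apply).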

Resampling is mainly used to get a nearly unbiased estimate of the model 
performance, i.e., to correct for overfitting.
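
A rough base-R sketch of that idea, using the optimism bootstrap on
simulated data. This is only meant to illustrate the principle and is
not the code inside validate; the auc helper, the choice of AUC as the
performance measure, and the 200 replicates are all assumptions.

```r
# Simulated data; column names follow the post, values are made up.
set.seed(3)
n <- 109
dat <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
dat$outcome <- rbinom(n, 1, plogis(-0.5 + dat$numeric1 - 0.7 * dat$numeric2))
form <- outcome ~ numeric1 + numeric2 + numeric3

# Area under the ROC curve via the Mann-Whitney rank formula.
auc <- function(p, y) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Apparent performance: model fitted and evaluated on the same rows.
fit <- glm(form, family = binomial, data = dat)
apparent <- auc(fitted(fit), dat$outcome)

# Optimism = (performance on the bootstrap sample) minus
# (performance of that same bootstrap fit on the original data).
optimism <- replicate(200, {
  bs <- dat[sample(n, replace = TRUE), ]   # resample rows with replacement
  f <- glm(form, family = binomial, data = bs)
  auc(fitted(f), bs$outcome) -
    auc(predict(f, newdata = dat, type = "response"), dat$outcome)
})

corrected <- apparent - mean(optimism)
c(apparent = apparent, corrected = corrected)
```

Note that the reported coefficients are still those of the fit to all
the rows; the resampling only adjusts the estimate of how well that fit
is likely to perform on new data.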

Frank Harrell

> 
> Though the main point here is the optimized LR equation. I would 
> appreciate any help on how to extract the LR equation coefficients from 
> any of these bootstrap functions, in the same form as given by 'glm' or 
> 'lrm'.
> 
> Many thanks in advance!
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


