[R] Coefficients of Logistic Regression from bootstrap - how to get them?
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Mon Jul 21 20:41:10 CEST 2008
Michal Figurski wrote:
> Hello all,
> I am trying to optimize my logistic regression model by using bootstrap.
> I previously used SAS for this kind of task, but I am now
> switching to R.
> My data frame consists of 5 columns and has 109 rows. Each row is a
> single record composed of the following values: Subject_name, numeric1,
> numeric2, numeric3 and outcome (yes or no). All three numerics are used
> to predict outcome using LR.
> In SAS I wrote a macro that split the dataset, ran LR on one half of
> the data and made predictions on the second half. It then collected
> the equation coefficients from each bootstrap iteration. Afterwards I
> took the medians of these coefficients across all iterations and used
> them as the optimal model - it really worked well!
Why not use maximum likelihood estimation, i.e., the coefficients from
the original fit? How does the bootstrap improve on that?
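For reference, a minimal sketch of getting the coefficients from the original fit with glm. The data here are simulated stand-ins (109 rows; the column names numeric1, numeric2, numeric3 and outcome are taken from the description above, and the values are made up), so treat it as an illustration rather than the poster's actual analysis.

```r
# Simulated stand-in for the data described above: 109 rows, three numeric
# predictors and a yes/no outcome. All values are invented for illustration.
set.seed(1)
n <- 109
d <- data.frame(
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
lp <- -0.5 + 1.0 * d$numeric1 - 0.8 * d$numeric2 + 0.3 * d$numeric3
d$outcome <- factor(ifelse(runif(n) < plogis(lp), "yes", "no"),
                    levels = c("no", "yes"))

# Ordinary maximum likelihood fit on the full data set.
# With levels c("no", "yes"), glm models the probability of "yes".
fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
           data = d, family = binomial)

# Coefficients of the fitted equation, on the log-odds scale.
beta <- coef(fit)
print(beta)

# Estimates with standard errors and Wald tests, if needed.
print(summary(fit)$coefficients)
```

The predicted probability for a row is then plogis(beta[1] + beta[2]*numeric1 + beta[3]*numeric2 + beta[4]*numeric3), which is what predict(fit, type = "response") returns.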
> Now I want to do the same in R. I tried to use the 'validate' or
> 'calibrate' functions from package "Design", and I also experimented
> with function 'sm.binomial.bootstrap' from package "sm". I also tried
> the function 'boot' from package "boot", though without success - in my
> case it randomly selected _columns_ from my data frame, while I wanted
> it to select _rows_.
validate and calibrate in Design resample the rows.
Resampling is mainly used to get a nearly unbiased estimate of the model
performance, i.e., to correct for overfitting.
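To make the idea concrete, here is a rough base-R sketch of an optimism-corrected performance estimate obtained by resampling rows. It is not the code used by validate; the simulated data, the 0/1 coding of outcome, the choice of the Brier score as the performance measure, and B = 200 replicates are all assumptions made for illustration.

```r
# Simulated stand-in data (109 rows); outcome is coded 0/1 here for simplicity.
set.seed(2)
n <- 109
d <- data.frame(numeric1 = rnorm(n), numeric2 = rnorm(n), numeric3 = rnorm(n))
d$outcome <- rbinom(n, 1, plogis(-0.5 + d$numeric1 - 0.8 * d$numeric2))

form <- outcome ~ numeric1 + numeric2 + numeric3

# Brier score: mean squared difference between predicted probability
# and observed 0/1 outcome (lower is better).
brier <- function(fit, data) {
  p <- predict(fit, newdata = data, type = "response")
  mean((p - data$outcome)^2)
}

# Apparent performance: the model evaluated on the data it was fitted to.
fit_full <- glm(form, data = d, family = binomial)
apparent <- brier(fit_full, d)

B <- 200                      # number of bootstrap replicates (assumed)
optimism <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample(nrow(d), replace = TRUE)   # resample ROWS with replacement
  boot_d <- d[idx, ]
  boot_fit <- glm(form, data = boot_d, family = binomial)
  # Optimism: how much worse the bootstrap fit does on the original data
  # than on the bootstrap sample it was fitted to.
  optimism[b] <- brier(boot_fit, d) - brier(boot_fit, boot_d)
}

# Optimism-corrected estimate of the Brier score of the original fit.
corrected <- apparent + mean(optimism)
print(c(apparent = apparent, corrected = corrected))
```

Note that the reported model is still fit_full, the original maximum likelihood fit; the bootstrap only adjusts the estimate of how well that model is expected to perform on new data.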
> Though the main point here is the optimized LR equation. I would
> appreciate any help on how to extract the LR equation coefficients from
> any of these bootstrap functions, in the same form as given by 'glm' or
> Many thanks in advance!
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University