[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Gustaf Rydevik gustaf.rydevik at gmail.com
Thu Jul 31 17:27:07 CEST 2008


On Thu, Jul 31, 2008 at 4:30 PM, Michal Figurski
<figurski at mail.med.upenn.edu> wrote:
> Frank and all,
>
> The point you were looking for was in a page that was linked from the
> referenced page - I apologize for confusion. Please take a look at the two
> last paragraphs here:
> http://people.revoledu.com/kardi/tutorial/Bootstrap/examples.htm
>
> Though, possibly it's my ignorance, maybe it's yours, but you actually
> missed the important point again. It is that you just don't estimate mean,
> or CI, or variance on PK profile data! It is as if you were trying to
> estimate mean, CI and variance of a "Toccata_&_Fugue_in_D_minor.wav" file.
> What for? The point is in the music! Would the mean or CI or variance tell
> you anything about that? Besides, everybody knows the variance (or
> variability?) is there and can estimate it without spending time on
> calculations.
> What I am trying to do is comparable to compressing a wave into mp3 - to
> predict the wave using as few data points as possible. I have a bunch of
> similar waves and I'm trying to find a common equation to predict them all.
> I am *not* looking for the variance of the mean!
>
> I could be wrong (though it seems less and less likely), but you keep
> talking about the same irrelevant parameters (CI, variance) on and on. Well,
> yes - we are at a standstill, but not because of Davison & Hinkley's book. I
> can try reading it, though as I stated above, it is not even "remotely
> related" to what I am trying to do. I'll skip it then - life is too short.
>
> Nevertheless I thank you (all) for relevant criticism on the procedure (in
> the points where it was relevant). I plan to use this methodology further,
> and it was good to find out that it withstood your criticism. I will look
> into the penalized methods, though.
>
> --
> Michal J. Figurski
>

I take it you mean the sentence:

" For example, in here, the statistical estimator is  the sample mean.
Using bootstrap sampling, you can do beyond your statistical
estimators. You can now get even the distribution of your estimator
and the statistics (such as confidence interval, variance) of your
estimator."

Again, you are misinterpreting the text. The phrase about "doing beyond
your statistical estimators" is explained in the next sentence, where
he says that using the bootstrap gives you information about the mean
*estimator* (and not more information about the population mean).
And since you're not interested in this information, in your case
bootstrap/resampling is not useful at all.
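To make that concrete, here is a minimal sketch in Python rather than R. The data vector and the helper names (bootstrap_means, percentile_ci) are hypothetical and exist only for illustration. It resamples a sample with replacement and summarizes the bootstrap replicates of the mean:

```python
import random
import statistics

def bootstrap_means(data, n_boot=2000, seed=1):
    """Return n_boot bootstrap replicates of the sample mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: draw n observations with replacement, take their mean.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]

def percentile_ci(replicates, level=0.95):
    """Simple percentile confidence interval from bootstrap replicates."""
    s = sorted(replicates)
    alpha = 1.0 - level
    lo = s[round(alpha / 2 * len(s))]
    hi = s[round((1 - alpha / 2) * len(s)) - 1]
    return lo, hi

# Hypothetical sample, for illustration only.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9]
reps = bootstrap_means(data)

print("sample mean:         ", statistics.fmean(data))
print("mean of replicates:  ", statistics.fmean(reps))
print("bootstrap SE of mean:", statistics.stdev(reps))
print("95% percentile CI:   ", percentile_ci(reps))
```

The mean of the replicates lands essentially on the ordinary sample mean; what resampling adds is the spread of the estimator (its standard error and an interval), which is exactly the kind of information you say you are not after.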

As another example of misinterpretation: in your email from a week
ago, it sounds as if you believe that the authors of the original paper
are trying to improve on a fixed model.
Figurski:
"Regarding the "multiple stepwise regression" - according to the cited
SPSS manual, there are 5 options to select from. I don't think they used
'stepwise selection' option, because their models were already
pre-defined. Variables were pre-selected based on knowledge of
pharmacokinetics of this drug and other factors. I think this part I
understand pretty well."

This paragraph is wrong. Sorry, no way around it.

Quoting from the paper by Pawinski et al.:
"*Twenty-six (!)* 1-, 2-, or 3-sample estimation
models were fit (r² = 0.341–0.862) to a randomly
selected subset of the profiles using linear regression
and were used to estimate AUC0–12h for the profiles not
included in the regression fit, comparing those estimates
with the corresponding AUC0–12h values, calculated
with the linear trapezoidal rule, including all 12
timed MPA concentrations. The 3-sample models were
constrained to include no samples past 2 h."
(emphasis mine)
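For reference, the linear trapezoidal rule used for the comparison AUC sums, over consecutive sampling times, the interval width times the average of the two neighbouring concentrations. A small Python sketch; the function name and the example values are hypothetical and not taken from the paper:

```python
def auc_linear_trapezoid(times, conc):
    """Area under a concentration-time curve by the linear trapezoidal rule.

    Each interval [t_i, t_{i+1}] contributes (t_{i+1} - t_i) * (c_i + c_{i+1}) / 2.
    """
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need matching time/concentration lists with >= 2 points")
    if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("sampling times must be strictly increasing")
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:])
    )

# Hypothetical profile (times in h, arbitrary concentration units), not from the paper.
print(auc_linear_trapezoid([0, 1, 2, 4], [0.0, 2.0, 2.0, 0.0]))  # 5.0
```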

They clearly state that they are choosing among 26 different models by
using their bootstrap-like procedure, not improving on a single,
predefined model.
This procedure is statistically sound (more or less, at least) and not
controversial.
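In other words, the random-subset fitting in that paper serves to compare candidate models on profiles they were not fit to. A rough Python sketch of that idea, under loudly stated assumptions: one-predictor linear models, simulated data, hypothetical column names C0 and C2, and an arbitrary choice of 50 random half/half splits. This is not the paper's exact protocol:

```python
import random
import statistics

def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for y ~ a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def select_model(profiles, auc, candidates, n_splits=50, train_frac=0.5, seed=0):
    """Repeated random-split validation over candidate one-predictor models.

    For each split, every candidate is fit on the training profiles and scored
    by mean absolute error on the held-out profiles. Returns the candidate with
    the lowest average error, and the average error of every candidate.
    """
    rng = random.Random(seed)
    idx = list(range(len(auc)))
    n_train = int(train_frac * len(idx))
    totals = {c: 0.0 for c in candidates}
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        for c in candidates:
            a, b = fit_simple_ols([profiles[i][c] for i in train],
                                  [auc[i] for i in train])
            errors = [abs(auc[i] - (a + b * profiles[i][c])) for i in test]
            totals[c] += statistics.fmean(errors)
    scores = {c: t / n_splits for c, t in totals.items()}
    return min(scores, key=scores.get), scores

# Simulated data: "C0" and "C2" are hypothetical concentrations at 0 h and 2 h;
# by construction only C2 carries information about the AUC.
rng = random.Random(42)
profiles, auc = [], []
for _ in range(40):
    c0, c2 = rng.uniform(1, 5), rng.uniform(5, 15)
    profiles.append({"C0": c0, "C2": c2})
    auc.append(3.0 * c2 + rng.gauss(0, 1.0))

best, scores = select_model(profiles, auc, ["C0", "C2"])
print(best, scores)
```

The output of such a procedure is a choice between models; it does not make any single model's coefficients better.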

However (again), what you want to do is *not* what they did in
their paper! Resampling cannot improve on the performance of a pre-specified
model. This is intuitively obvious, but moreover it's mathematically
provable! That's why we're so certain of our standpoint. If you really
wish, I (or someone else) could write out a proof, but I'm unsure whether
you would be able to follow it.
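Not a proof, but a quick numerical illustration in Python, assuming a simple straight-line model and simulated data (the linear model is used only to keep the sketch short; the helper fit_simple_ols is hypothetical). It fits a pre-specified model once, then refits it on 50 bootstrap resamples and averages the coefficients:

```python
import random
import statistics

def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for y ~ a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Simulated data from a known line, y = 2 + 1.5 x + noise.
rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [2.0 + 1.5 * xi + rng.gauss(0, 1.0) for xi in x]

# One fit of the pre-specified model on all the data.
a_full, b_full = fit_simple_ols(x, y)

# The same model refit on 50 bootstrap resamples, coefficients averaged.
n = len(x)
boot = []
for _ in range(50):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(fit_simple_ols([x[i] for i in idx], [y[i] for i in idx]))
a_boot = statistics.fmean([a for a, _ in boot])
b_boot = statistics.fmean([b for _, b in boot])

print("single fit:       ", a_full, b_full)
print("bootstrap average:", a_boot, b_boot)
```

The averaged coefficients differ from the single fit only by resampling noise, so the 50 extra fits add nothing to the point estimates.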

In the end, it doesn't really matter. What you are doing amounts to
fitting the same regression model 50 times when once would suffice. No big
harm done, just a bit of unnecessary work. But it is also proof to a
statistically competent reviewer that you don't really understand what
you're doing. The better option would be either to study some more
statistics yourself, or to find a statistician who can do your analysis
for you, and trust him to do it right.

Anyhow, good luck with your research.

Best regards,

Gustaf

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik
