[R] memory and bootstrapping

E Hofstadler e.hofstadler at gmail.com
Thu May 5 09:08:51 CEST 2011


hello,

the following questions will without doubt reveal some fundamental
ignorance, but hopefully you can still help me out.

I'd like to bootstrap a quantity derived from the coefficients of a
logistic regression model (the mean difference in the predicted
probabilities between two groups, where each predict() call uses as
its newdata argument a data frame of the same size as the original
data frame). I've got 130,000 rows and 7 columns in my data frame. The
glm model uses all variables (as well as two 2-way interactions).
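
To make the setup concrete, here is a minimal sketch of this kind of
statistic written for boot(). Everything in it is an assumption made
for illustration: the data are simulated, the names dat, group, x1, x2
and diff_pred are hypothetical, and the model is far smaller than the
one described above.

```r
library(boot)

set.seed(1)
n <- 2000  # illustration only; the real data have 130,000 rows
dat <- data.frame(
  group = factor(sample(c("a", "b"), n, replace = TRUE)),
  x1    = rnorm(n),
  x2    = rnorm(n)
)
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * (dat$group == "b") + 0.5 * dat$x1))

# Statistic for boot(): refit the model on the resampled rows, then
# predict for every row twice, once with all rows set to group "a" and
# once with all rows set to group "b" (so each newdata has as many rows
# as the data the model was fitted to).
diff_pred <- function(data, idx) {
  d   <- data[idx, ]
  fit <- glm(y ~ group * x1 + x2, family = binomial, data = d)
  d_a <- transform(d, group = factor("a", levels = levels(data$group)))
  d_b <- transform(d, group = factor("b", levels = levels(data$group)))
  mean(predict(fit, newdata = d_b, type = "response")) -
    mean(predict(fit, newdata = d_a, type = "response"))
}

b <- boot(dat, diff_pred, R = 50)
b$t0          # estimate on the original sample
sd(b$t[, 1])  # bootstrap standard error
```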

System:
- R-version: 2.12.2
- OS: Windows XP Pro, 32-bit
- 3.16 GHz Intel dual-core processor, 2.9 GB RAM

I'm using the boot package to arrive at the standard error for this
difference, but even with only 10 replications, this takes quite a
long time: 216 seconds (perhaps this is partly due to the
inefficiently programmed function underlying my boot call; I'm also
looking into that).

I wanted to try out calculating a BCa bootstrap confidence interval,
which as I understand it requires a lot more replications than
normal-theory intervals. Drawing on John Fox's appendix to his "An R
Companion to Applied Regression", I was thinking of trying out 2000
replications, but this will take several hours to compute on my
system (which isn't in itself a major issue though).
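
As a rough check on "several hours", the timing above can be
extrapolated. This assumes that run time grows linearly with the number
of replications, which is only an approximation; the variable names are
made up for the example.

```r
secs_per_rep <- 216 / 10                    # observed: 10 replications in 216 s
hours_2000   <- secs_per_rep * 2000 / 3600  # extrapolated to 2000 replications
hours_500    <- secs_per_rep * 500 / 3600   # extrapolated to 500 replications
round(c(R2000 = hours_2000, R500 = hours_500), 1)  # about 12 and 3 hours
```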

My Questions:
- Let's say I try bootstrapping with 2000 replications. Can I be
certain that the memory available to R will be sufficient for this
operation?
- (This relates to statistics more generally.) Is it a good idea in
your opinion to try BCa bootstrapping, or can it be assumed that a
normal-theory confidence interval will be a sufficiently good
approximation (letting me get away with, say, 500 replications)?
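
Both questions can at least be explored on a small scale. The sketch
below is a toy version under stated assumptions: simulated data, a
hypothetical statistic stat() of the kind described above, and a
deliberately tiny n so it runs in seconds. It requests normal-theory
and BCa intervals from the same boot object so they can be compared,
and it shows how much storage the saved replicates themselves take
(one number per replicate for a scalar statistic). It says nothing
about the memory needed inside each replicate for the glm fit on
130,000 rows, nor about any extra computation the BCa interval may
need on data of that size.

```r
library(boot)

set.seed(2)
n <- 200  # deliberately tiny; not representative of 130,000 rows
dat <- data.frame(g = rep(c(0, 1), each = n / 2), x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.3 + 0.7 * dat$g + 0.5 * dat$x))

# Difference in mean predicted probability with every row set to
# g = 1 versus g = 0, refitting the model on each resample.
stat <- function(data, idx) {
  d   <- data[idx, ]
  fit <- glm(y ~ g * x, family = binomial, data = d)
  mean(predict(fit, transform(d, g = 1), type = "response")) -
    mean(predict(fit, transform(d, g = 0), type = "response"))
}

b  <- boot(dat, stat, R = 500)
ci <- boot.ci(b, type = c("norm", "bca"))
ci                # prints both intervals side by side

dim(b$t)          # one row per replicate, one column per statistic
object.size(b$t)  # storage used by the saved replicates
```

If the two intervals nearly coincide on a subsample of the real data,
that is some (informal) evidence that the normal approximation is
adequate for this statistic.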


Best,
Esther


