[R] a Bootstrap understanding problem
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Jan 21 15:12:45 CET 2002
On 21 Jan 2002, Wilhelm B. Kloke wrote:
> In article <ifado.list.r.help/Pine.LNX.4.31.0201211014160.15847-100000 at gannet.stats>,
> Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> >On Mon, 21 Jan 2002, Wilhelm B. Kloke wrote:
> >
> >It certainly does not conform. The `bootstrap' package (its original S
> >name was bootstrap.funs) is old and, I suggest, should no longer be used, but
> >it does have a function for BCa which you could find by looking in its
> >INDEX. The example is even
>
> Which, BTW, gave results resembling those I was looking for.
>
> >
> > # For example, find bca limits for
> > # the correlation coefficient from a set of 15 data pairs:
> >
> >but the bootstrap set is tiny (see below).
>
> As the data set is really tiny, I can give it here:
>       V4    V5
> 1  -0.02 -0.07
> 2   0.04  0.02
> 3  -0.02  0.04
> 4   0.08 -0.02
> 5  -0.01  0.04
> 6   0.08  0.07
> 7   0.03  0.04
> 8   0.08  0.01
> 9   0.03  0.03
> 10 -0.12 -0.03
> 11  0.06  0.04
> 12 -0.21 -0.08
> 13  0.00 -0.01
That looks like a discrete distribution (every value is on a 0.01 grid, with
several ties), which makes bootstrapping pretty questionable.
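To reproduce the numbers in this thread, the 13 pairs above can be typed into R as below. The object name `mehnert` is the one used in the boot output that follows; the rest is a minimal sketch. Counting distinct values shows what "discrete" means here: 13 observations, but only a few distinct values in each column.

```r
## The 13 data pairs quoted above, entered by hand.
mehnert <- data.frame(
  V4 = c(-0.02, 0.04, -0.02, 0.08, -0.01, 0.08, 0.03,
         0.08, 0.03, -0.12, 0.06, -0.21, 0.00),
  V5 = c(-0.07, 0.02, 0.04, -0.02, 0.04, 0.07, 0.04,
         0.01, 0.03, -0.03, 0.04, -0.08, -0.01)
)

## Sample product-moment correlation (the boot output below reports 0.6623205).
r <- cor(mehnert$V4, mehnert$V5)
print(r)

## Number of distinct values per column: 9 for V4 and 10 for V5.
print(sapply(mehnert, function(v) length(unique(v))))
## Frequency tables show which values are repeated (e.g. 0.04 occurs 4 times in V5).
print(table(mehnert$V4))
print(table(mehnert$V5))
```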
> My boot application gives:
> : > mehnert.boot
> :
> : ORDINARY NONPARAMETRIC BOOTSTRAP
> :
> :
> : Call:
> : boot(data = mehnert, statistic = function(x, i) {
> : cor(x[i, 1], x[i, 2])
> : }, R = 1000)
> :
> :
> : Bootstrap Statistics :
> :     original       bias    std. error
> : t1* 0.6623205 -0.03803166   0.2197617
> : >
> and
> : > boot.ci(mehnert.boot)
> : BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> : Based on 1000 bootstrap replicates
> :
> : CALL :
> : boot.ci(boot.out = mehnert.boot)
> :
> : Intervals :
> : Level      Normal              Basic
> : 95%   ( 0.2696,  1.1311 )   ( 0.4220,  1.2838 )
> :
> : Level     Percentile            BCa
> : 95%   ( 0.0408,  0.9027 )   ( 0.0322,  0.8962 )
> : Calculations and Intervals on Original Scale
> : Warning message:
> : Bootstrap variances needed for studentized intervals in: boot.ci(mehnert.boot)
> My question was raised by the fact that in Mehnert's write-up I found a
> BCa CI from 0.16 to 0.93 at the 5% level, which might indicate rather more
> confidence that the correlation is positive.
Ah, but that's a hypothesis test, not a confidence limit. In any case
0.03 > 0, so the conclusion would be exactly the same. If you want to
test for non-zero correlation, use a hypothesis test. And you cannot
deduce that because 0.03 is nearer 0 than 0.16, the P-values would be any
different. You only know about 95% CIs and hence 5% tests.
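The following is a minimal sketch of the direct route in base R. `cor.test` with `alternative = "greater"` tests H0: rho = 0 against rho > 0 using t = r sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The one-sided alternative is an assumption chosen to match the question of whether the correlation is positive. The t test itself assumes bivariate normality, which the discreteness of these data makes doubtful.

```r
## The 13 pairs from earlier in the thread.
mehnert <- data.frame(
  V4 = c(-0.02, 0.04, -0.02, 0.08, -0.01, 0.08, 0.03,
         0.08, 0.03, -0.12, 0.06, -0.21, 0.00),
  V5 = c(-0.07, 0.02, 0.04, -0.02, 0.04, 0.07, 0.04,
         0.01, 0.03, -0.03, 0.04, -0.08, -0.01)
)

## One-sided test of H0: rho = 0 against H1: rho > 0.
ct <- cor.test(mehnert$V4, mehnert$V5, alternative = "greater")
print(ct)

## The same t statistic computed by hand from r and n.
n <- nrow(mehnert)
r <- cor(mehnert$V4, mehnert$V5)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
print(t_manual)  # about 2.93 on 11 df; one-sided P between 0.005 and 0.01
```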
> >No, and in e.g. the MASS examples they give similar results.
>
> Indeed. I saw that.
>
> >BCa needs large, often very large (tens of thousands), bootstrap sets.
> >Are you sure your colleague used a large enough set? A quick bit of
>
> We cannot easily make more observations. We have these
> data from 13 subjects. For the bootstrap simulation we used R = 1000
> replicates both in the original study and in my replication trial.
>
> >replication suggests that the BCa limits are very variable for your
> >problem. I find BCa pretty unreliable, and for correlations using Fisher's
> >tanh transformation is normally enough to make all sensible confidence
> >interval procedures agree for all practical purposes.
I see you did not pick up on that: it was a very important comment.
The appropriate scale (Fisher's z) is known: don't try to estimate it from 13 pairs.
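The sketch below shows how to use the known scale. It assumes the boot package, which the thread already uses. Fisher's z = atanh(r) is approximately normal with standard error 1/sqrt(n - 3), so the interval is formed on that scale and mapped back with tanh. This is the interval `cor.test` reports.

For comparison, the second part bootstraps atanh(r) directly and back-transforms the normal and basic limits. The third part repeats the BCa calculation at R = 1000 with different seeds to show how much its limits move. The statistic functions `z_stat` and `corr_stat`, the seeds, and the 20 repeats are illustrative choices.

```r
library(boot)

## The 13 pairs from earlier in the thread.
mehnert <- data.frame(
  V4 = c(-0.02, 0.04, -0.02, 0.08, -0.01, 0.08, 0.03,
         0.08, 0.03, -0.12, 0.06, -0.21, 0.00),
  V5 = c(-0.07, 0.02, 0.04, -0.02, 0.04, 0.07, 0.04,
         0.01, 0.03, -0.03, 0.04, -0.08, -0.01)
)
n <- nrow(mehnert)
r <- cor(mehnert$V4, mehnert$V5)

## 1. Fisher's z interval: atanh(r) is roughly normal with sd 1/sqrt(n - 3).
z_ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))
print(z_ci)  # about (0.175, 0.889) for these data
## cor.test() reports the same interval (two-sided, Pearson).
print(cor.test(mehnert$V4, mehnert$V5)$conf.int)

## 2. Bootstrap atanh(r) itself, then map the normal and basic limits back.
z_stat <- function(d, i) atanh(cor(d[i, 1], d[i, 2]))
set.seed(1)
bz <- boot(mehnert, z_stat, R = 1000)
ciz <- boot.ci(bz, type = c("norm", "basic"))
print(tanh(ciz$normal[2:3]))  # columns: conf, lower, upper
print(tanh(ciz$basic[4:5]))   # columns 4:5 hold the interval limits

## 3. BCa for r on the original scale, rerun with fresh resamples at
##    R = 1000, to see how much the limits vary from run to run.
corr_stat <- function(d, i) cor(d[i, 1], d[i, 2])
bca_limits <- t(sapply(1:20, function(s) {
  set.seed(s)
  bb <- boot(mehnert, corr_stat, R = 1000)
  boot.ci(bb, type = "bca")$bca[4:5]
}))
colnames(bca_limits) <- c("lower", "upper")
print(summary(bca_limits))
```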
> >Finally, what useful conclusions can be drawn from a confidence interval
> >for the correlation of 13 data pairs?
>
> Of course, this is not a bad question. But aren't bootstrap methods
> designed for application to problematic datasets?
No. First, they are frequentist, so they are designed for applications
in which little is known about the *distributions*, not for any one
dataset. Second, I don't think you can draw any useful conclusion from a
CI of (0.16, 0.93) except as a roundabout way to test rho > 0, and for
that you can gain more information from a test statistic and P-value.
Lots more can be said, but for example if you don't believe in normality
in your problem, you should be using an appropriate statistic (not the
product-moment correlation) and not just trying to fix the distribution of
the statistic. It is robust statistics that is designed for non-standard
problems, not bootstrapping, but the first has been vastly undersold
relative to the second.
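No specific statistic is named above. As one simple alternative to the product-moment correlation (an illustrative choice, not one made in the thread), Kendall's tau and Spearman's rho depend only on ranks and do not assume normality. A sketch with `cor.test`, again one-sided:

```r
## The 13 pairs from earlier in the thread.
mehnert <- data.frame(
  V4 = c(-0.02, 0.04, -0.02, 0.08, -0.01, 0.08, 0.03,
         0.08, 0.03, -0.12, 0.06, -0.21, 0.00),
  V5 = c(-0.07, 0.02, 0.04, -0.02, 0.04, 0.07, 0.04,
         0.01, 0.03, -0.03, 0.04, -0.08, -0.01)
)

## Rank-based measures of association. The data contain ties, so exact
## P-values are unavailable; exact = FALSE requests the large-sample
## approximation that cor.test() would otherwise fall back to with a warning.
kt <- cor.test(mehnert$V4, mehnert$V5, method = "kendall",
               alternative = "greater", exact = FALSE)
sp <- cor.test(mehnert$V4, mehnert$V5, method = "spearman",
               alternative = "greater", exact = FALSE)
print(c(kendall_tau = unname(kt$estimate), spearman_rho = unname(sp$estimate)))
print(c(kendall_p = kt$p.value, spearman_p = sp$p.value))
```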
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._