[R] a Bootstrap understanding problem

Mon Jan 21 15:12:45 CET 2002

On 21 Jan 2002, Wilhelm B. Kloke wrote:

> In article <ifado.list.r.help/Pine.LNX.4.31.0201211014160.15847-100000 at gannet.stats>,
> Prof Brian Ripley  <ripley at stats.ox.ac.uk> wrote:
> >On Mon, 21 Jan 2002, Wilhelm B. Kloke wrote:
> >
> >It certainly does not conform.  The `bootstrap' package (its original S
> >name was bootstrap.funs) is old and I suggest should not now be used, but
> >it does have a function for BCa which you could find by looking in its
> >INDEX.  The example is even
>
> Which, BTW, yielded results resembling those I hoped to look for.
>
> >
> >     # For example, find bca limits for
> >     # the correlation coefficient from a set of 15 data pairs:
> >
> >but the bootstrap set is tiny (see below).
>
> As the data set is really tiny, I can give it here:
>       V4    V5
> 1  -0.02 -0.07
> 2   0.04  0.02
> 3  -0.02  0.04
> 4   0.08 -0.02
> 5  -0.01  0.04
> 6   0.08  0.07
> 7   0.03  0.04
> 8   0.08  0.01
> 9   0.03  0.03
> 10 -0.12 -0.03
> 11  0.06  0.04
> 12 -0.21 -0.08
> 13  0.00 -0.01

Looks like a discrete distribution, which makes bootstrapping pretty
questionable.

> My boot application gives:
> : > mehnert.boot
> :
> : ORDINARY NONPARAMETRIC BOOTSTRAP
> :
> :
> : Call:
> : boot(data = mehnert, statistic = function(x, i) {
> :     cor(x[i, 1], x[i, 2])
> : }, R = 1000)
> :
> :
> : Bootstrap Statistics :
> :      original      bias    std. error
> : t1* 0.6623205 -0.03803166   0.2197617
> : >
> and
> : > boot.ci(mehnert.boot)
> : BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> : Based on 1000 bootstrap replicates
> :
> : CALL :
> : boot.ci(boot.out = mehnert.boot)
> :
> : Intervals :
> : Level      Normal              Basic
> : 95%   ( 0.2696,  1.1311 )   ( 0.4220,  1.2838 )
> :
> : Level     Percentile            BCa
> : 95%   ( 0.0408,  0.9027 )   ( 0.0322,  0.8962 )
> : Calculations and Intervals on Original Scale
> : Warning message:
> : Bootstrap variances needed for studentized intervals in: boot.ci(mehnert.boot)
> My question was raised by the fact that in Mehnert's writeup I found
> BCa CI from 0.16 to 0.93 at the 5% level, which may indicate some more
> confidence for assuming the correlation to be positive.

Ah, but that's a hypothesis test, not a confidence limit.  In any case
0.03 > 0, so the conclusion would be exactly the same.  If you want to
test for non-zero correlation, use a hypothesis test.  And you cannot
deduce that because 0.03 is nearer 0 than 0.16, the P-values would be any
different. You only know about 95% CIs and hence 5% tests.
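
A minimal sketch of such a test in R, assuming the 13 pairs quoted above
are held in a data frame called mehnert (the name that appears in the
quoted boot call). It uses cor.test with its default two-sided
alternative and the product-moment correlation, so it relies on the
usual normal-theory assumptions.

```r
# The 13 data pairs as posted above.
mehnert <- data.frame(
  V4 = c(-0.02, 0.04, -0.02, 0.08, -0.01, 0.08, 0.03,
          0.08, 0.03, -0.12, 0.06, -0.21, 0.00),
  V5 = c(-0.07, 0.02, 0.04, -0.02, 0.04, 0.07, 0.04,
          0.01, 0.03, -0.03, 0.04, -0.08, -0.01)
)

# Test of zero correlation (two-sided, Pearson by default).
ct <- cor.test(mehnert$V4, mehnert$V5)

ct$estimate   # sample correlation
ct$statistic  # t statistic on n - 2 degrees of freedom
ct$p.value    # P-value for H0: rho = 0
```

For the one-sided question (rho > 0), alternative = "greater" can be
passed to cor.test instead.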

> >No, and in e.g. the MASS examples they give similar results.
>
> Indeed. I saw that.
>
> >BCa needs large, often very large (tens of thousands), bootstrap sets.
> >Are you sure your colleague used a large enough set?  A quick bit of
>
> We cannot make more observations without difficulty. We have these
> data from 13 probands. For the bootstrap simulation we used 1000 both
> in the original study and in my replication trial.
>
> >replication suggests that the BCa limits are very variable for your
> >problem. I find BCa pretty unreliable, and for correlations using Fisher's
> >tanh transformation is normally enough to make all sensible confidence
> >interval procedures agree for all practical purposes.

I see you did not pick up on that: it was a very important comment.
The appropriate scale is known: don't try to estimate it from 13 pairs.
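
As a sketch of what working on that scale looks like, the interval can
be built on the atanh (Fisher z) scale and mapped back with tanh. The
standard error 1/sqrt(n - 3) is the usual large-sample approximation
under bivariate normality and is an assumption here, as is the data
frame name mehnert.

```r
# The 13 data pairs as posted above.
mehnert <- data.frame(
  V4 = c(-0.02, 0.04, -0.02, 0.08, -0.01, 0.08, 0.03,
          0.08, 0.03, -0.12, 0.06, -0.21, 0.00),
  V5 = c(-0.07, 0.02, 0.04, -0.02, 0.04, 0.07, 0.04,
          0.01, 0.03, -0.03, 0.04, -0.08, -0.01)
)

r  <- cor(mehnert$V4, mehnert$V5)  # sample correlation
n  <- nrow(mehnert)                # number of pairs
z  <- atanh(r)                     # Fisher's transformation
se <- 1 / sqrt(n - 3)              # approximate s.e. on the z scale

# 95% interval on the z scale, transformed back to the correlation scale.
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci
```

The conf.int component returned by cor.test for the product-moment
correlation should agree with this interval, since it is built the
same way.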

> >Finally, what useful conclusions can be drawn from a confidence interval
> >for the correlation of 13 data pairs?
>
> Of course, this is not a bad question. But aren't bootstrap methods
> designed for application to problematic datasets?

No. First, they are frequentist, so they are designed for applications in
which little is known about the *distributions*, but not for any one
dataset.  Second, I don't think you can draw any useful conclusion from a
CI of (0.16, 0.93) except as a roundabout way to test rho > 0, and for
that you can gain more information from a test statistic and P-value.

Lots more can be said, but for example if you don't believe in normality
in your problem, you should be using an appropriate statistic (not the
product-moment correlation) and not just trying to fix the distribution of
the statistic.  It is robust statistics that is designed for non-standard
problems, not bootstrapping, but the first has been vastly undersold
relative to the second.
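
As one illustration only: the post does not name a particular statistic,
so the rank correlations below (Spearman and Kendall) are an assumed
choice of alternatives to the product-moment correlation, and mehnert is
again the assumed name of the data frame. Because the data contain ties,
exact = FALSE is used to request the asymptotic P-value.

```r
# The 13 data pairs as posted above.
mehnert <- data.frame(
  V4 = c(-0.02, 0.04, -0.02, 0.08, -0.01, 0.08, 0.03,
          0.08, 0.03, -0.12, 0.06, -0.21, 0.00),
  V5 = c(-0.07, 0.02, 0.04, -0.02, 0.04, 0.07, 0.04,
          0.01, 0.03, -0.03, 0.04, -0.08, -0.01)
)

# Rank-based measures of association, less sensitive to outlying pairs.
rho_s <- cor(mehnert$V4, mehnert$V5, method = "spearman")
tau_k <- cor(mehnert$V4, mehnert$V5, method = "kendall")

# Test based on Kendall's tau; asymptotic P-value because of the ties.
kt <- cor.test(mehnert$V4, mehnert$V5, method = "kendall", exact = FALSE)

rho_s
tau_k
kt$p.value
```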

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
