[R] Interpreting Results of Bootstrapping

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Sun Jul 11 11:40:34 CEST 2004


Hi!

Simply plot(x1,x2): you will see that there is one point
(number 23) at (x1,x2) = (25.24,6.744) which is a very
long way from all the other points (which, among themselves,
form a somewhat diffuse cluster with some suggestion of
further structure).

When you bootstrap, the correlation you obtain in any sample
will depend on whether or not this outlying point is included
in the sample. If it is included, this single point will generate
a relatively high value of the correlation coefficient simply
because it is such a long way from all the others (i.e. it is
highly influential).

If it is not included, then the diffuse character of the other
points will generate a very low value of the correlation
coefficient.

  > cor(x1,x2)
  [1] 0.7471931
  > cor(x1[-23],x2[-23])
  [1] 0.03914653

Therefore your bootstrap distribution will have two peaks: one
peak, around 0.75, corresponding to the bootstrap samples which
include this outlying point, and the other, around 0, corresponding
to the bootstrap samples which do not include it.
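
The relative sizes of the two peaks follow from how often point 23
is drawn at all. A rough calculation, assuming the ordinary bootstrap
(51 cases resampled with replacement, each equally likely on every
draw), which is what boot() does by default:

```r
n <- 51                       # number of (x1, x2) pairs in the data
p_excluded <- (1 - 1/n)^n     # chance that point 23 is never drawn in n draws
p_included <- 1 - p_excluded  # chance that it appears at least once
round(c(excluded = p_excluded, included = p_included), 3)
```

So roughly 36% of the bootstrap samples should fall in the peak
near 0, and roughly 64% in the peak near 0.75.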

This is the explanation and, at the same time, the interpretation.
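
If you want to check this directly, one possible sketch is below. It
does not use boot() but draws the resampled indices by hand in base R,
so that each replicate can be labelled by whether it contains point 23.
The names idx, r.boot and has23, and the seed, are my own choices for
illustration; they are not part of the original code.

```r
# Data as posted in the original message (51 pairs).
x1 <- c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
        -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
         0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
         0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
        -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
         0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
        -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
         0.07171,-0.07004)
x2 <- c( 0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
        -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
         0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
        -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
         0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
         0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
         0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
         0.4186,-0.03347)

set.seed(1)          # arbitrary seed, only so the run is reproducible
n <- length(x1)
R <- 2000            # same number of replicates as in the original code

# Each column of idx holds the n resampled case numbers of one replicate.
idx <- replicate(R, sample(n, n, replace = TRUE))

# Correlation in each replicate, and whether point 23 was drawn in it.
r.boot <- apply(idx, 2, function(i) cor(x1[i], x2[i]))
has23  <- apply(idx, 2, function(i) any(i == 23))

mean(has23)                    # proportion of replicates containing point 23
tapply(r.boot, has23, median)  # typical correlation without / with point 23

# One histogram per group, on a common horizontal scale.
op <- par(mfrow = c(2, 1))
hist(r.boot[!has23], breaks = 50, xlim = c(-1, 1), main = "Point 23 excluded")
hist(r.boot[has23],  breaks = 50, xlim = c(-1, 1), main = "Point 23 included")
par(op)
```

If the explanation above is right, each of the two histograms should
show a single peak, and together they should account for the two peaks
in hist(b$t).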

Best wishes,
Ted.

On 11-Jul-04 Y C Tao wrote:
> I tried to bootstrap the correlation between two
> variables x1 and x2. The resulting distribution has
> two distinct peaks. How should I interpret it?
> 
> The original code is attached.
> 
> Y. C. Tao
> 
> ----------------
> 
> library(boot);
>  
> my.correl<-function(d, i) cor(d[i,1], d[i,2])
>  
> x1<-c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
>       -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
>        0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
>        0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
>       -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
>        0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
>       -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
>        0.07171,-0.07004);
>  
> x2<-c( 0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
>       -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
>        0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
>       -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,      
>        0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
>        0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
>        0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
>        0.4186,-0.03347);
>  
> b<-boot(cbind(x1, x2), my.correl, 2000)
> hist(b$t, breaks=50)

[The above rearranged to have 7 values in each complete line]



--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 11-Jul-04                                       Time: 10:40:34
------------------------------ XFMail ------------------------------



