[R-sig-eco] meaning of extreme outliers in bootstrapped statistic

Tom Elliott tnelliott at gmail.com
Sun Jun 13 02:56:06 CEST 2010


Hello all-

I'm using the boot function (package boot) to generate bootstrapped
regression coefficients, with rlm (MASS) as the regression model.
I've run the same bootstrap (with different seeds) several times. For
some runs, the Q-Q plots of the bootstrapped coefficients look
reasonable except for 1 to 3 very extreme outliers, with values more
than twice the mean in magnitude (for the second coefficient, up to
1.209 against a mean of -0.371). If I remove the outliers and replot,
the plots look fine, and for other runs the outliers don't show up at
all. The coefficients, bias, and standard error are almost identical
for all runs.
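Judging outliers from Q-Q plots by eye can be made more concrete with a small helper that flags replicates far from the median of each bootstrapped coefficient. This is a minimal sketch and is not part of the boot package. `flag.extreme()` is a hypothetical name, and the cut-off of 5 scaled MADs from the median is an arbitrary assumption.

```r
# Hypothetical helper (not part of boot): flag replicates lying more than
# k scaled MADs from the median of each column of a bootstrap matrix.
# 'k = 5' is an arbitrary, assumed cut-off.
flag.extreme <- function(tmat, k = 5) {
  tmat <- as.matrix(tmat)  # accept a single vector as a one-column matrix
  apply(tmat, 2, function(x) {
    abs(x - median(x, na.rm = TRUE)) > k * mad(x, na.rm = TRUE)
  })
}

# Simulated check (not real data): 200 well-behaved values plus one at 10;
# only the appended value should be flagged
set.seed(1)
x <- c(rnorm(200), 10)
which(flag.extreme(x)[, 1])
```

Applied to the second run below, this would be `which(flag.extreme(male.mod4$t)[, 2])`. It can be used alongside the histogram and normal Q-Q plot that boot draws with `plot(male.mod4, index = 2)`.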

I don't know how to interpret this behavior, and whether it means I
should regard the bootstrapped output as completely suspect or not.
Any suggestions would be appreciated.

I used the weights and strata arguments of boot() to account for plots
with different sample sizes and for some spatial clustering.
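The two boot() arguments do different things. As I read ?boot, `strata` makes boot() resample rows within each stratum, so every resample keeps the original number of rows per cluster. `weights` are importance weights that change the probabilities with which rows are drawn. That is a different role from the regression weights passed to rlm(). The sketch below checks the `strata` part on a small simulated data frame (`toy` is hypothetical, not the data above). It uses boot.array(), which rebuilds the resampling frequencies from the seed stored in the boot object.

```r
library(boot)

# Hypothetical toy data (simulated, not the data above): 12 rows in 4 clusters
set.seed(42)
toy <- data.frame(y = rnorm(12), Cluster = rep(1:4, times = c(2, 3, 3, 4)))

# Stratified ordinary bootstrap of the mean: rows are resampled within clusters
b <- boot(toy, function(d, i) mean(d$y[i]), R = 200, strata = toy$Cluster)

# boot.array() gives an R x n matrix: [r, j] = times row j appears in resample r
freq <- boot.array(b)

# Rows drawn from each cluster in every resample (expected: always 2, 3, 3, 4)
per.cluster <- t(apply(freq, 1, function(f) tapply(f, toy$Cluster, sum)))
head(per.cluster)
```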

# The first few rows of the data (45 observations in total):

> sex.data2[1:5,]
   p.male    ln.den av.est nu.tot Cluster
0.4500000 -2.535779 20.269     25       1
0.3846154 -3.423443 15.332     24       2
0.8461538 -3.003764 40.628     20       3
0.3571429 -2.440698 21.082     22       3
0.7222222 -4.772406 61.931     18       4

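# And the code I used:

library(boot)  # boot()
library(MASS)  # rlm()

# Statistic for boot(): refit the MM-type robust regression to the
# resampled rows and return the four coefficients
boot.male<-function(data,indices,maxit=20){
	data<-data[indices,]
	mod<-rlm(p.male~ln.den+av.est+ln.den:av.est,method="MM",weights=nu.tot,data=data,maxit=maxit)
	coefficients(mod)
	}

# nu.tot and Cluster must be visible as vectors here (boot() does not
# look inside sex.data2 for them); maxit=100 is passed on to boot.male()
male.mod3<-boot(sex.data2,boot.male,9999,
weights=nu.tot,strata=Cluster, maxit=100)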
> male.mod3

Call:
boot(data = sex.data2, statistic = boot.male, R = 9999, strata = Cluster,
    weights = nu.tot, maxit = 100)

Bootstrap Statistics :
        original        bias    std. error     mean(t*)
t1* -0.823268399 -1.002848e-02 0.249722760 -0.794159753
t2* -0.376047283 -4.069051e-03 0.072401520 -0.370784139
t3*  0.032714183  1.710077e-05 0.008641502  0.031744414
t4*  0.008630942  4.208041e-05 0.002053623  0.008473436
>
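rlm() fits the MM estimate by iteratively reweighted least squares, stops after `maxit` iterations, and warns when it fails to converge. It may therefore be worth checking whether the extreme replicates coincide with non-converged fits. This is a minimal sketch that assumes the same model as boot.male(). The hypothetical `boot.male.conv()` appends rlm()'s documented `converged` component as an extra statistic. The data frame `sim` is simulated with the same column names as sex.data2 and is not the real data.

```r
library(boot)
library(MASS)

# Hypothetical variant of boot.male(): same MM fit, plus rlm()'s 'converged'
# flag coded as 1 (converged) or 0 (stopped at maxit) in the last column
boot.male.conv <- function(data, indices, maxit = 20) {
  data <- data[indices, ]
  mod <- rlm(p.male ~ ln.den + av.est + ln.den:av.est, method = "MM",
             weights = nu.tot, data = data, maxit = maxit)
  c(coefficients(mod), converged = as.numeric(mod$converged))
}

# Simulated stand-in with sex.data2's column names (NOT the real data)
set.seed(7)
n <- 45
sim <- data.frame(ln.den = rnorm(n, -3, 0.7),
                  av.est = runif(n, 15, 60),
                  nu.tot = sample(18:25, n, replace = TRUE),
                  Cluster = rep(1:9, each = 5))
sim$p.male <- 0.5 + 0.05 * sim$ln.den + rnorm(n, sd = 0.1)

b <- boot(sim, boot.male.conv, R = 99, strata = sim$Cluster, maxit = 100)

# Replicates that stopped at maxit, and their second coefficient
not.conv <- b$t[, 5] == 0
sum(not.conv)
b$t[not.conv, 2]
```

With the real data, a rerun of the boot() call above using boot.male.conv in place of boot.male would let column 5 be compared with the extreme values in column 2.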
# this run produced no outliers, but the following run did:

> male.mod4<-boot(sex.data2,boot.male,9999, weights=nu.tot,strata=Cluster, maxit=100)

> summary(male.mod4$t[,2])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.6597 -0.4172 -0.3694 -0.3708 -0.3237  1.2090

> sort(male.mod4$t[,2],decreasing=TRUE)[1:5]
[1]  1.20924783  0.96981415  0.83783851 -0.08805377 -0.13783204
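boot() stores the random seed, so boot.array() can rebuild which rows went into each resample. The extreme replicates can then be traced back to the observations they over-sample. This is a minimal sketch on simulated data with one planted outlier in row 45. It uses lm() instead of rlm() only to keep the example short and fast. Taking the 5 largest replicates mirrors the sort(...)[1:5] call above. `dat` and `slope` are hypothetical names.

```r
library(boot)

# Simulated stand-in (NOT the data above): 45 rows in 9 clusters of 5,
# with one planted high-leverage outlier in row 45
set.seed(3)
dat <- data.frame(x = rnorm(45), Cluster = rep(1:9, each = 5))
dat$y <- 2 * dat$x + rnorm(45, sd = 0.3)
dat$x[45] <- 2
dat$y[45] <- 25

slope <- function(d, i) coef(lm(y ~ x, data = d[i, ]))[2]
b <- boot(dat, slope, R = 500, strata = dat$Cluster)

# R x n frequency matrix: [r, j] = times row j appears in resample r
freq <- boot.array(b)

# The 5 largest replicates, as in sort(..., decreasing = TRUE)[1:5]
top <- order(b$t[, 1], decreasing = TRUE)[1:5]

# Mean count of each row in those replicates relative to all replicates;
# values well above 1 mark rows that the extreme replicates over-sample
ratio <- colMeans(freq[top, , drop = FALSE]) / colMeans(freq)
head(order(ratio, decreasing = TRUE))
```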

Thanks,
Tom


