[R-sig-eco] meaning of extreme outliers in bootstrapped statistic
Tom Elliott
tnelliott at gmail.com
Sun Jun 13 02:56:06 CEST 2010
Hello all-
I'm using the boot function (package boot) to generate bootstrapped
regression coefficients, using rlm (MASS) as the regression model.
I've run the same bootstrap several times with different seeds. For
some runs the Q-Q plots of the bootstrapped coefficients look
reasonable except for 1 to 3 very extreme outliers, with values more
than twice the magnitude of the mean. If I remove the outliers and
replot, the Q-Q plot looks fine, and for some runs the outliers don't
show up at all. The coefficients, bias, and standard errors are almost
identical across all runs.
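A minimal sketch of the Q-Q comparison described above, with and without the extreme replicates. Everything in it is an assumption made for illustration: the data frame `d` is simulated stand-in data (not sex.data2), `coef.fun` is a hypothetical statistic function, rlm is run with its default M-estimator rather than method="MM" to keep the example small, and the cutoff of 5 MADs from the median is arbitrary.

```r
library(boot)
library(MASS)

set.seed(1)
# Simulated stand-in data (hypothetical; not the real sex.data2)
n <- 45
d <- data.frame(x = rnorm(n))
d$y <- 0.5 - 0.4 * d$x + rnorm(n, sd = 0.2)

# Statistic for boot(): robust regression coefficients for one resample
coef.fun <- function(data, indices) {
  coef(rlm(y ~ x, data = data[indices, ], maxit = 100))
}

b <- boot(d, coef.fun, R = 999)
slope <- b$t[, 2]   # bootstrap replicates of the slope

# Flag replicates far from the median on a robust (MAD) scale
z <- abs(slope - median(slope)) / mad(slope)
extreme <- z > 5    # arbitrary cutoff, chosen only for illustration

# Q-Q plots side by side: all replicates vs. flagged replicates removed
op <- par(mfrow = c(1, 2))
qqnorm(slope, main = "All replicates")
qqline(slope)
qqnorm(slope[!extreme], main = "Flagged replicates removed")
qqline(slope[!extreme])
par(op)

# Classical vs. robust spread, and how many replicates were flagged
c(sd = sd(slope), mad = mad(slope), n.flagged = sum(extreme))
```

If sd and mad stay close to each other, a handful of flagged replicates is having little effect on the standard error, which would be in line with the near-identical summaries across runs.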
I don't know how to interpret this behavior, and whether it means I
should regard the bootstrapped output as completely suspect or not.
Any suggestions would be appreciated.
I used the weights and strata arguments to boot to account for plots
with different sample sizes and some spatial clustering.
# The first few rows of the data (number of observations = 45)
> sex.data2[1:5,]
     p.male    ln.den av.est nu.tot Cluster
  0.4500000 -2.535779 20.269     25       1
  0.3846154 -3.423443 15.332     24       2
  0.8461538 -3.003764 40.628     20       3
  0.3571429 -2.440698 21.082     22       3
  0.7222222 -4.772406 61.931     18       4
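# And the code I used:
library(MASS)  # rlm
library(boot)  # boot
# Statistic for boot(): MM-estimate coefficients for one resample
boot.male <- function(data, indices, maxit = 20) {
  data <- data[indices, ]
  mod <- rlm(p.male ~ ln.den + av.est + ln.den:av.est, method = "MM",
             weights = nu.tot, data = data, maxit = maxit)
  coefficients(mod)
}
male.mod3 <- boot(sex.data2, boot.male, 9999,
                  weights = nu.tot, strata = Cluster, maxit = 100)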
> male.mod3
Call:
boot(data = sex.data2, statistic = boot.male, R = 9999, strata = Cluster,
weights = nu.tot, maxit = 100)
Bootstrap Statistics :
        original          bias  std. error     mean(t*)
t1* -0.823268399 -1.002848e-02 0.249722760 -0.794159753
t2* -0.376047283 -4.069051e-03 0.072401520 -0.370784139
t3*  0.032714183  1.710077e-05 0.008641502  0.031744414
t4*  0.008630942  4.208041e-05 0.002053623  0.008473436
>
# this run produced no outliers, but the following run did:
> male.mod4<-boot(sex.data2,boot.male,9999, weights=nu.tot,strata=Cluster, maxit=100)
> summary(male.mod4$t[,2])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.6597 -0.4172 -0.3694 -0.3708 -0.3237  1.2090
> sort(male.mod4$t[,2],decreasing=T)[1:5]
[1] 1.20924783 0.96981415 0.83783851 -0.08805377 -0.13783204
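One way to look into replicates like these is to trace them back to the rows that were drawn, using boot.array() from package boot. The sketch below is only illustrative and makes several assumptions: `d` is simulated stand-in data with one deliberately planted high-leverage row, `coef.fun` is a hypothetical statistic function, and rlm uses its default M-estimator rather than method="MM". It does not show what is happening in the real data.

```r
library(boot)
library(MASS)

set.seed(2)
# Simulated stand-in data (hypothetical), with one high-leverage row planted
n <- 45
d <- data.frame(x = c(rnorm(n - 1), 6))
d$y <- 0.5 - 0.4 * d$x + rnorm(n, sd = 0.2)
d$y[n] <- 3   # row 45 sits far from the trend of the other rows

# Statistic for boot(): robust regression coefficients for one resample
coef.fun <- function(data, indices) {
  coef(rlm(y ~ x, data = data[indices, ], maxit = 100))
}

b <- boot(d, coef.fun, R = 999)
slope <- b$t[, 2]

# Frequency array: freq[r, i] = number of times row i appears in replicate r
freq <- boot.array(b)

# Replicates with the five largest slopes
top <- order(slope, decreasing = TRUE)[1:5]

# Rows drawn more often in those replicates than in replicates overall
over <- colMeans(freq[top, , drop = FALSE]) - colMeans(freq)
rows <- order(over, decreasing = TRUE)[1:5]
data.frame(row = rows, excess = over[rows])
```

If the same few rows keep turning up with a large excess, those observations would be worth a closer look; if no pattern appears, the extreme replicates may instead come from the fitting step itself.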
Thanks,
Tom