[R-sig-eco] meaning of extreme outliers in bootstrapped statistic

Nicholas Lewin-Koh nikko at hailmail.net
Mon Jun 14 06:56:14 CEST 2010


Hi Tom,
You have only 45 observations, and remember that the bootstrap samples
with replacement.
If you have a few extreme data points, it is entirely likely that some
of your parameter estimates will take extreme values in the rare
bootstrap samples where an extreme point appears more than once.
This is especially so because you have an interaction in your model, so
the interaction coefficients are effectively estimated from a smaller
subset of the data.
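
To put a rough number on "rare", here is a minimal sketch in base R. It
assumes plain resampling of all 45 rows with equal probability, which is
not what the stratified, weighted call in the original post does (there
the draws happen within each Cluster), so treat the figures as an
order-of-magnitude guide only. The variable names are made up for
illustration.

```r
# Assumption: unstratified, unweighted resampling of n = 45 rows.
n <- 45
R <- 9999

# The number of times one particular row is drawn in a resample of size n
# follows a Binomial(n, 1/n) distribution.
p_twice_or_more <- 1 - pbinom(1, size = n, prob = 1 / n)
p_three_or_more <- 1 - pbinom(2, size = n, prob = 1 / n)

# Check the first probability by simulation: count copies of row 1
# in each of R resamples.
set.seed(1)
counts <- replicate(R, sum(sample.int(n, n, replace = TRUE) == 1L))
sim_twice_or_more <- mean(counts >= 2)

print(c(exact = p_twice_or_more, simulated = sim_twice_or_more))

# Expected number of the R replicates that contain 3 or more copies
# of that one row.
print(p_three_or_more * R)
```

Under that assumption any single row is duplicated in roughly a quarter
of the resamples, so a handful of replicates dominated by one or two
extreme points among 9999 is not surprising.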
I would worry if the bootstrap indicated a skewed distribution for the
parameter; that skew may be moderated by transforming the predictor.
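
One way to check both points is sketched below: measure the skewness of
the replicates, and use boot.array() to see whether the outlying
replicates are the ones that contain extra copies of an extreme row. This
is only an illustration on simulated data with one deliberately
influential row; it uses lm() rather than rlm() and no weights or strata,
and the data frame d, the statistic slope and the helper skew are
hypothetical names, not anything from the original analysis.

```r
library(boot)

set.seed(42)
# Hypothetical data: 45 rows, the last one a high-leverage point.
n <- 45
d <- data.frame(x = c(rnorm(n - 1), 8))
d$y <- 0.5 * d$x + rnorm(n)
d$y[n] <- -6  # pulls the fitted slope down whenever row 45 is resampled

# Statistic for boot(): slope of an ordinary least-squares fit
# (a stand-in for the rlm fit in the original post).
slope <- function(data, indices) {
  coef(lm(y ~ x, data = data[indices, ]))[2]
}
b <- boot(d, slope, R = 1999)

# Moment-based skewness of the bootstrap replicates.
skew <- function(z) mean((z - mean(z))^3) / sd(z)^3
print(skew(b$t[, 1]))

# Frequency array: freq[r, i] is how often row i appears in resample r.
freq <- boot.array(b, indices = FALSE)

# Relate the replicate value to the number of copies of the extreme row.
slope_vs_copies <- cor(freq[, n], b$t[, 1])
print(slope_vs_copies)
print(tapply(b$t[, 1], freq[, n], median))

# A marked gap between normal and percentile intervals is another
# sign of a skewed bootstrap distribution.
print(boot.ci(b, type = c("norm", "perc")))
```

If the extreme replicates line up with resamples holding two or three
copies of the same row, the outliers are a property of the data rather
than a fault in the bootstrap itself.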

Nicholas
> Date: Sat, 12 Jun 2010 17:56:06 -0700
> From: Tom Elliott <tnelliott at gmail.com>
> To: r-sig-ecology at r-project.org
> Subject: [R-sig-eco] meaning of extreme outliers in bootstrapped
> 	statistic
> Message-ID:
> 	<AANLkTinO6cLbMRXYApYa50bf-Z5rdd8tpMl3yot2F_a9 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hello all-
> 
> I'm using the boot function (package boot) to generate bootstrapped
> regression coefficients, using rlm (MASS) as the regression model.
> I've run the same bootstrap (different seeds) several times, and for
> some runs the Q-Q plots of the bootstrapped coefficients look
> reasonable except for 1 to 3 very extreme outliers, with values more
> than twice as large as the mean. If I remove the outliers and replot,
> it looks fine. And for some runs, the outliers don't show up at all.
> The coefficients, bias, and standard error are almost identical for
> all runs.
> 
> I don't know how to interpret this behavior, and whether it means I
> should regard the bootstrapped output as completely suspect or not.
> Any suggestions would be appreciated.
> 
> I used weighted and strata to account for plots with different sample
> sizes and some spatial clustering.
> 
> #the first few rows of the data, number of observations =45
> 
> > sex.data2[1:5,]
>    p.male    ln.den av.est nu.tot Cluster
> 0.4500000 -2.535779 20.269     25       1
> 0.3846154 -3.423443 15.332     24       2
> 0.8461538 -3.003764 40.628     20       3
> 0.3571429 -2.440698 21.082     22       3
> 0.7222222 -4.772406 61.931     18       4
> 
> #And the code I used:
> 
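> library(boot)   # boot()
> library(MASS)   # rlm()
> 
> # statistic for boot(): refit the robust regression on the resampled rows
> boot.male<-function(data,indices,maxit=20){
> 	data<-data[indices,]
> 	mod<-rlm(p.male~ln.den+av.est+ln.den:av.est,method="MM",weights=nu.tot,data=data,maxit=maxit)
> 	coefficients(mod)
> 	}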
> 
> male.mod3<-boot(sex.data2,boot.male,9999,
> weights=nu.tot,strata=Cluster, maxit=100)
> > male.mod3
> 
> Call:
> boot(data = sex.data2, statistic = boot.male, R = 9999, strata = Cluster,
>     weights = nu.tot, maxit = 100)
> 
> Bootstrap Statistics :
>         original        bias    std. error     mean(t*)
> t1* -0.823268399 -1.002848e-02 0.249722760 -0.794159753
> t2* -0.376047283 -4.069051e-03 0.072401520 -0.370784139
> t3*  0.032714183  1.710077e-05 0.008641502  0.031744414
> t4*  0.008630942  4.208041e-05 0.002053623  0.008473436
> >
> # this run produced no outliers, but the following run did:
> 
> > male.mod4<-boot(sex.data2,boot.male,9999, weights=nu.tot,strata=Cluster, maxit=100)
> 
> > summary(male.mod4$t[,2])
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> -0.6597 -0.4172 -0.3694 -0.3708 -0.3237  1.2090
> 
> > sort(male.mod4$t[,2],decreasing=TRUE)[1:5]
> [1]  1.20924783  0.96981415  0.83783851 -0.08805377 -0.13783204
> 
> Thanks,
> Tom
> End of R-sig-ecology Digest, Vol 27, Issue 9