[R] jackknife-after-bootstrap

Tim Hesterberg timhesterberg at gmail.com
Mon Nov 15 16:08:03 CET 2010


>Can someone help me about detection of outliers using jackknife after
>bootstrap algorithm?

A simple procedure is to calculate the mean of the bootstrap
statistics for all bootstrap samples that omit the first of the
original observations.  Repeat for the second, third, ... original
observation.  You now have $n$ means, and can look at these for
outliers.
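
As a rough illustration, the procedure might be sketched in R as follows.
This assumes the statistic is the sample mean and uses made-up data with one
planted outlier; the names x, idx, theta_star and omit_means are chosen here
for illustration and do not come from any package.

```r
set.seed(1)
n <- 20
x <- c(rnorm(n - 1), 8)          # made-up data; the last value is a planted outlier
R <- 2000                        # number of bootstrap samples

# idx[i, ] holds the indices of the original observations drawn in bootstrap sample i
idx <- matrix(sample.int(n, R * n, replace = TRUE), nrow = R, ncol = n)

# Bootstrap statistics (here the mean of each resample)
theta_star <- apply(idx, 1, function(ii) mean(x[ii]))

# For each observation j, average the statistics over the samples that omit j
omit_means <- sapply(seq_len(n), function(j) {
  omitted <- rowSums(idx == j) == 0
  mean(theta_star[omitted])
})

# Look for values that stand apart from the rest
round(omit_means, 3)
which.max(abs(omit_means - median(omit_means)))
```

Only about a third of the bootstrap samples omit any given observation, so
each of the n means is based on a fraction of the R statistics.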

A similar approach is to calculate means of bootstrap statistics
for samples that <include> (rather than omit) each of the original
observations.
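
A sketch of the "include" variant, under the same assumptions (the statistic
is the sample mean, the data are made up, and the variable names are
illustrative only):

```r
set.seed(2)
n <- 20
x <- c(rnorm(n - 1), 8)          # made-up data; the last value is a planted outlier
R <- 2000                        # number of bootstrap samples

# idx[i, ] holds the indices drawn in bootstrap sample i
idx <- matrix(sample.int(n, R * n, replace = TRUE), nrow = R)

# Bootstrap statistics: row i is the mean of resample i
theta_star <- rowMeans(matrix(x[idx], nrow = R))

# For each observation j, average the statistics over the samples that include j
include_means <- sapply(seq_len(n), function(j) {
  included <- rowSums(idx == j) > 0
  mean(theta_star[included])
})

round(include_means, 3)
which.max(abs(include_means - median(include_means)))
```

Roughly 63% of bootstrap samples include any given observation, so these
means differ less from one another than the "omit" means do.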

Both of those approaches can suffer from considerable random variability.
Provided the number of bootstrap samples is large, a better approach
is to use linear regression, where
	y = the vector of bootstrap statistics, length R
	X = R by n matrix, with X[i, j] = the number of times
		original observation j is included in bootstrap sample i
and without an intercept (each row of X sums to n, so an intercept would be
collinear with the columns of X).  The $n$ regression coefficients give
estimates of the influence of the original observations, and you can look for
outliers in these influence estimates.
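
The regression might be set up in R along these lines.  This is a sketch
only, again assuming the statistic is the sample mean and using made-up data;
X and y follow the definitions above, while idx and beta are names
introduced here for illustration.

```r
set.seed(3)
n <- 20
x <- c(rnorm(n - 1), 8)          # made-up data; the last value is a planted outlier
R <- 2000                        # number of bootstrap samples, large relative to n

# idx[i, ] holds the indices drawn in bootstrap sample i
idx <- matrix(sample.int(n, R * n, replace = TRUE), nrow = R)

# X[i, j] = number of times original observation j appears in bootstrap sample i
X <- t(apply(idx, 1, tabulate, nbins = n))

# y = vector of bootstrap statistics, length R
y <- rowMeans(matrix(x[idx], nrow = R))

# Regress y on X with no intercept; one coefficient per original observation
beta <- unname(coef(lm(y ~ X - 1)))

# Look for outliers among the coefficients
round(beta, 3)
which.max(abs(beta - median(beta)))
```

For the sample mean the statistic is exactly linear in the counts, so the
coefficients equal x / n up to rounding error, which makes a convenient
sanity check.  For a nonlinear statistic the fit is only approximate.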

For comparison, the first simple procedure above corresponds to
taking averages of y for rows with X[, j] = 0, and the "similar approach"
to averaging y for rows with X[, j] > 0.

For further discussion see:

Hesterberg, Tim C. (1995), "Tail-Specific Linear Approximations for Efficient Bootstrap Simulations", Journal of Computational and Graphical Statistics, 4(2), 113-133.

Hesterberg, Tim C. and Stephen J. Ellis (1999), "Linear Approximations for Functional Statistics in Large-Sample Applications", Technical Report No. 86, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.
http://home.comcast.net/~timhesterberg/articles/tech86-linear.pdf


