[R] small sample techniques

Tim Hesterberg timh at insightful.com
Wed Aug 8 19:40:52 CEST 2007


About using t tests and confidence intervals for "large" samples -
"large" may need to be very large.
The old pre-computer-age rule of n >= 30 is inadequate.

For example, for an exponential distribution, the actual size
of a nominal 2.5% one-sided t-test is not accurate to within 10%
(i.e. between 2.25% & 2.75%) until n is around 5000.
The error (actual - nominal size) decreases very slowly, at the rate 1/sqrt(n).
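
A quick way to see this is by simulation. The following is only a minimal
sketch in Python (standard library only), not the computation behind the
numbers above. The function names, the choice n = 30, the number of
replications, and the seed are all assumptions made for illustration. It
draws exponential samples with true mean 1, tests the (true) null
hypothesis in each tail at nominal 2.5%, and reports how often each tail
rejects.

```python
import math
import random


def t_statistic(sample, mu0):
    """One-sample t statistic for H0: mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


def tail_rejection_rates(n, t_crit, reps, seed=1):
    """Estimate the actual size of each one-sided t-test by Monte Carlo.

    Samples come from an exponential distribution with mean 1, so the
    null hypothesis mu0 = 1 is true and every rejection is a false one.
    """
    rng = random.Random(seed)
    low = high = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        t = t_statistic(sample, 1.0)
        if t < -t_crit:
            low += 1
        elif t > t_crit:
            high += 1
    return low / reps, high / reps


# Assumption: 0.975 quantile of the t distribution with 29 degrees of
# freedom, hard-coded (about 2.045) to avoid needing a stats library.
T_CRIT_29 = 2.045

low_rate, high_rate = tail_rejection_rates(n=30, t_crit=T_CRIT_29, reps=20000)
print(f"nominal size per tail: 0.025")
print(f"lower-tail rejection rate: {low_rate:.4f}")
print(f"upper-tail rejection rate: {high_rate:.4f}")
```

For a right-skewed population the t statistic is skewed to the left, so
the lower tail tends to reject too often and the upper tail too rarely.
Rerunning with larger n (and the matching critical value) shows how slowly
both rates approach 2.5%.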

In practice, real distributions with long tails may be even more skewed
than the exponential distribution, even though they appear less skewed.
In that case the sample size would need to be even larger for t
procedures to be reasonably accurate.
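
To get a rough feel for "even larger", here is a back-of-the-envelope
sketch. It assumes that the leading error term of the t-test is
proportional to skewness/sqrt(n), so that the sample size needed for a
given accuracy scales with the square of the skewness. The lognormal
distribution is my own choice of a long-tailed example, and the function
names are hypothetical.

```python
import math

# Population skewness of the exponential distribution (any rate).
EXPONENTIAL_SKEWNESS = 2.0


def lognormal_skewness(sigma):
    """Population skewness of a lognormal with log-scale std dev sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)


def sample_size_multiplier(skewness, reference=EXPONENTIAL_SKEWNESS):
    """Factor by which n must grow to match the reference accuracy.

    Assumption: error is proportional to skewness / sqrt(n), so holding
    the error fixed requires n proportional to skewness squared.
    """
    return (skewness / reference) ** 2


skew = lognormal_skewness(1.0)
mult = sample_size_multiplier(skew)
print(f"lognormal(sigma=1) skewness: {skew:.3f}")
print(f"sample size multiplier vs. exponential: {mult:.2f}")
```

This is a first-order approximation only; with very long tails the
higher-order terms matter too, so treat the multiplier as an indication
of scale rather than a precise requirement.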

An alternative is to use bootstrapping.  Bootstrap procedures whose
error decreases at the faster rate 1/n include bootstrap t, BCa, and
bootstrap tilting.
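
As an illustration of the first of these, below is a minimal sketch of a
bootstrap t confidence interval for a mean, in Python with the standard
library only. It is one simple way to do it under stated assumptions, not
a reference implementation: the function names, the 2000 resamples, the
simple order-statistic quantile rule, and the simulated exponential data
are all choices made for the example.

```python
import math
import random


def mean_and_se(xs):
    """Sample mean and its estimated standard error."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, math.sqrt(var / n)


def bootstrap_t_interval(xs, reps=2000, alpha=0.05, seed=1):
    """Bootstrap t confidence interval for the population mean.

    Each resample yields t* = (mean* - mean) / se*. The quantiles of t*
    replace the Student t quantiles, which lets the interval be
    asymmetric when the data are skewed.
    """
    rng = random.Random(seed)
    n = len(xs)
    m, se = mean_and_se(xs)
    t_stars = []
    for _ in range(reps):
        resample = rng.choices(xs, k=n)  # draw n values with replacement
        m_star, se_star = mean_and_se(resample)
        if se_star > 0:  # skip degenerate resamples (all values equal)
            t_stars.append((m_star - m) / se_star)
    t_stars.sort()

    def quantile(p):
        # Simple order-statistic quantile; an assumption, other rules exist.
        return t_stars[round(p * (len(t_stars) - 1))]

    q_lo = quantile(alpha / 2)
    q_hi = quantile(1 - alpha / 2)
    # Note the reversal: the upper t* quantile sets the lower endpoint.
    return m - q_hi * se, m - q_lo * se


# Hypothetical data: 30 draws from an exponential with mean 1.
data_rng = random.Random(2007)
data = [data_rng.expovariate(1.0) for _ in range(30)]

center, _ = mean_and_se(data)
lower, upper = bootstrap_t_interval(data)
print(f"sample mean: {center:.3f}")
print(f"95% bootstrap t interval: ({lower:.3f}, {upper:.3f})")
```

With right-skewed data the interval typically reaches further above the
sample mean than below it, which is the kind of correction an ordinary t
interval cannot make.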

Moshe Olshansky <m_olshansky at yahoo.com> wrote:
>If the two populations are normal the t-test gives you
>the exact result for whatever the sample size is (the
>sample size will affect the number of degrees of
>freedom).
>When the populations are not normal and the sample
>size is large it is still OK to use t-test (because of
>the Central Limit Theorem) but this is not necessarily
>true for the small sample size.
>You could use simulation to find the relevant
>probabilities.
>...

========================================================
| Tim Hesterberg       Senior Research Scientist       |
| timh at insightful.com  Insightful Corp.                |
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|                      www.insightful.com/Hesterberg   |
========================================================


