[R-sig-teaching] I need your thoughts on teaching with R

Tue Mar 31 20:11:10 CEST 2009

> I tend to use t-tests after examining normal probability plots and,
> possibly, considering transformation.  I believe they would be more
> powerful than permutation tests but that may be incorrect.  Can you
> describe situations in which you would prefer permutation tests to
> t-tests?

Here is the reason I prefer permutation tests, besides the conceptual 
simplicity.  T-tests are based on a normal sampling model, and my 
perception is that very few data sets to which people apply t-tests 
actually arise from random samples.  Here I mean specifically that the 
person gathering data used a genuine random sampling method to select 
observations from a population. Survey samples would be an exception.  I 
think it is far more frequently the case that a study takes whatever 
units/subjects are at hand and separates these into groups either through 
random assignment or based on some categorical variable.  The permutation 
test directly answers the question of how the measured difference between 
group averages might have differed had the groups been formed in a 
different way.  Then, instead of making the objectionable argument that 
"we will treat the data as if it were a representative sample from the 
population of interest", and then using a t-test justified by a false 
random sampling argument and making inferences to some larger population, 
I find it much more justifiable to model the randomness that was truly 
part of the data gathering (random assignment).  Any inference to other 
populations is then justified on the basis of background information (the 
groups I am interested in are similar to the groups in the study, so maybe 
the results there apply here too) and not by random sampling.  It is 
important to describe how the units/subjects were selected and let the 
reader determine how applicable the results are to other populations.
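
To make the idea concrete, here is a minimal sketch of a two-group 
permutation test for a difference in means.  It is written in Python 
purely for illustration (in R the reshuffling is a call to sample()), and 
it is not taken from any particular package.  The function name 
permutation_test, its arguments, and the two small data vectors are all 
made up for this example.  It uses random reshuffles rather than 
enumerating every possible regrouping, so the p-value is a Monte Carlo 
estimate.

```python
import random


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Monte Carlo two-sided permutation test for a difference in means.

    Hypothetical helper for illustration only. Returns the observed
    difference (mean of group_a minus mean of group_b) and an estimated
    p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = _mean(group_a) - _mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_perm):
        # Re-form the groups at random, mimicking a different assignment.
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Count the observed assignment itself as one of the arrangements.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Made-up measurements for two randomly assigned groups (not real data).
treatment = [24.1, 27.3, 22.8, 29.5, 26.0, 25.4]
control = [20.2, 23.1, 19.8, 24.0, 21.5, 22.7]

obs, p = permutation_test(treatment, control)
print(f"observed difference = {obs:.2f}, permutation p-value = {p:.4f}")
```

The only randomness modeled here is the assignment of the units at hand 
to the two groups, which is the point of the argument above.  No claim 
about a larger population enters the calculation.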

When data are not collected by random sampling, the t-test can still be 
justified either as an approximation to the permutation test (but, as Doug 
would say, why approximate when you can use the computer to do the real 
thing?) or by ASSUMING a normal model for the data outright, rather than 
deriving it from the central limit theorem and a random sampling that did 
not occur.

I would be very interested if readers of this message could send me a 
specific reference to a t-test applied to real data in an introductory 
textbook in which the individual objects were genuinely sampled at random 
from populations.

-Bret