[R] Question

Tue Dec 14 18:12:14 CET 2010

... (in addition to the very useful suggestion to plot your data):

(Sounds like a homework question... ?).

Sigh..... [mount soapbox]

1. "Data" never deviate from normality. They only provide
evidence to challenge ("test" is the formal term) the assumption that
the population from which the data were sampled (how? -- see below)
can be modeled as normal (i.e., whether the data provide strong
evidence against this assumption). This is a philosophical brain
twister, I know; but understanding what it means is actually very
important for how one uses evidence (data) to inform science. It took
me about 20 years after grad school to (partially, anyway) figure it
out. Bear of little brain and all that..

2. Define: "Deviate from normality." With a sample of 1000, even a
practically trivial departure from normality will typically come out
statistically significant at conventional levels, i.e., "contradict"
normality (which is why a whole school of statistics, the gang of
Bayesians, does not think that "statistical significance" and
"evidence in the data" have much to do with one another). But that's
not the real question, is it?

3. The real question is: Does whatever I do to analyze the data and
draw scientific conclusions depend crucially on the assumption of
normality of the underlying population from which the data are
sampled? Of course, it depends on exactly what you do, but, by and
large, basic statistical texts continue to teach that the answer is
yes. Unfortunately, that is mostly (not always -- and it depends on
what's at issue) a lie, as we have known for about 50 years. The
crucial matter in practice is not normality but how the sampled data
were obtained: the study design and, especially, the issue of
"independence." Unfortunately, that is rather complicated to deal
with, so the Intro Stats texts prefer to ignore it and teach hogwash.
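
Points 2 and 3 can be checked by simulation. What follows is a minimal
sketch, not anything from the original question: it is plain Python
(standard library only), and every name and setting in it is an
illustrative assumption (the Jarque-Bera test as the normality test, a
gamma(20) population as the "mildly non-normal" case, an AR(1) series
with phi = 0.5 as the "dependent" case, and a normal-quantile interval
standing in for the t interval). The first part estimates how often a
normality test rejects at n = 30 versus n = 1000; the second compares
the coverage of a nominal 95% interval for the mean under independent
but skewed data and under normal but autocorrelated data.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

# Normal approximation to the t quantile (reasonable for n around 100).
Z975 = NormalDist().inv_cdf(0.975)


def jarque_bera_pvalue(x):
    """Asymptotic p-value of the Jarque-Bera normality test."""
    n = len(x)
    m = fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Survival function of a chi-square with 2 degrees of freedom.
    return math.exp(-jb / 2)


def covers(sample, true_mean):
    """Does the nominal 95% interval for the mean contain true_mean?"""
    half_width = Z975 * stdev(sample) / math.sqrt(len(sample))
    return abs(fmean(sample) - true_mean) <= half_width


def ar1(n, phi, rng):
    """Stationary AR(1) series with normal marginals and mean 0."""
    x = rng.gauss(0, 1 / math.sqrt(1 - phi ** 2))
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0, 1)
    return out


def simulate(seed=1, test_reps=200, cover_reps=2000, n_cover=100):
    rng = random.Random(seed)
    result = {}

    # Point 2: same mildly skewed population, two sample sizes.
    for n in (30, 1000):
        rejections = sum(
            jarque_bera_pvalue([rng.gammavariate(20, 1) for _ in range(n)]) < 0.05
            for _ in range(test_reps)
        )
        result[f"reject_n{n}"] = rejections / test_reps

    # Point 3a: independent draws from a clearly non-normal population
    # (exponential with true mean 1).
    hits = sum(
        covers([rng.expovariate(1.0) for _ in range(n_cover)], 1.0)
        for _ in range(cover_reps)
    )
    result["cover_indep_skewed"] = hits / cover_reps

    # Point 3b: exactly normal marginals, but observations are dependent.
    hits = sum(covers(ar1(n_cover, 0.5, rng), 0.0) for _ in range(cover_reps))
    result["cover_normal_ar1"] = hits / cover_reps

    return result
```

Calling simulate() returns the four estimated proportions. Under these
assumptions one should expect the test to reject far more often at
n = 1000 than at n = 30 for the very same population, and the interval
to stay close to its nominal coverage for the skewed-but-independent
data while falling well short for the normal-but-dependent data (with
phi = 0.5 the variance of the mean is roughly three times what the
independence formula assumes).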

[dismount soapbox]

Thoughtful nasty rejoinders welcome. Please send your thought-less
nasty ones to me privately to spare our colleagues.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics