[R] Problems with normality req. for ANOVA
David Winsemius
dwinsemius at comcast.net
Mon Aug 2 21:34:46 CEST 2010
In the general setting of observational studies, your point is
undoubtedly true, and apparently you believe it to be true even in the
setting of designed experiments. Perhaps I should have confined myself
to my first sentence.
--
David.
On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:
> David et al.:
>
> I take issue with this. It is the lack of independence that is the
> major issue. In particular, clustering, split-plotting, and so forth
> due to "convenience order" experimentation, lack of randomization,
> and exogenous effects, such as systematic effects of measurement
> method or location, play the major role in inducing bias and
> distorting inference. Non-normality and unequal variances typically
> pale into insignificance compared to this.
>
> Obviously, IMHO.
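
A rough way to see the point about independence is a small simulation under a true null hypothesis. The sketch below is only illustrative: the function names, the sample sizes, and the batch-effect standard deviation of 0.5 are assumptions chosen for the example, not anything taken from this thread. It compares the rejection rate of a one-way ANOVA F test when errors are skewed but independent with the rate when errors are normal but each treatment was run as a single batch.

```r
set.seed(1)

# Share of simulated data sets in which a one-way ANOVA F test rejects
# at the 5% level although there is no treatment effect at all.
reject_rate <- function(make_y, n_sim = 500, n_groups = 4, n_per = 30) {
  trt <- factor(rep(seq_len(n_groups), each = n_per))
  p <- replicate(n_sim, {
    y <- make_y(n_groups, n_per)
    anova(lm(y ~ trt))[["Pr(>F)"]][1]
  })
  mean(p < 0.05)
}

# Case A: independent but clearly skewed errors (centred exponential).
skewed_iid <- function(g, n) rexp(g * n) - 1

# Case B: normal errors, but every treatment was run as one batch
# ("convenience order"), so a shared batch effect is confounded with trt.
normal_batched <- function(g, n) {
  rep(rnorm(g, sd = 0.5), each = n) + rnorm(g * n)
}

rate_skewed  <- reject_rate(skewed_iid)
rate_batched <- reject_rate(normal_batched)
c(skewed_iid = rate_skewed, normal_batched = rate_batched)
```

Under these assumptions the skewed-but-independent case should stay close to the nominal 5%, while the batched case should reject far more often, because the batch variation is attributed to the treatment.
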
>
> Note 1: George Box noted this at least 50 years ago in the early
> '60s when he and Jenkins developed ARIMA modeling.
>
> Note 2: If you can, have a look at Jack Youden's classic paper
> "Enduring Values", which comments to some extent on these issues,
> here: http://www.jstor.org/pss/1266913
>
> Cheers,
> Bert
>
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
>
> On Mon, Aug 2, 2010 at 10:32 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Aug 2, 2010, at 9:33 AM, wwreith wrote:
>
>
> I am conducting an experiment with four independent variables, each
> of which has three or more factor levels. The sample size is quite
> large, i.e., several thousand. The dependent variable data does not
> pass a normality test but "visually" looks close to normal. Is there
> a way to compute the effect this would have on the p-value for ANOVA,
> or is there a way to perform a nonparametric test in R that will
> handle this many independent variables? Simply saying ANOVA is robust
> to small departures from normality is not going to be good enough for
> my client.
>
> The statistical assumption of normality for linear models does not
> apply to the distribution of the dependent variable, but rather to
> the residuals after a model is estimated. Furthermore, it is the
> homoskedasticity assumption that is more commonly violated and also
> a greater threat to validity. (And if you don't already know both of
> these points, then you desperately need to review your basic
> modeling practices.)
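
As a minimal sketch of what checking the residuals rather than the response could look like, the following uses simulated data, since the original poster's data are not available. The data frame `d`, the factor names `A` to `D`, the effect sizes, and the t-distributed noise are all assumptions made up for illustration.

```r
set.seed(42)
n <- 4000

# Hypothetical stand-in data: four factors with three levels each.
lev <- function(prefix) factor(sample(paste0(prefix, 1:3), n, replace = TRUE))
d <- data.frame(A = lev("a"), B = lev("b"), C = lev("c"), D = lev("d"))
# Response with effects of A and B and mildly heavy-tailed noise.
d$y <- 10 + as.integer(d$A) + 0.5 * as.integer(d$B) + rt(n, df = 8)

fit <- aov(y ~ A + B + C + D, data = d)
res <- residuals(fit)

# Normality is judged on the residuals, not on y itself.
qqnorm(res)
qqline(res)

# With thousands of observations a formal test tends to flag even small
# departures; shapiro.test() accepts between 3 and 5000 values.
shapiro_p <- shapiro.test(res)$p.value

# Homoskedasticity: compare the residual spread across levels of a factor.
bartlett_p <- bartlett.test(res, d$A)$p.value  # assumes normality
fligner_p  <- fligner.test(res, d$A)$p.value   # less sensitive to non-normality

# Residuals against fitted values: look for a fan or funnel shape.
plot(fitted(fit), res)

c(shapiro = shapiro_p, bartlett = bartlett_p, fligner = fligner_p)
```

The plots are usually more informative than the p-values at this sample size; the tests are included only to show where they would be applied.
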
>
>
> I need to compute an error amount for ANOVA or find a nonparametric
> equivalent.
>
> You might get a better answer if you expressed the first part of
> that question in unambiguous terminology. What is "error amount"?
>
> For the second part, there is an entire CRAN Task View on Robust
> Statistical Methods.
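
One possible starting point among the many methods listed in that Task View is a robust M-estimation fit. The sketch below uses `rlm()` from the MASS package; that choice is an assumption made for illustration and was not named in this thread. The data, the factor names, and the single true effect of size 1 are simulated and hypothetical.

```r
library(MASS)  # a recommended package shipped with standard R installations

set.seed(7)
n <- 3000

# Hypothetical stand-in data: four three-level factors, heavy-tailed errors.
lev <- function(prefix) factor(sample(paste0(prefix, 1:3), n, replace = TRUE))
d <- data.frame(A = lev("a"), B = lev("b"), C = lev("c"), D = lev("d"))
d$y <- 5 + 1 * (d$A == "a3") + rt(n, df = 3)

ols <- lm(y ~ A + B + C + D, data = d)
rob <- rlm(y ~ A + B + C + D, data = d)  # Huber M-estimator by default

# Side-by-side coefficients; large disagreements suggest that a few
# extreme observations are driving the least-squares fit.
cbind(ols = coef(ols), robust = coef(rob))
```

If the two sets of estimates agree closely, that is some reassurance that the ordinary ANOVA conclusions do not hinge on the tails of the error distribution. It says nothing about independence or design problems.
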
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>
>
>
David Winsemius, MD
West Hartford, CT