[R] Problems with normality req. for ANOVA
Liaw, Andy
andy_liaw at merck.com
Tue Aug 3 14:41:50 CEST 2010
As a matter of fact, I would say both Bert and I encounter "designed
experiments" a lot more than "observational studies", yet we can say
from experience that the things Bert mentioned happen on a daily
basis. When you talk to experimenters, ask your questions carefully and
you'll see these things crop up.
Andy
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of David Winsemius
Sent: Monday, August 02, 2010 3:35 PM
To: Bert Gunter
Cc: r-help at r-project.org; wwreith
Subject: Re: [R] Problems with normality req. for ANOVA
In a general situation of observational studies, your point is
undoubtedly true, and apparently you believe it to be true even in the
setting of designed experiments. Perhaps I should have confined myself
to my first sentence.
--
David.
On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:
> David et al.:
>
> I take issue with this. It is the lack of independence that is the
> major issue. In particular, clustering, split-plotting, and so forth
> due to "convenience order" experimentation, lack of randomization,
> and exogenous effects such as the systematic effects of measurement
> method/location have the major effect in inducing bias and
> distorting inference. Non-normality and unequal variances typically
> pale into insignificance compared to this.
>
> Obviously, IMHO.
>
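A minimal simulation sketch of the point above, using base R only. The design is hypothetical (three treatments, 20 runs each, batches of 5 that share a random shift) and all names are made up for illustration. It compares how often the naive one-way ANOVA rejects at the 5% level when there is no treatment effect, once with skewed but independent errors and once with normal errors that are clustered by batch.

```r
set.seed(1)

# Rejection rate of the naive one-way ANOVA F test at the 5% level when
# there is NO true treatment effect.
reject_rate <- function(make_y, n_sim = 1000) {
  trt <- factor(rep(c("A", "B", "C"), each = 20))
  p <- replicate(n_sim, {
    y <- make_y()
    summary(aov(y ~ trt))[[1]][["Pr(>F)"]][1]
  })
  mean(p < 0.05)
}

# (a) Independent but strongly skewed errors (exponential).
skewed <- function() rexp(60)

# (b) Normal errors, but runs were done in batches of 5 ("convenience
#     order"); each batch shares a random shift that the analysis ignores.
#     Batches 1-4 fall in treatment A, 5-8 in B, 9-12 in C.
clustered <- function() {
  batch_shift <- rep(rnorm(12, sd = 1), each = 5)
  batch_shift + rnorm(60, sd = 1)
}

rate_skewed    <- reject_rate(skewed)
rate_clustered <- reject_rate(clustered)
print(c(skewed = rate_skewed, clustered = rate_clustered))
```

Under these assumptions the skewed case should stay close to the nominal 5%, while the clustered case rejects far more often, because the batch-to-batch variation is counted as if it were treatment variation.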
> Note 1: George Box noted this at least 50 years ago, in the early
> '60s, when he and Jenkins developed ARIMA modeling.
>
> Note 2: If you can, have a look at Jack Youden's classic paper
> "Enduring Values", which comments to some extent on these issues,
> here: http://www.jstor.org/pss/1266913
>
> Cheers,
> Bert
>
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
>
> On Mon, Aug 2, 2010 at 10:32 AM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>
> On Aug 2, 2010, at 9:33 AM, wwreith wrote:
>
>
> I am conducting an experiment with four independent variables, each
> of which has three or more factor levels. The sample size is quite
> large, i.e. several thousand. The dependent variable data does not
> pass a normality test but "visually" looks close to normal, so is
> there a way to compute the effect this would have on the p-value for
> ANOVA, or is there a way to perform a nonparametric test in R that
> will handle this many independent variables? Simply saying ANOVA is
> robust to small departures from normality is not going to be good
> enough for my client.
>
> The statistical assumption of normality for linear models does not
> apply to the distribution of the dependent variable, but rather to
> the residuals after a model is estimated. Furthermore, it is the
> homoskedasticity assumption that is more commonly violated and is
> also the greater threat to validity. (And if you don't already know
> both of these points, then you desperately need to review your basic
> modeling practices.)
>
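A small base-R sketch of the distinction just made, on made-up data (one factor with well-separated level means and normal errors; the names `g`, `mu`, and `fit` are illustrative only). The raw response fails a normality test badly because it is a mixture of three groups, while the residuals from the fitted model are what the assumption actually concerns.

```r
set.seed(42)

# Hypothetical data: three groups with different means, normal errors.
g  <- factor(rep(c("low", "mid", "high"), each = 200))
mu <- c(low = 0, mid = 5, high = 10)
y  <- unname(mu[as.character(g)]) + rnorm(600)

fit <- aov(y ~ g)
r   <- residuals(fit)

# Normality test on the raw response: rejects, since y is trimodal.
p_raw <- shapiro.test(y)$p.value

# Normality test on the residuals: this is the relevant check.
p_resid <- shapiro.test(r)$p.value

# Homoskedasticity: compare residual variance across the groups.
p_var <- bartlett.test(r, g)$p.value

print(c(raw = p_raw, residuals = p_resid, equal_var = p_var))
```

With several thousand observations a formal test will flag even trivial departures, so plots such as `qqnorm(r); qqline(r)` and `plot(fitted(fit), r)` are usually more informative than the p-values alone.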
>
> I need to compute an error amount for ANOVA or find a nonparametric
> equivalent.
>
> You might get a better answer if you expressed the first part of
> that question in unambiguous terminology. What is "error amount"?
>
> For the second part, there is an entire Task View on Robust
> Statistical Methods.
>
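The Task View lists contributed packages. As one illustration that needs only base R, and that is not a method proposed in this thread, here is a sketch of a permutation test for the overall F statistic of a multi-factor model. The data are made up (two 3-level factors, skewed errors, a real effect of factor `a`), and `overall_f` is a hypothetical helper. Note that shuffling the response assumes the observations are exchangeable, so it does nothing about lack of independence.

```r
set.seed(7)

# Hypothetical data: two 3-level factors, 20 replicates per cell,
# exponential (skewed) errors, and a shift for level 3 of factor a.
d <- expand.grid(a = factor(1:3), b = factor(1:3), rep = 1:20)
d$y <- rexp(nrow(d)) + ifelse(d$a == "3", 1, 0)

# Overall F statistic for the additive model resp ~ a + b.
overall_f <- function(resp, a, b) {
  summary(lm(resp ~ a + b))$fstatistic[["value"]]
}

f_obs <- overall_f(d$y, d$a, d$b)

# Reference distribution: refit after shuffling the response, which
# breaks any link between the response and the factors.
f_perm <- replicate(999, overall_f(sample(d$y), d$a, d$b))

# Permutation p-value, counting the observed statistic itself.
p_perm <- (1 + sum(f_perm >= f_obs)) / (1 + length(f_perm))
print(p_perm)
```

This tests the global null of no effect of any factor. Testing one factor while adjusting for the others needs a restricted permutation scheme, which is beyond this sketch.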
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>
>
>
David Winsemius, MD
West Hartford, CT