[R] Problems with normality req. for ANOVA

David Winsemius dwinsemius at comcast.net
Mon Aug 2 21:34:46 CEST 2010


In a general situation of observational studies, your point is  
undoubtedly true, and apparently you believe it to be true even in the  
setting of designed experiments. Perhaps I should have confined myself  
to my first sentence.

-- 
David.


On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:

> David et al.:
>
> I take issue with this. It is the lack of independence that is the
> major issue. In particular, clustering, split-plotting, and so forth,
> arising from "convenience order" experimentation, lack of
> randomization, and exogenous effects such as systematic effects of
> measurement method/location, have the major effect in inducing bias
> and distorting inference. Non-normality and unequal variances
> typically pale into insignificance compared to this.
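
The contrast drawn above can be illustrated with a small simulation. This is only a sketch under assumed settings that are not from the thread: three treatment groups, four clusters (say, batches) of ten observations per group, and a cluster effect as large as the noise. Neither scenario has a true treatment effect, so every rejection is a false positive. The helper name anova_p is made up for the example.

```r
set.seed(1)

# p-value of the one-way ANOVA F test for response y and factor g
anova_p <- function(y, g) anova(lm(y ~ g))[["Pr(>F)"]][1]

n_sim <- 500        # simulated experiments per scenario (assumed)
k <- 3              # treatment groups (assumed)
n_clusters <- 4     # clusters, e.g. batches, per treatment group (assumed)
m <- 10             # observations per cluster (assumed)
g <- factor(rep(seq_len(k), each = n_clusters * m))

# Scenario A: independent but strongly skewed (exponential) errors
p_skewed <- replicate(n_sim, anova_p(rexp(length(g)), g))

# Scenario B: normal errors, but every cluster shares a random shift,
# so observations within a cluster are correlated
p_clustered <- replicate(n_sim, {
  shift <- rep(rnorm(k * n_clusters), each = m)
  anova_p(shift + rnorm(length(g)), g)
})

# Share of false positives at the nominal 5% level
reject_rate <- c(skewed = mean(p_skewed < 0.05),
                 clustered = mean(p_clustered < 0.05))
print(reject_rate)
```

Under these assumptions the clustered scenario should reject far more often than the nominal 5%, while the skewed scenario should stay close to it. The size of the gap depends entirely on the assumed cluster effect.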
>
> Obviously, IMHO.
>
> Note 1: George Box noted this at least 50 years ago in the early
> '60s when he and Jenkins developed ARIMA modeling.
>
> Note 2: If you can, have a look at Jack Youden's classic paper  
> "Enduring Values", which comments to some extent on these issues,  
> here: http://www.jstor.org/pss/1266913
>
> Cheers,
> Bert
>
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
>
> On Mon, Aug 2, 2010 at 10:32 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Aug 2, 2010, at 9:33 AM, wwreith wrote:
>
>
> I am conducting an experiment with four independent variables, each of
> which has three or more factor levels. The sample size is quite large,
> i.e. several thousand. The dependent variable data does not pass a
> normality test but "visually" looks close to normal, so is there a way
> to compute the effect this would have on the p-value for ANOVA, or is
> there a way to perform a nonparametric test in R that will handle this
> many independent variables? Simply saying ANOVA is robust to small
> departures from normality is not going to be good enough for my client.
>
> The statistical assumption of normality for linear models does not
> apply to the distribution of the dependent variable, but rather to
> the residuals after a model is estimated. Furthermore, it is the
> homoskedasticity assumption that is more commonly violated and is also
> a greater threat to validity. (And if you don't already know both of
> these points, then you desperately need to review your basic
> modeling practices.)
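
A minimal sketch of checking both assumptions on the residuals rather than on the raw response. The data frame d, its two factors a and b, and the skewed response are invented for illustration; the original question involves four factors, which would simply add terms to the formula.

```r
set.seed(2)

# Hypothetical data: two three-level factors and a skewed response
d <- expand.grid(a = factor(1:3), b = factor(1:3), rep = 1:50)
d$y <- as.integer(d$a) + rexp(nrow(d))

fit <- aov(y ~ a + b, data = d)
res <- residuals(fit)

# Normality is judged on the residuals, not on the raw response
qqnorm(res)
qqline(res)

# Residuals against fitted values: a funnel shape suggests unequal variances
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")

# Rank-based test of equal variances across the cells of the design
var_test <- fligner.test(res, interaction(d$a, d$b))
print(var_test)
```

With several thousand observations, formal tests tend to flag departures too small to matter, so the plots are usually the more informative part.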
>
>
> I need to compute an error amount for ANOVA or find a nonparametric
> equivalent.
>
> You might get a better answer if you expressed the first part of  
> that question in unambiguous terminology.  What is "error amount"?
>
> For the second part, there is an entire Task View on Robust  
> Statistical Methods.
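
One distribution-free option that copes with several factors is a permutation test of the overall F statistic. This is a swapped-in technique for illustration, not something taken from the Task View, and the data, the factor names and the helper overall_f are hypothetical. Shuffling the response only tests the global null of no effects at all, and it still assumes exchangeable (in particular independent) observations.

```r
set.seed(3)

# Hypothetical data: two three-level factors, skewed errors, no true effects
d <- expand.grid(a = factor(1:3), b = factor(1:3), rep = 1:20)
d$y <- rexp(nrow(d))

# Overall F statistic of the additive model for a given response vector.
# The argument is named resp so that it cannot be masked by the column d$y.
overall_f <- function(resp) {
  summary(lm(resp ~ a + b, data = d))$fstatistic[["value"]]
}

f_obs <- overall_f(d$y)

# Reference distribution: refit the model after shuffling the response
f_perm <- replicate(999, overall_f(sample(d$y)))

# Permutation p-value; the observed statistic counts as one permutation
p_perm <- (1 + sum(f_perm >= f_obs)) / (1 + length(f_perm))
print(p_perm)
```

Testing individual terms in a multi-factor model needs a restricted permutation scheme, which this sketch does not attempt.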
>
> -- 
>
> David Winsemius, MD
> West Hartford, CT
>
>
>
>

David Winsemius, MD
West Hartford, CT


