[R] Problems with normality req. for ANOVA

Tue Aug 3 17:20:11 CEST 2010

On Aug 3, 2010, at 2:41 PM, Liaw, Andy wrote:

> As a matter of fact, I would say both Bert and I encounter "designed
> experiments" a lot more than "observational studies", yet we speak from
> experience that those things that Bert mentioned happen on a daily
> basis.  When you talk to experimenters, ask your questions carefully and
> you'll see these things crop up.

Yes. I think the most egregious example I have seen involved getting an F test wrong by a factor of 7. This sort of error comes about extremely easily if you divide by the wrong mean square in an ANOVA table. Since picking the right denominator often requires dealing with difficult concepts like "random interaction", researchers are typically much more prone to collect data in complicated designs than they are to analyze them correctly afterwards.
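
To make the denominator issue concrete, here is a minimal sketch in base R. Everything in it is an assumption made for illustration (the simulated data, the names trt and batch, the variance components); it is not the example I saw in practice. A fixed treatment is crossed with random batches, the treatment-by-batch interaction is random, and there is no true treatment effect.

```r
set.seed(1)

# Hypothetical layout: 3 treatments (fixed) x 6 batches (random),
# 5 replicates per cell, balanced.
d <- expand.grid(rep = 1:5, batch = factor(1:6), trt = factor(1:3))

# Random treatment-by-batch interaction (sd = 2), no true treatment effect.
ab <- matrix(rnorm(18, sd = 2), nrow = 6, ncol = 3)
d$y <- ab[cbind(as.integer(d$batch), as.integer(d$trt))] + rnorm(nrow(d))

# Ordinary fixed-effects ANOVA table; we only borrow its mean squares.
tab <- anova(lm(y ~ trt * batch, data = d))
ms  <- setNames(tab[["Mean Sq"]], rownames(tab))

# Wrong: treatment tested against the within-cell residual.
F_wrong <- ms[["trt"]] / ms[["Residuals"]]
p_wrong <- pf(F_wrong, tab["trt", "Df"], tab["Residuals", "Df"],
              lower.tail = FALSE)

# Right (for a random interaction): treatment tested against trt:batch.
F_right <- ms[["trt"]] / ms[["trt:batch"]]
p_right <- pf(F_right, tab["trt", "Df"], tab["trt:batch", "Df"],
              lower.tail = FALSE)

# The same test via error strata; trt appears in the batch:trt stratum.
fit_strata <- aov(y ~ trt + Error(batch / trt), data = d)
```

The two F ratios differ by exactly MS(trt:batch)/MS(Residuals), so with a sizeable interaction variance and several replicates per cell the naive test can easily be off by a large factor. Compare F_wrong, F_right and summary(fit_strata).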

However, it obviously depends on your perspective: epidemiologists usually have different complications from econometricians, and a clinical trial is typically less complicated than a lab experiment.

> 
> Andy
> 
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of David Winsemius
> Sent: Monday, August 02, 2010 3:35 PM
> To: Bert Gunter
> Cc: r-help at r-project.org; wwreith
> Subject: Re: [R] Problems with normality req. for ANOVA
> 
> In a general situation of observational studies, your point is  
> undoubtedly true, and apparently you believe it to be true even in the  
> setting of designed experiments. Perhaps I should have confined myself  
> to my first sentence.
> 
> -- 
> David.
> 
> 
> On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:
> 
>> David et al.:
>> 
>> I take issue with this. It is the lack of independence that is the
>> major issue. In particular, clustering, split-plotting, and so forth
>> due to "convenience order" experimentation, lack of randomization, and
>> exogenous effects like the systematic effects due to measurement
>> method/location have the major effect on inducing bias and
>> distorting inference. Non-normality and unequal variances typically
>> pale into insignificance compared to this.
>> 
>> Obviously, IMHO.
>> 
>> Note 1: George Box noted this at least 50 years ago in the early
>> '60s when he and Jenkins developed ARIMA modeling.
>> 
>> Note 2: If you can, have a look at Jack Youden's classic paper  
>> "Enduring Values", which comments to some extent on these issues,  
>> here: http://www.jstor.org/pss/1266913
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> 
>> 
>> 
>> On Mon, Aug 2, 2010 at 10:32 AM, David Winsemius
>> <dwinsemius at comcast.net> wrote:
>> 
>> On Aug 2, 2010, at 9:33 AM, wwreith wrote:
>> 
>> 
>> I am conducting an experiment with four independent variables, each of
>> which has three or more factor levels. The sample size is quite large,
>> i.e. several thousand. The dependent variable data does not pass a
>> normality test but "visually" looks close to normal, so is there a way
>> to compute the effect this would have on the p-value for ANOVA, or is
>> there a way to perform a nonparametric test in R that will handle this
>> many independent variables? Simply saying ANOVA is robust to small
>> departures from normality is not going to be good enough for my client.
>> 
>> The statistical assumption of normality for linear models does not
>> apply to the distribution of the dependent variable, but rather to
>> the residuals after a model is estimated. Furthermore, it is the
>> homoskedasticity assumption that is more commonly violated and is also
>> the greater threat to validity. (And if you don't already know both of
>> these points, then you desperately need to review your basic
>> modeling practices.)
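
David's first point can be illustrated with a small base R sketch. The data are simulated and the group means are made up: the errors are exactly normal, yet the raw response fails a normality test badly because the factor shifts its mean.

```r
set.seed(2)

# Hypothetical one-factor layout: 3 groups of 1000, well-separated means.
g <- gl(3, 1000, labels = c("a", "b", "c"))
y <- c(0, 5, 10)[as.integer(g)] + rnorm(length(g))   # normal errors

fit <- lm(y ~ g)

# Testing the raw response: the marginal distribution is trimodal.
p_raw <- shapiro.test(y)$p.value

# Testing what the model actually assumes to be normal: the residuals.
p_res <- shapiro.test(residuals(fit))$p.value
```

Here p_raw is tiny even though the model assumptions hold by construction. Note also that with several thousand observations a formal test on the residuals will flag departures far too small to matter for the F tests, so qqnorm(residuals(fit)) is usually the more informative check.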
>> 
>> 
>> I need to compute an error amount for
>> ANOVA or find a nonparametric equivalent.
>> 
>> You might get a better answer if you expressed the first part of  
>> that question in unambiguous terminology.  What is "error amount"?
>> 
>> For the second part, there is an entire Task View on Robust  
>> Statistical Methods.
>> 
>> -- 
>> 
>> David Winsemius, MD
>> West Hartford, CT
>> 
>> 
>> 
>> 
> 
> David Winsemius, MD
> West Hartford, CT
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com