[R] Unbalanced Anova: What is the best approach?

Sun Apr 3 17:35:01 CEST 2011

Dear Krishna,

> -----Original Message-----
> From: Krishna Kirti Das [mailto:krishnakirti at gmail.com]
> Sent: April-03-11 10:36 AM
> To: John Fox
> Cc: r-help at r-project.org
> Subject: Re: [R] Unbalanced Anova: What is the best approach?
> 
> Thank you, John.
> 
> Yes, your answers do help. For me it's mainly about getting familiar with
> the "R" way of doing things.
> 
> Thus your response also confirms what I suspected, that there is no
> explicit user-interface (at least one that is widely used) in terms of
> functions/packages that represents an unbalanced design in the same way
> that aov would represent a balanced one. Analyzing balanced and unbalanced
> data is obviously possible, but what has to be done is intuitive within the
> language for balanced designs via aov and unintuitive for unbalanced
> designs.

I don't agree with your characterization. For example, the representation of
a two-way crossed ANOVA model as an R model formula is precisely the same
for balanced and unbalanced data: for response Y and factors A and B, Y ~
A*B. Moreover, the issue of how to formulate tests is independent of the
software you use.
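
To illustrate with a minimal sketch (simulated data; the data frames
balanced and unbalanced and the helper ss_A() are made up for this example
and are not part of the original exchange), the same formula Y ~ A*B fits
both designs. What changes is that the sequential sums of squares reported
by anova() depend on the order of the terms only when the data are
unbalanced:

```r
# Hypothetical simulated data: 2 x 3 crossed design, 4 observations per cell.
set.seed(1)
balanced <- expand.grid(A = factor(c("a1", "a2")),
                        B = factor(c("b1", "b2", "b3")),
                        rep = 1:4)
balanced$Y <- rnorm(nrow(balanced))

# Drop rows unevenly so that cell counts differ (no cell becomes empty).
unbalanced <- balanced[-c(1, 2, 3, 7, 8, 13), ]
table(unbalanced$A, unbalanced$B)

# Sequential (type-I) sum of squares for A under a given term order.
ss_A <- function(formula, data) {
  anova(lm(formula, data = data))["A", "Sum Sq"]
}

# Balanced data: A entered first or after B gives the same sum of squares.
print(c(A_first = ss_A(Y ~ A * B, balanced),
        A_after_B = ss_A(Y ~ B * A, balanced)))

# Unbalanced data: same model formula, but the two orderings now disagree.
print(c(A_first = ss_A(Y ~ A * B, unbalanced),
        A_after_B = ss_A(Y ~ B * A, unbalanced)))
```

With equal cell counts the two orderings agree; with unequal counts they
generally do not, which is why the choice of test matters only in the
unbalanced case.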

> 
> I did notice that this question gets asked several times and in slightly
> different ways, and I think the lack of an interface that represents an
> unbalanced design in the same way aov represents balanced designs is why
> the question will probably keep getting asked again.

I suspect that the issue gets asked repeatedly for two reasons: (1) More
fundamentally, I believe that the general level of understanding of
hypothesis tests in unbalanced data is low; (2) people don't necessarily
read previous posts to r-help.

> 
> I had mentioned nlme and lme4 because I saw in some of the discussions
> that using those was recommended for working with unbalanced designs. And
> specifying random effects with zero variance, for example, would probably
> serve my purposes.

I don't think that either lme() or lmer() will allow you to fit a model
without random effects, but even if they did there wouldn't be much sense in
doing so. You can compute a mean with lm() or glm(), but would you?
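
The point about lm() can be made concrete with a small sketch (the vector y
is hypothetical data invented for illustration): an intercept-only linear
model reproduces the sample mean, so the machinery works but adds nothing
over mean().

```r
# Hypothetical data, for illustration only.
y <- c(2.1, 3.4, 1.9, 4.0, 2.6)

# Intercept-only model: the fitted intercept is the least-squares
# estimate of a single constant, i.e. the sample mean.
fit <- lm(y ~ 1)
intercept <- unname(coef(fit))

# The two values agree up to floating-point error.
print(c(lm_intercept = intercept, sample_mean = mean(y)))
```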

Best,
 John

> 
> Thank you for your help.
> 
> Sincerely,
> 
> Krishna
> 
> On Sun, Apr 3, 2011 at 7:28 AM, John Fox <jfox at mcmaster.ca> wrote:
> 
> 
> 	Dear Krishna,
> 
> 	Although it's difficult to explain briefly, I'd argue that balanced and
> 	unbalanced ANOVA are not fundamentally different, in that the focus should
> 	be on the hypotheses that are tested, and these are naturally expressed as
> 	functions of cell means and marginal means. For example, in a two-way ANOVA,
> 	the null hypothesis of no interaction is equivalent to parallel profiles of
> 	cell means for one factor across levels of the other. What is different,
> 	though, is that in a balanced ANOVA all common approaches to constructing an
> 	ANOVA table coincide.
> 
> 	Without getting into the explanation in detail (which you can find in a text
> 	like my Applied Regression Analysis and Generalized Linear Models),
> 	so-called type-I (or sequential) tests, such as those performed by the
> 	standard anova() function in R, test hypotheses that are rarely of
> 	substantive interest, and, even when they are, are of interest only by
> 	accident. So-called type-II tests, such as those performed by default by the
> 	Anova() function in the car package, test hypotheses that are almost always
> 	of interest. Type-III tests, which the Anova() function in car can perform
> 	optionally, require careful formulation of the model for the hypotheses
> 	tested to be sensible, and even then have less power than corresponding
> 	type-II tests in the circumstances in which a test would be of interest.
> 
> 	Since you're addressing fixed-effects models, I'm not sure why you
> 	introduced nlme and lme4 into the discussion, but I note that Anova() in the
> 	car package has methods that can produce type-II and -III Wald tests for the
> 	fixed effects in mixed models fit by lme() and lmer().
> 
> 	Your question has been asked several times before on the r-help list. For
> 	example, if you enter terms like "type-II" or "unbalanced ANOVA" in the
> 	RSeek search engine and look under the "Support Lists" tab, you'll see many
> 	hits -- e.g.,
> 	<https://stat.ethz.ch/pipermail/r-help/2006-August/111927.html>.
> 
> 	I hope this helps,
> 	 John
> 
> 	--------------------------------
> 	John Fox
> 	Senator William McMaster
> 	 Professor of Social Statistics
> 	Department of Sociology
> 	McMaster University
> 	Hamilton, Ontario, Canada
> 	http://socserv.mcmaster.ca/jfox
> 
> 
> 
> 
> 	> -----Original Message-----
> 	> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> 	> On Behalf Of Krishna Kirti Das
> 	> Sent: April-03-11 3:25 AM
> 	> To: r-help at r-project.org
> 	> Subject: [R] Unbalanced Anova: What is the best approach?
> 	>
> 	> I have a three-way unbalanced ANOVA that I need to calculate (fixed
> 	> effects plus interactions, no random effects). But word has it that aov()
> 	> is good only for balanced designs. I have seen a number of different
> 	> recommendations for working with unbalanced designs, but they seem to
> 	> differ widely (car, nlme, lme4, etc.). So I would like to know what is the
> 	> best or most usual way to go about working with unbalanced designs and
> 	> extracting a reliable ANOVA table from them in R?
> 	>