[R] Unbalanced Anova: What is the best approach?

Sun Apr 3 17:24:23 CEST 2011

Dear Spencer,

> -----Original Message-----
> From: Spencer Graves [mailto:spencer.graves at prodsyse.com]
> Sent: April-03-11 11:07 AM
> To: Krishna Kirti Das
> Cc: John Fox; r-help at r-project.org
> Subject: Re: [R] Unbalanced Anova: What is the best approach?
> 
> Hi, Krishna:
> 
> 
> <in line>
> 
> On 4/3/2011 7:35 AM, Krishna Kirti Das wrote:
> > Thank you, John.
> >
> > Yes, your answers do help. For me it's mainly about getting familiar
> > with the "R" way of doing things.
> >
> > Thus your response also confirms what I suspected, that there is no
> > explicit user-interface (at least one that is widely used) in terms of
> > functions/packages that represents an unbalanced design in the same
> > way that aov would represent a balanced one. Analyzing balanced and
> > unbalanced data is obviously possible, but with balanced designs via
> > aov what has to be done is intuitive within the language, whereas for
> > unbalanced designs it is not.
> 
>        Intuition is subject to one's background and expectations.  If you
> think in terms of a series of nested hypotheses, then the standard R anova
> is very intuitive.  I never use aov, because it's not intuitive to me and
> not very general.  'aov' is only useful for a balanced design with normal
> independent errors with constant variance.  The real world is rarely so
> simple.  The 'aov' algorithm was wonderful over half a century ago, when
> all computations were done by hand or using a mechanical calculator (e.g.,
> an abacus or a calculator with gears).
> Unbalanced designs were largely impractical because of computational
> difficulties.  There were many procedures for imputing missing values for
> a design that was "almost balanced".
> 
> 
>        I encourage you to think in terms of alternative sequences of
> nested hypotheses, including the implications of A being significant by
> itself but not with B already present, and of whether or not the A:B
> interaction is significant.

So-called type-II tests do exactly that -- that is, obey the principle of
marginality; they are maximally powerful if the higher-order term(s) to
which a particular term is marginal are 0.
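
As a rough illustration only (simulated data with hypothetical factors A and
B, not Krishna's three-way problem), the sketch below shows that sequential
sums of squares depend on the order of the terms when the cell counts are
unequal, whereas the tests for each main effect adjusted for the other do
not. For an additive model, drop1() in base R reports those adjusted tests;
the final call assumes the car package is installed and is skipped otherwise.

```r
# Simulated, deliberately unbalanced two-way layout (cell counts 10, 4, 2, 4).
# Factor names A and B and all effect sizes are made up for illustration.
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(14, 6))),
  B = factor(c(rep(c("b1", "b2"), times = c(10, 4)),
               rep(c("b1", "b2"), times = c(2, 4))))
)
d$y <- rnorm(nrow(d)) + (d$A == "a2") + 0.5 * (d$B == "b2")

# Sequential (type-I) tests: the first term is tested ignoring the second,
# so its sum of squares changes with the order of the terms.
seq_AB <- anova(lm(y ~ A + B, data = d))
seq_BA <- anova(lm(y ~ B + A, data = d))

# Each main effect adjusted for the other (order does not matter).
adj <- drop1(lm(y ~ A + B, data = d), test = "F")

print(seq_AB)
print(seq_BA)
print(adj)

# Type-II tests for the model that includes the interaction, if car is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(lm(y ~ A * B, data = d)))
}
```

In each sequential table, only the last main effect entered agrees with the
adjusted test from drop1().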

Best,
 John

> 
> > I did notice that this question gets asked several times and in
> > slightly different ways, and I think the lack of an interface that
> > represents an unbalanced design in the same way aov represents
> > balanced designs is why the question will probably keep getting
> > asked again.
> >
> > I had mentioned nlme and lme4 because I saw in some of the discussions
> > that using those were recommended for working with unbalanced designs.
> > And specifying random effects with zero variance, for example, would
> > probably serve my purposes.
> 
>        I'd be surprised if nlme or lme4 changes what I wrote above.
> 
> 
>        Hope this helps.
>        Spencer
> 
> > Thank you for your help.
> >
> > Sincerely,
> >
> > Krishna
> >
> > On Sun, Apr 3, 2011 at 7:28 AM, John Fox<jfox at mcmaster.ca>  wrote:
> >
> >> Dear Krishna,
> >>
> >> Although it's difficult to explain briefly, I'd argue that balanced
> >> and unbalanced ANOVA are not fundamentally different, in that the
> >> focus should be on the hypotheses that are tested, and these are
> >> naturally expressed as functions of cell means and marginal means.
> >> For example, in a two-way ANOVA, the null hypothesis of no
> >> interaction is equivalent to parallel profiles of cell means for one
> >> factor across levels of the other. What is different, though, is that
> >> in a balanced ANOVA all common approaches to constructing an ANOVA
> >> table coincide.
> >>
> >> Without getting into the explanation in detail (which you can find in
> >> a text like my Applied Regression Analysis and Generalized Linear
> >> Models), so-called type-I (or sequential) tests, such as those
> >> performed by the standard anova() function in R, test hypotheses that
> >> are rarely of substantive interest, and, even when they are, are of
> >> interest only by accident. So-called type-II tests, such as those
> >> performed by default by the Anova() function in the car package,
> >> test hypotheses that are almost always of interest. Type-III tests,
> >> which the Anova() function in car can perform optionally, require
> >> careful formulation of the model for the hypotheses tested to be
> >> sensible, and even then have less power than corresponding type-II
> >> tests in the circumstances in which a test would be of interest.
> >>
> >> Since you're addressing fixed-effects models, I'm not sure why you
> >> introduced nlme and lme4 into the discussion, but I note that Anova()
> >> in the car package has methods that can produce type-II and -III Wald
> >> tests for the fixed effects in mixed models fit by lme() and lmer().
> >>
> >> Your question has been asked several times before on the r-help list.
> >> For example, if you enter terms like "type-II" or "unbalanced ANOVA"
> >> in the RSeek search engine and look under the "Support Lists" tab,
> >> you'll see many hits -- e.g.,
> >> <https://stat.ethz.ch/pipermail/r-help/2006-August/111927.html>.
> >>
> >> I hope this helps,
> >>   John
> >>
> >> --------------------------------
> >> John Fox
> >> Senator William McMaster
> >>   Professor of Social Statistics
> >> Department of Sociology
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> http://socserv.mcmaster.ca/jfox
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org
> >>> [mailto:r-help-bounces at r-project.org]
> >>> On Behalf Of Krishna Kirti Das
> >>> Sent: April-03-11 3:25 AM
> >>> To: r-help at r-project.org
> >>> Subject: [R] Unbalanced Anova: What is the best approach?
> >>>
> >>> I have a three-way unbalanced ANOVA that I need to calculate (fixed
> >>> effects plus interactions, no random effects). But word has it that
> >>> aov() is good only for balanced designs. I have seen a number of
> >>> different recommendations for working with unbalanced designs, but
> >>> they seem to differ widely (car, nlme, lme4, etc.). So I would like
> >>> to know what is the best or most usual way to go about working with
> >>> unbalanced designs and extracting a reliable ANOVA table from them
> >>> in R?
> >>>
> >>>        [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> 
> 
> --
> Spencer Graves, PE, PhD
> President and Chief Operating Officer
> Structure Inspection and Monitoring, Inc.
> 751 Emerson Ct.
> San José, CA 95126
> ph:  408-655-4567