[R] Unbalanced Anova: What is the best approach?

Sun Apr 3 17:06:46 CEST 2011

Hi, Krishna:

<in line>

On 4/3/2011 7:35 AM, Krishna Kirti Das wrote:
> Thank you, John.
>
> Yes, your answers do help. For me it's mainly about getting familiar with
> the "R" way of doing things.
>
> Thus your response also confirms what I suspected, that there is no explicit
> user-interface (at least one that is widely used) in terms of
> functions/packages that represents an unbalanced design in the same way that
> aov would represent a balanced one. Analyzing balanced and unbalanced data
> is obviously possible in both cases, but with balanced designs via aov what
> has to be done is intuitive within the language, whereas for unbalanced
> designs it is not.

       Intuition is subject to one's background and expectations.  If 
you think in terms of a series of nested hypotheses, then the standard R 
anova is very intuitive.  I never use aov, because it's not intuitive to 
me and not very general.  'aov' is only useful for a balanced design 
with normal independent errors with constant variance.  The real world 
is rarely so simple.  The 'aov' algorithm was wonderful over half a 
century ago, when all computations were done by hand or using a 
mechanical calculator (e.g., an abacus or a calculator with gears).  
Unbalanced designs were largely impractical because of computational 
difficulties.  There were many procedures for imputing missing values 
for a design that was "almost balanced".

       I encourage you to think in terms of alternative sequences of 
nested hypotheses.  Consider, for example, what it implies if A is 
significant by itself but not with B already in the model, and how 
that reading changes depending on whether the A:B interaction is 
significant.
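
       A minimal sketch of that idea, using only base R.  The data are 
simulated, and the factor names, cell counts and effect sizes are all 
made up for illustration.  It fits the same two-way model with the 
terms in both orders and prints the two sequential tables:

```r
# Simulated unbalanced two-way layout; every name and number is illustrative.
set.seed(1)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"))
n_per_cell <- c(12, 3, 4, 11)   # unequal counts make A and B non-orthogonal
d <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]
d$y <- 1 + 0.8 * (d$A == "a2") + 0.5 * (d$B == "b2") + rnorm(nrow(d))

fit_AB <- lm(y ~ A * B, data = d)  # sequence: A, then B given A, then A:B
fit_BA <- lm(y ~ B * A, data = d)  # sequence: B, then A given B, then B:A

tab_AB <- anova(fit_AB)
tab_BA <- anova(fit_BA)
print(tab_AB)
print(tab_BA)

# The "A given B" step as an explicit comparison of two nested models.
# The sum of squares matches the A row of tab_BA; the F statistic differs
# because the residual here comes from y ~ A + B, not from the full model.
cmp_A <- anova(lm(y ~ B, data = d), lm(y ~ A + B, data = d))
print(cmp_A)
```

       With unequal cell counts the row for A differs between the two 
tables (A alone versus A after B), while the interaction and residual 
rows agree.  Each row answers a different nested question.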

> I did notice that this question gets asked several times and in slightly
> different ways, and I think the lack of an interface that represents an
> unbalanced design in the same way aov represents balanced designs is why the
> question will probably keep getting asked again.
>
> I had mentioned nlme and lme4 because I saw in some of the discussions that
> using those were recommended for working with unbalanced designs. And
> specifying random effects with zero variance, for example, would probably
> serve my purposes.

       I'd be surprised if nlme or lme4 changes what I wrote above.
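
       For a purely fixed-effects model such as the three-way design in 
the original question, something along these lines should need only 
lm() and anova().  This is a sketch under assumptions: the data are 
simulated and the names and cell counts are hypothetical.  The last 
step calls the Anova() function from the car package that John 
describes below, and is skipped if car is not installed:

```r
# Hypothetical 2 x 2 x 3 fixed-effects layout with unequal cell counts.
set.seed(2)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     C = c("c1", "c2", "c3"))
n_per_cell <- c(3, 5, 2, 6, 4, 2, 5, 3, 6, 2, 4, 3)  # one count per cell
d <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]
d$y <- 0.7 * (d$A == "a2") + rnorm(nrow(d))

full <- lm(y ~ A * B * C, data = d)
no3  <- update(full, . ~ . - A:B:C)   # same model without the 3-way term

# Nested comparison: is the three-way interaction needed?
cmp3 <- anova(no3, full)
print(cmp3)

# Sequential (type-I) table; the order of terms matters when unbalanced.
print(anova(full))

# Type-II tests, only if the car package happens to be available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(full, type = 2))
}
```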

       Hope this helps.
       Spencer

> Thank you for your help.
>
> Sincerely,
>
> Krishna
>
> On Sun, Apr 3, 2011 at 7:28 AM, John Fox<jfox at mcmaster.ca>  wrote:
>
>> Dear Krishna,
>>
>> Although it's difficult to explain briefly, I'd argue that balanced and
>> unbalanced ANOVA are not fundamentally different, in that the focus should
>> be on the hypotheses that are tested, and these are naturally expressed as
>> functions of cell means and marginal means. For example, in a two-way
>> ANOVA, the null hypothesis of no interaction is equivalent to parallel
>> profiles of cell means for one factor across levels of the other. What is
>> different, though, is that in a balanced ANOVA all common approaches to
>> constructing an ANOVA table coincide.
>>
>> Without getting into the explanation in detail (which you can find in a
>> text like my Applied Regression Analysis and Generalized Linear Models),
>> so-called type-I (or sequential) tests, such as those performed by the
>> standard anova() function in R, test hypotheses that are rarely of
>> substantive interest, and, even when they are, are of interest only by
>> accident. So-called type-II tests, such as those performed by default by
>> the Anova() function in the car package, test hypotheses that are almost
>> always of interest. Type-III tests, which the Anova() function in car can
>> perform optionally, require careful formulation of the model for the
>> hypotheses tested to be sensible, and even then have less power than
>> corresponding type-II tests in the circumstances in which a test would be
>> of interest.
>>
>> Since you're addressing fixed-effects models, I'm not sure why you
>> introduced nlme and lme4 into the discussion, but I note that Anova() in
>> the car package has methods that can produce type-II and -III Wald tests
>> for the fixed effects in mixed models fit by lme() and lmer().
>>
>> Your question has been asked several times before on the r-help list. For
>> example, if you enter terms like "type-II" or "unbalanced ANOVA" in the
>> RSeek search engine and look under the "Support Lists" tab, you'll see many
>> hits -- e.g.,
>> <https://stat.ethz.ch/pipermail/r-help/2006-August/111927.html>.
>>
>> I hope this helps,
>>   John
>>
>> --------------------------------
>> John Fox
>> Senator William McMaster
>>   Professor of Social Statistics
>> Department of Sociology
>> McMaster University
>> Hamilton, Ontario, Canada
>> http://socserv.mcmaster.ca/jfox
>>
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>> On Behalf Of Krishna Kirti Das
>>> Sent: April-03-11 3:25 AM
>>> To: r-help at r-project.org
>>> Subject: [R] Unbalanced Anova: What is the best approach?
>>>
>>> I have a three-way unbalanced ANOVA that I need to calculate (fixed
>>> effects plus interactions, no random effects). But word has it that aov()
>>> is good only for balanced designs. I have seen a number of different
>>> recommendations for working with unbalanced designs, but they seem to
>>> differ widely (car, nlme, lme4, etc.). So I would like to know what is the
>>> best or most usual way to go about working with unbalanced designs and
>>> extracting a reliable ANOVA table from them in R?
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>

-- 
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567