[R-sig-teaching] prop.test in R

ian.fellows at stat.ucla.edu
Tue Oct 26 23:07:37 CEST 2010


> Exactly - elementary texts and methods books recommend the Welch test
> for the reason you mention.  Curiously, those same texts recommend
> using ANOVA and regression without automatically correcting for the
> possibility of non-constant variance.  Why is the case of comparing
> two means different from three?  Those same books will tell you that
> ANOVA is pretty robust to non-constant variance.  Well, the two-sample
> t-test is ANOVA.
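The equivalence is easy to check; a minimal sketch in R with simulated
data, comparing the pooled t-test and a one-way ANOVA on the same two
groups:

set.seed(1)
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 0.5))
g <- factor(rep(c("a", "b"), each = 20))

# Pooled (equal-variance) t-test and one-way ANOVA: F = t^2, so the
# p-values agree exactly.
t.test(y ~ g, var.equal = TRUE)$p.value
anova(lm(y ~ g))[1, "Pr(>F)"]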

I agree with you that the presentation is unfortunate. Perhaps it has
something to do with the fact that heteroskedasticity-consistent covariance
matrices (HCCM) for linear regression are a relatively recent development
(Huber in 1967 and White in 1980), and initially they performed poorly
for small sample sizes. From a pedagogy standpoint, the derivations of the
formulas for HCCM are beyond the scope of an undergraduate course, whereas
the equal-variance versions can be derived easily.

Given more recent simulation studies showing that the power and level of
tests based on HCCM are comparable with equal-variance regression, and
that there is rarely any a priori reason to think the variances are
equal, a case can be made for using the heteroskedasticity-robust
versions by default.
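
As a sketch of what that default could look like in practice (assuming
the contributed 'sandwich' and 'lmtest' packages; HC3 is one of several
small-sample refinements of White's estimator):

set.seed(1)
y <- c(rnorm(20, sd = 1), rnorm(20, mean = 0.5, sd = 2))
g <- factor(rep(c("a", "b"), each = 20))

library(sandwich)  # vcovHC(): heteroskedasticity-consistent covariance
library(lmtest)    # coeftest(): Wald tests with a supplied covariance
fit <- lm(y ~ g)   # the same model that underlies equal-variance ANOVA
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))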

The ANOVA is robust to violations of the equal-variance assumption so
long as the group sizes are equal; if they aren't, then it isn't.
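
A small simulation sketch makes the point (the numbers are
illustrative): under the null, with the smaller group having the larger
variance, the classical F-test over-rejects while Welch's version holds
its level:

set.seed(2)
one_rep <- function() {
  # Null is true (both means are 0), but the small group is more variable.
  y <- c(rnorm(10, sd = 3), rnorm(50, sd = 1))
  g <- factor(rep(c("a", "b"), times = c(10, 50)))
  c(classic = oneway.test(y ~ g, var.equal = TRUE)$p.value,
    welch   = oneway.test(y ~ g, var.equal = FALSE)$p.value)
}
p <- replicate(2000, one_rep())
rowMeans(p < 0.05)  # empirical rejection rates at a nominal 0.05 level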

>
> I don't use the Welch test except as a conscious decision: i.e., I really
> want to compare the means while suspecting that the variances differ.
> Generally people are using the t-test to certify that two populations
> are different.  If the variances are wildly different, that may be
> much more important than a difference in means.  In fact, to test for
> a difference in means when the variances are wildly different is
> almost always substantively silly.  There was a great example a few
> years ago from a psychiatric journal, comparing two medications, where
> the investigators did a t-test for the means when one distribution was
> unimodal and the other was bimodal; there was no statistically
> significant difference in the means, but there was a really important
> difference in the distributions.  The automatic use of the Welch test
> makes you feel that you are protected against Bad Things, when you
> aren't.

You may not suspect that the variances are different, but there is no
a priori reason to think that they are equal. Why should you assume
something you have no reason to believe is true? In my experience, people
are not using the t-test to say that two populations are in some general
way different, but rather, specifically, that the means differ. This is an
important question regardless of whether the variances are equal.

In your medication example, the shapes of the two distributions were
different, but when deciding whether to approve a medication, the more
important question is whether the central tendency differs: does one
medication, on average, improve the outcome more than the other? A
secondary, though important, question is how variable the outcome is.
The investigators made a correct inference (in stating there was no
significant mean difference between the groups), but they missed an
important question that they could have asked of their data. This
omission has nothing to do with the t-test.
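
To make that concrete (with simulated data, not the study's): two
samples with essentially equal means, one unimodal and one bimodal. The
t-test sees nothing, while a direct comparison of the distributions,
here a two-sample Kolmogorov-Smirnov test, flags the difference:

set.seed(3)
x <- rnorm(100)                                # unimodal, mean 0
y <- c(rnorm(50, -2, 0.5), rnorm(50, 2, 0.5))  # bimodal, mean ~ 0
t.test(x, y)$p.value   # large: no detectable difference in means
ks.test(x, y)$p.value  # tiny: the distributions clearly differ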

Using heteroskedasticity-robust methods DOES protect against "Bad
Things." What such methods don't do is reveal the existence of important
trends in the data unrelated to the hypothesis of interest.


Ian



