[R-sig-teaching] prop.test in R

Tue Oct 26 14:40:51 CEST 2010

On Tue, Oct 26, 2010 at 4:42 AM, Adams, Zeno <Zeno.Adams at ebs.edu> wrote:
> This is an interesting discussion. Concerning the right choice and computation of t-tests there is still one point that is unclear to me: In the Welch t-test we have a difference in the numerator and a standard deviation of a difference in the denominator. Why then is the standard deviation of the difference not computed correctly, i.e. why is the covariance between X and Y not taken into account?
>
> For example using the sleep data:
>
> data(sleep)
> means <- tapply(sleep$extra,sleep$group,mean) ; means
> vars <- tapply(sleep$extra,sleep$group,var) ; vars
>
> sd.welch <- sqrt(vars[1]/10 + vars[2]/10) ; sd.welch
> #in sd.welch the covariance is ignored
>
> t.welch <- (means[1]-means[2])/sd.welch ; t.welch
>
> #verify with R-built-in t.test function:
> t.test(extra ~ group, data = sleep)
>
>
> However, the correlation between sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2] is relatively high:
>
> cor(sleep$extra[sleep$group == 1],sleep$extra[sleep$group == 2])
>
> Shouldn’t the correct standard deviation be…
>
> sd.paired <- sqrt(vars[1]/10 + vars[2]/10
>        -2*cov(sleep$extra[sleep$group == 1],sleep$extra[sleep$group == 2])/10) ; sd.paired
>
> …as in the paired t-test???

Only if the order of the observations in each sample is fixed.  I
don't want to sound facetious but the important characteristic of the
samples in a paired t-test is that they are paired.  The first
observation in sample 1 is associated in some way with the first
observation in sample 2, say because they are observations on the same
subject or at the same location or ...
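
For the sleep data the pairing is real: the rows within each group
follow the same subject order.  A minimal sketch, assuming that order
is kept, showing that the covariance-adjusted standard error from the
question is the standard error of the within-subject differences, so
the paired t-test is a one-sample t-test on those differences.  The
names x, y, d, se.cov and se.diff are only for illustration.

```r
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
n <- length(x)
d <- x - y                        # within-pair differences

# covariance-adjusted standard error, as proposed in the question
se.cov  <- sqrt(var(x)/n + var(y)/n - 2 * cov(x, y)/n)
# standard error of the mean difference
se.diff <- sd(d) / sqrt(n)
all.equal(se.cov, se.diff)

# the paired test and the one-sample test on d give the same statistic
t.paired    <- unname(t.test(x, y, paired = TRUE)$statistic)
t.onesample <- unname(t.test(d)$statistic)
all.equal(t.paired, t.onesample)
all.equal(t.paired, mean(d) / se.diff)
```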

If there is no pairing then one of the samples could be rearranged
without changing the other, thereby changing the covariance.
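
A small sketch of that point, using the sleep data from the question.
Reversing the second sample is just one arbitrary rearrangement chosen
for illustration (y.rev is my name, not part of the data): the sample
covariance changes, the Welch statistic does not, and the "paired"
statistic changes with it.

```r
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
y.rev <- rev(y)                   # same values, different order

# the covariance depends on the order of the observations
cov.orig <- cov(x, y)
cov.rev  <- cov(x, y.rev)
c(original = cov.orig, rearranged = cov.rev)

# the Welch statistic uses only means and variances, so it is unchanged
welch.orig <- unname(t.test(x, y)$statistic)
welch.rev  <- unname(t.test(x, y.rev)$statistic)
all.equal(welch.orig, welch.rev)

# the paired statistic is only meaningful if the order is fixed
paired.orig <- unname(t.test(x, y, paired = TRUE)$statistic)
paired.rev  <- unname(t.test(x, y.rev, paired = TRUE)$statistic)
c(original = paired.orig, rearranged = paired.rev)
```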

Because of the pairing the sample sizes in a paired t-test must be
equal.  But a t-test for independent samples can be used when the
sample sizes are unequal.  So, no, the t-test for independent samples
is not a special case of the paired t-test.
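
To illustrate with unequal sample sizes, here is a sketch that drops
two observations from the second group.  The truncation is purely
hypothetical, and y.short is my name for the shortened sample.  The
independent-samples test runs as usual, while asking for a paired test
signals an error.

```r
x       <- sleep$extra[sleep$group == 1]        # 10 observations
y.short <- sleep$extra[sleep$group == 2][1:8]   # 8 observations

# Welch test by hand with unequal n, then via t.test
se.welch <- sqrt(var(x)/length(x) + var(y.short)/length(y.short))
t.hand   <- (mean(x) - mean(y.short)) / se.welch
t.welch  <- unname(t.test(x, y.short)$statistic)
all.equal(t.hand, t.welch)

# a paired test cannot be formed: there is nothing to pair the
# two extra observations in x with
paired.attempt <- tryCatch(t.test(x, y.short, paired = TRUE),
                           error = function(e) e)
inherits(paired.attempt, "error")
```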

> In other words, isn’t the Welch t-test a special case of the paired t-test with both samples assumed to be uncorrelated? And shouldn’t we then teach only the paired t-test as the most general test in class?
>
> Thanks!
>
> Zeno
>
>
>
> -----Original Message-----
> From: r-sig-teaching-bounces at r-project.org on behalf of Albyn Jones
> Sent: Tue 10/26/2010 3:51 AM
> To: Ian Fellows
> Cc: r-sig-teaching at r-project.org
> Subject: Re: [R-sig-teaching] prop.test in R
>
> Exactly - elementary texts and methods books recommend the Welch test
> for the reason you mention.  Curiously, those same texts recommend
> using ANOVA and regression without automatically correcting for the
> possibility of non-constant variance.  Why is the case of comparing
> two means different from comparing three?  Those same books will tell
> you that ANOVA is pretty robust to non-constant variance.  Well, the
> (equal-variance) two-sample t-test is ANOVA.
>
> I don't use the Welch test except as a conscious decision, i.e. when
> I really want to compare the means while suspecting that the
> variances differ.  Generally people are using the t-test to certify
> that two populations are different.  If the variances are wildly
> different, that may be much more important than a difference in
> means.  In fact, to test for a difference in means when the variances
> are wildly different is almost always substantively silly.  There was
> a great example a few
> years ago from a psychiatric journal, comparing two medications, where
> the investigators did a t-test for the means when one distribution was
> unimodal and the other was bi-modal; there was no statistically
> significant difference in the means, but there was a really important
> difference in the distributions.  The automatic use of the Welch test
> makes you feel that you are protected against Bad Things, when you
> aren't.
>
> albyn
>
> Quoting Ian Fellows <ian.fellows at stat.ucla.edu>:
>
>> In the case of the t.test, having the default be var.equal=FALSE is
>> the right way to go. There is little to no power lost by using the
>> Welch test, and the assumption of equal variance can be difficult to
>> assess. For this reason, many introductory textbooks have now
>> banished the equal variance t-test from their chapters (e.g. Moore's
>> The Basic Practice of Statistics).
>>
>> Ian
>>
>>
>> On Oct 25, 2010, at 4:05 PM, Albyn Jones wrote:
>>
>>> I don't know, the help file is uninformative.  I'd guess the answer is
>>> "the author wrote it that way".  Other R functions like t.test include
>>> similar unfortunate (to me) default choices; in that case
>>> var.equal=FALSE (i.e. the Welch test) is the default.
>>>
>>> albyn
>>>
>>> On Mon, Oct 25, 2010 at 04:15:20PM -0500, Laura Chihara wrote:
>>>> Yes, thank you for this reference. But according to
>>>> this article, the score interval is better than the
>>>> continuity-corrected one, so why is continuity correction
>>>> the default with prop.test?
>>>>
>>>> -Laura
>>>>
>>>> On 10/25/2010 4:02 PM, Ralph O'Brien, PhD wrote:
>>>>> I suggest:
>>>>>
>>>>> A. Agresti and B. A. Coull. Approximate is better than "exact" for
>>>>> interval estimation of binomial proportions. The American Statistician,
>>>>> 52(2):119-126, 1998.
>>>>>
>>>>> On Mon, Oct 25, 2010 at 4:38 PM, Laura Chihara <lchihara at carleton.edu
>>>>> <mailto:lchihara at carleton.edu>> wrote:
>>>>>
>>>>>   Hi,
>>>>>
>>>>>   I have a question about prop.test in R:
>>>>>
>>>>>   I teach students the score confidence
>>>>>   interval for proportions (also called
>>>>>   Wilson or Wilson score interval).
>>>>>
>>>>>   prop.test(..., correct=FALSE) gives this
>>>>>   interval.
>>>>>
>>>>>   The default uses a continuity correction.
>>>>>   When should we use one over the other?
>>>>>   Is it worth going over this in class? Why
>>>>>   is correct=TRUE the default?
>>>>>
>>>>>   Thanks for any pedagogical guidance here!
>>>>>
>>>>>   -- Laura
>>>>>
>>>>>   *******************************************
>>>>>   Laura Chihara
>>>>>   Professor of Mathematics   507-222-4065 (office)
>>>>>   Dept of Mathematics        507-222-4312 (fax)
>>>>>   Carleton College
>>>>>   1 North College Street
>>>>>   Northfield MN 55057
>>>>>
>>>>>   _______________________________________________
>>>>>   R-sig-teaching at r-project.org <mailto:R-sig-teaching at r-project.org>
>>>>>   mailing list
>>>>>   https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ralph O'Brien, PhD
>>>>> Professor, Dept of Epidemiology and Biostatistics
>>>>> Case Western Reserve University
>>>>> Office: 216.368.1927
>>>>> Cell: 216.312.3203
>>>>
>>>>
>>>
>>> --
>>> Albyn Jones
>>> Reed College
>>> jones at reed.edu
>>>
>>
>>
>>
>