[R] t.test with Welch correction is ambiguous

Mon Nov 27 16:22:03 CET 2023

Your solution was educational. Thank you. I have three comments.
1) If you do not provide both options then you are forcing people to conform to your approach. In general I disapprove, but for specific cases I can see advantages.
2) Without reading the relevant papers (and possibly understanding them), is there a simple metric that would enable the correct choice between Welch-Satterthwaite and Welch (1947)?
3) If there is a broad consensus that Welch (1947) is never the correct option then do not implement it.

As written, it sounds like Welch (1938) proposed a correction. Welch published another correction in 1947, but then retracted his 1947 correction in a 1949 paper. At least that is how I interpret what was written in your option c.

Tim
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Dr. Rainer Düsing
Sent: Monday, November 27, 2023 7:42 AM
To: r-help using r-project.org
Subject: [R] t.test with Welch correction is ambiguous


Dear R Team!

There was an ongoing debate on ResearchGate about the "Welch" option in your base R t.test command. A user noticed that the correction of the degrees of freedom labeled "Welch Two Sample t-test", which you get with var.equal = FALSE in the R t.test command, differs from the output of the Stata analysis that is also labeled "Welch's degrees of freedom". Confusingly enough, the R output coincided with the Stata result labeled "Satterthwaite's degrees of freedom". Unfortunately, the R documentation wasn't clear either, since it lacks any specific reference and the wording is ambiguous: "If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used." It sounds as if both options were available, rather than that both authors independently proposed the same correction.
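The point about the labeling can be checked directly. The following is a minimal sketch, assuming two small made-up samples (the data are purely illustrative), that compares the df reported by t.test with a hand-computed Welch-Satterthwaite value:

```r
# Two made-up samples with clearly different spreads (illustrative data only)
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
y <- c(7.9, 3.2, 9.4, 2.8, 6.6, 8.1, 4.0, 10.3)

# var.equal = FALSE is the default, so this is the "Welch" branch of t.test
res <- t.test(x, y)

# Welch-Satterthwaite degrees of freedom computed by hand
ax <- var(x) / length(x)  # squared standard error of mean(x)
ay <- var(y) / length(y)  # squared standard error of mean(y)
df_ws <- (ax + ay)^2 / (ax^2 / (length(x) - 1) + ay^2 / (length(y) - 1))

res$method                                  # label printed by t.test
all.equal(unname(res$parameter), df_ws)     # TRUE if t.test uses this formula
```

If the comparison returns TRUE, the output labeled "Welch" in R corresponds to what Stata labels "Satterthwaite's degrees of freedom".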

After doing some research and looking into the R code, we found a solution and would like to suggest an update to the R documentation to make it clearer (you can find a similar proposal to the Stata list here:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1734987-unequal-vs-welch-options-for-ttest-why-no-mention-of-welch-1938-in-the-documentation).

What is called "Welch Two Sample t-test" in the t.test command refers to two publications (see links below) that propose the same correction, namely Welch (1938) and Satterthwaite (1946). Hence, you also find "Welch-Satterthwaite" correction as a description for this in the literature (this is the aforementioned "Satterthwaite's degrees of freedom" correction in Stata).
There is also another correction, proposed by Welch (1947), which has slightly different denominators (see code below) and is called "Welch's degrees of freedom" in Stata. This option is not available in R so far.
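Written out as formulas, transcribed from the code proposal in this message, the two corrections look as follows. The notation is introduced here for illustration only: s_x^2 and s_y^2 are the sample variances, n_x and n_y are the sample sizes.

```latex
% Welch (1938) / Satterthwaite (1946): denominators n - 1
\nu_{\mathrm{WS}} =
  \frac{\left( \frac{s_x^2}{n_x} + \frac{s_y^2}{n_y} \right)^2}
       {\frac{(s_x^2 / n_x)^2}{n_x - 1} + \frac{(s_y^2 / n_y)^2}{n_y - 1}}

% Welch (1947), as implemented in the code proposal: denominators n + 1, minus 2
\nu_{\mathrm{W47}} = -2 +
  \frac{\left( \frac{s_x^2}{n_x} + \frac{s_y^2}{n_y} \right)^2}
       {\frac{(s_x^2 / n_x)^2}{n_x + 1} + \frac{(s_y^2 / n_y)^2}{n_y + 1}}
```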

Therefore, we suggest a) citing the appropriate references in the documentation (at least Welch (1938) and Satterthwaite (1946)), b) adapting the output to something like "Welch-Satterthwaite adjusted Two Sample t-test", and maybe c) incorporating a third option for the Welch (1947) adjustment, with the Welch-Satterthwaite correction as the default option (Aspin & Welch, 1949). A code proposal for the df correction is below.

Best wishes,
Rainer Düsing

   1. https://www.jstor.org/stable/2332010
   (Welch, 1938)
   2. https://www.jstor.org/stable/3002019
   (Satterthwaite, 1946)
   3. https://www.jstor.org/stable/2332510
   (Welch, 1947)
   4. Aspin, Alice A., and B. L. Welch. "Tables for Use in Comparisons
   Whose Accuracy Involves Two Variances, Separately Estimated."
   *Biometrika* 36, no. 3/4 (1949): 290-96. https://doi.org/10.2307/2332668
   [see point 4 in the Appendix by Welch]

# x and y are the two numeric samples.
# Proposed values for var.equal:
#   "yes"   pooled variance
#   "Welch" Welch (1947) correction
#   "W-S"   Welch-Satterthwaite correction (proposed default)
var.equal <- "W-S"

vx <- var(x)
nx <- length(x)
vy <- var(y)
ny <- length(y)

if (var.equal == "yes") {
  # Pooled variance, classical df
  df <- nx + ny - 2
  v <- 0
  if (nx > 1)
    v <- v + (nx - 1) * vx
  if (ny > 1)
    v <- v + (ny - 1) * vy
  v <- v/df
  stderr <- sqrt(v * (1/nx + 1/ny))
} else if (var.equal == "Welch") {
  # Welch (1947): denominators n + 1, then subtract 2
  stderrx <- sqrt(vx/nx)
  stderry <- sqrt(vy/ny)
  stderr <- sqrt(stderrx^2 + stderry^2)
  df <- -2 + (stderr^4/(stderrx^4/(nx + 1) + stderry^4/(ny + 1)))
} else {
  # Welch (1938) / Satterthwaite (1946): denominators n - 1
  stderrx <- sqrt(vx/nx)
  stderry <- sqrt(vy/ny)
  stderr <- sqrt(stderrx^2 + stderry^2)
  df <- stderr^4/(stderrx^4/(nx - 1) + stderry^4/(ny - 1))
}
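To see how the three branches of the proposal behave side by side, they can be wrapped in a small function. This is only a sketch: the helper name df_two_sample and the sample data are hypothetical and not part of base R or of the proposal itself.

```r
# Hypothetical helper (not in base R): degrees of freedom under the three
# proposed var.equal options, following the branches of the code proposal.
df_two_sample <- function(x, y, var.equal = c("W-S", "Welch", "yes")) {
  var.equal <- match.arg(var.equal)
  nx <- length(x)
  ny <- length(y)
  ax <- var(x) / nx  # squared standard error of mean(x)
  ay <- var(y) / ny  # squared standard error of mean(y)
  switch(var.equal,
    "yes"   = nx + ny - 2,
    "Welch" = -2 + (ax + ay)^2 / (ax^2 / (nx + 1) + ay^2 / (ny + 1)),
    "W-S"   = (ax + ay)^2 / (ax^2 / (nx - 1) + ay^2 / (ny - 1))
  )
}

# Made-up samples for illustration only
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
y <- c(7.9, 3.2, 9.4, 2.8, 6.6, 8.1, 4.0, 10.3)

# Named vector with one df per option
dfs <- sapply(c("yes", "Welch", "W-S"), function(m) df_two_sample(x, y, m))
dfs

# The "W-S" value should agree with what t.test currently reports
all.equal(dfs[["W-S"]], unname(t.test(x, y)$parameter))
```

Using match.arg keeps "W-S" as the default, in line with suggestion c) of the proposal, while rejecting any value other than the three listed options.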

--
*Dr. rer. nat. Rainer Düsing, Dipl.-Psych. * Universität Osnabrück Institut für Psychologie Fachgebiet Forschungsmethodik, Diagnostik und Evaluation Lise-Meitner-Str. 3
49076 Osnabrück

Raum 75/222
Tel: +49-541 969 7734
Email: raduesing using uos.de <rduesing using uos.de>


______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.