[Rd] Feature request: Have t.test return (group-wise) SD and N

Sat Oct 20 08:50:43 CEST 2018

Dear readers,

lm() returns all the information necessary to reconstruct summary statistics by group. However, t.test() returns only the group means, not the group SDs or even the group Ns. These cannot be reconstructed from the test statistic and df, because the df are already pooled, unless one makes the very strict assumption that the groups have equal sizes and equal variances.
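
To illustrate the point about pooled df, here is a minimal sketch with simulated data. The vectors and the names df_a and df_b are made up for illustration. Two designs with different group sizes but the same total N yield the same df under var.equal = TRUE, so the group Ns cannot be read back from the result:

```r
# Hypothetical simulated samples: 30 vs 70 and 50 vs 50 observations
set.seed(1)
x1 <- rnorm(30); y1 <- rnorm(70)
x2 <- rnorm(50); y2 <- rnorm(50)

# With var.equal = TRUE, df = n1 + n2 - 2 in both cases
df_a <- t.test(x1, y1, var.equal = TRUE)$parameter
df_b <- t.test(x2, y2, var.equal = TRUE)$parameter
unname(df_a) == unname(df_b)
```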

I need these summary statistics for a package that performs Bayesian inference for frequentist analyses, through normal approximation of the posterior. To make the package as user-friendly as possible, I would like it to have S3 methods for commonly used frequentist analyses in R, such as lm and t.test.

As per the R-project feature request guidelines, I would like to gauge how people feel about adding functionality to t.test so that it returns either the model data, as lm() does, or full summary statistics per group. I can mask the t.test function with an enhanced version, but I feel there is value in making this functionality available to all.

Thank you sincerely for your input. Below is a reproducible example, illustrating the problem.

Best,
Caspar

d <- structure(list(Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7,
                                     1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7,
                                     1.5, 1.7, 1.5, 1, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5,
                                     1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9,
                                     1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4, 4.6, 4.5, 4.7, 3.3,
                                     4.6, 3.9, 3.5, 4.2, 4, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8,
                                     4, 4.9, 4.7, 4.3, 4.4, 4.8, 5, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1,
                                     4.5, 4.5, 4.7, 4.4, 4.1, 4, 4.4, 4.6, 4, 3.3, 4.2, 4.2, 4.2,
                                     4.3, 3, 4.1),
                    Species = structure(rep(c(1L, 2L), each = 50), .Label = c("setosa", "versicolor"), class = "factor")),
                    row.names = c(NA, 100L), class = "data.frame")
# lm model
m_lm <- lm(Petal.Length ~ Species, d)
# Extract group means:
aggregate(m_lm$model$Petal.Length, list(m_lm$model$Species), mean)
# Extract group SDs:
aggregate(m_lm$model$Petal.Length, list(m_lm$model$Species), sd)
# Extract group Ns:
table(m_lm$model$Species)

# t.test model
m_t <- t.test(d$Petal.Length[1:50], d$Petal.Length[51:100], var.equal = TRUE)
# Extract group means:
m_t$estimate
# Extract group SDs:
# Not available
# Extract group Ns:
# Not available
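
For reference, the masking workaround mentioned above could look roughly like the sketch below. It is only an illustration for the unpaired two-sample case: t_test_with_stats and the sd and n elements are hypothetical names, not part of base R or of any proposed interface. It uses the built-in iris data so that it runs on its own.

```r
# Hypothetical wrapper (not part of base R): calls stats::t.test() and
# attaches per-group SDs and Ns to the returned "htest" object.
# Assumes an unpaired two-sample test on numeric vectors x and y.
t_test_with_stats <- function(x, y, ...) {
  out <- stats::t.test(x, y, ...)
  # Drop missing values per group before computing the summaries
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  out$sd <- c("sd of x" = stats::sd(x), "sd of y" = stats::sd(y))
  out$n  <- c("n of x" = length(x), "n of y" = length(y))
  out
}

# Usage with the built-in iris data
x <- iris$Petal.Length[iris$Species == "setosa"]
y <- iris$Petal.Length[iris$Species == "versicolor"]
res <- t_test_with_stats(x, y, var.equal = TRUE)
res$estimate  # group means, as before
res$sd        # group SDs (added element)
res$n         # group Ns (added element)
```

The result keeps its "htest" class, so it still prints as usual. The extra elements are simply ignored by the print method.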

Dr. Caspar J. van Lissa
Assistant professor of developmental data science
Utrecht University, dept. Methodology & Statistics
Sjoerd Groenmangebouw C1.01, 3584CH Utrecht, the Netherlands. Secretariat: +31 30 253 4438
