[R] lm equivalent of Welch-corrected t-test?

Wed Nov 14 13:56:03 CET 2018

> On 13 Nov 2018, at 16:19 , Paul Johnson <pauljohn32 using gmail.com> wrote:
> 
> Long ago, when R's t.test had var.equal=TRUE by default, I wrote some
> class notes showing that the result was equivalent to a one predictor
> regression model.  Because t.test does not default to var.equal=TRUE
> these days, I'm curious to know if there is a way to specify weights
> in an lm to obtain the same result as the Welch-adjusted values
> reported by t.test at the current time.  Is there a WLS equivalent
> adjustment with lm?
> 

The short answer is no. The long answer is to look into heteroscedasticity adjustments or lmer combined with pbkrtest. 

Well, you can do the weights in lm() and get the right t statistic, but not the Satterthwaite oddball-df thing (the sleep data are actually paired, but ignore that here)

variances <- tapply(sleep$extra, sleep$group, var)
sleep$wt <- 1/variances[sleep$group]
t.test(extra~group, sleep)
summary(lm(extra~factor(group), weight=wt, sleep))

The pbkrtest approach goes like this:

library(lme4)
library(pbkrtest)
sleep$gdummy <- sleep$group-1 # arrange that this is 1 in the group with larger variance
fit1 <-  lmer(extra ~ group + (gdummy+0 | ID), sleep)
fit0 <-  lmer(extra ~ 1 + (gdummy+0 | ID), sleep)
KRmodcomp(fit0, fit1)

(This somewhat abuses the existing sleep$ID -- all you really need is something that has a level for each record where gdummy==1)

This ends up with 

         stat     ndf     ddf F.scaling p.value  
Ftest  3.4626  1.0000 17.7765         1 0.07939 .

to be compared with the Welch test

t = -1.8608, df = 17.776, p-value = 0.07939

-pd 

> Here's example code to show that lm is same as t.test with var.equal.
> The signs come out differently, but otherwise the effect estimate,
> standard error, t value are same:
> 
> 
> set.seed(234234)
> dat <- data.frame(x = gl(2, 50, labels = c("F", "M")))
> dat$err <- rnorm(100, 0, 1)
> dat$y <- ifelse(dat$x == "F", 40 + dat$err, 44 + dat$err)
> m1 <- lm(y ~ x, dat)
> summary(m1)
> m1.t <- t.test(y ~ x, dat, var.equal = TRUE)
> m1.t
> ## diff matches regression coefficient
> (m1.t.effect <- diff(m1.t$estimate))
> ## standard error matches regression se
> m1.t.stderr <- m1.t.effect/m1.t$statistic
> 
> If you run that, you see lm output:
> 
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)  39.9456     0.1180  338.65   <2e-16 ***
> xM            3.9080     0.1668   23.43   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 0.8341 on 98 degrees of freedom
> Multiple R-squared:  0.8485,    Adjusted R-squared:  0.8469
> F-statistic: 548.8 on 1 and 98 DF,  p-value: < 2.2e-16
> 
> and t.test:
> 
>> m1.t <- t.test(y ~ x, dat, var.equal = TRUE)
>> m1.t
> 
>    Two Sample t-test
> 
> data:  y by x
> t = -23.427, df = 98, p-value < 2.2e-16
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
> -4.239038 -3.576968
> sample estimates:
> mean in group F mean in group M
>       39.94558        43.85358
> 
>> (m1.t.effect <- diff(m1.t$estimate))
> mean in group M
>       3.908003
>> m1.t.effect/m1.t$statistic
> mean in group M
>     -0.1668129
> 
> 
> 
> 
> -- 
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
> 
> To write to me directly, please address me at pauljohn at ku.edu.
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com