[R] Paired t-tests

Mon Aug 16 01:19:18 CEST 2010

On Aug 15, 2010, at 2:48 PM, David Winsemius wrote:

> 
> On Aug 15, 2010, at 3:31 PM, Peter Dalgaard wrote:
> 
>> Marc Schwartz wrote:
>>> On Aug 15, 2010, at 9:05 AM, R Help wrote:
>>> 
>>>> Hello List,
>>>> 
>>>> I'm trying to do a paired t-test, and I'm wondering if it's consistent
>>>> with equations.  I have a dataset that has a response and two
>>>> treatments (here's an example):
>>>> 
>>>>    ID trt order          resp
>>>> 17  1   0     1  0.0037513592
>>>> 18  2   0     1  0.0118723051
>>>> 19  4   0     1  0.0002610251
>>>> 20  5   0     1 -0.0077951450
>>>> 21  6   0     1  0.0022339952
>>>> 22  7   0     2  0.0235195453
>>>> 
>>>> The subjects were randomized and assigned to receive either the
>>>> treatment or the placebo first, then the other.  I know I'll
>>>> eventually have to move on to a GLM or something that incorporates the
>>>> order, but for now I wanted to start with a simple t.test.  My problem
>>>> is that, if I get the responses into two vectors x and y (sorted by
>>>> ID) and do a t.test, and then compare that to a formula t.test, they
>>>> aren't the same.
>>>> 
>>>>> t.test(x,y,paired=TRUE)
>>>> 	Paired t-test
>>>> 
>>>> data:  x and y
>>>> t = -0.3492, df = 15, p-value = 0.7318
>>>> alternative hypothesis: true difference in means is not equal to 0
>>>> 95 percent confidence interval:
>>>> -0.010446921  0.007505966
>>>> sample estimates:
>>>> mean of the differences
>>>>         -0.001470477
>>>> 
>>>>> t.test(resp~trt,data=dat1[[3]],paired=TRUE)
> 
> Since neither resp nor trt would be in dat1[[3]], wouldn't the fact that no error was reported imply either that dat1 had been attached (and we were not informed of that prior attach()-ment) or that resp and trt are also object names besides being column names inside dat1?
> 
> 
>>>> 	Paired t-test
>>>> 
>>>> data:  resp by trt
>>>> t = -0.3182, df = 15, p-value = 0.7547
>>>> alternative hypothesis: true difference in means is not equal to 0
>>>> 95 percent confidence interval:
>>>> -0.007096678  0.005253173
>>>> sample estimates:
>>>> mean of the differences
>>>>        -0.0009217521
>>>> 
>>>> What I'm assuming is that the equation isn't retaining the inherent
>>>> order of the dataset, so the pairing isn't matching up (even though
>>>> the dataset is ordered by ID).  Is there a way to make the t.test
>>>> retain the correct ordering?
>>>> 
>>>> Thanks,
>>>> Sam
>>> 
>>> 
>>> See this thread from just 2 days ago:
>>> 
>>> https://stat.ethz.ch/pipermail/r-help/2010-August/249068.html
>>> 
>>> perhaps focusing on Thomas' reply, which is the next post in the thread.
>>> 
>>> Bottom line, don't use the formula method for a paired t test.
>> 
>> Yes. I'm not sure the same problem is afoot here, though. In particular,
>> I'm puzzled by the fact that there are 15 DF in both cases, but a different
>> average difference. This kind of suggests to me that maybe the x and y
>> are not computed correctly. (If only the ordering was scrambled, the
>> average difference should be the same, but the variance typically
>> inflated.)
>> 

I suspect that David is correct here. Good catch. 

set.seed(1)
x <- rnorm(16, 1, 1)
y <- rnorm(16, 1.5, 1)

grp <- rep(c("A", "B"), each = 16)

> t.test(x, y, paired = TRUE)

	Paired t-test

data:  x and y 
t = -1.595, df = 15, p-value = 0.1316
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -1.2841549  0.1848776 
sample estimates:
mean of the differences 
             -0.5496387 

> t.test(x-y)

	One Sample t-test

data:  x - y 
t = -1.595, df = 15, p-value = 0.1316
alternative hypothesis: true mean is not equal to 0 
95 percent confidence interval:
 -1.2841549  0.1848776 
sample estimates:
 mean of x 
-0.5496387 

> t.test(c(x, y) ~ grp, paired = TRUE)

	Paired t-test

data:  c(x, y) by grp 
t = -1.595, df = 15, p-value = 0.1316
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -1.2841549  0.1848776 
sample estimates:
mean of the differences 
             -0.5496387 

# Scramble the pairings, as Peter notes

set.seed(2)

> t.test(c(sample(x), y) ~ grp, paired = TRUE)

	Paired t-test

data:  c(sample(x), y) by grp 
t = -1.8166, df = 15, p-value = 0.0893
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -1.19453037  0.09525302 
sample estimates:
mean of the differences 
             -0.5496387 
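
The reason the mean of the differences cannot move under scrambling is that mean(x - y) equals mean(x) - mean(y), whatever the pairing. A minimal sketch of that check follows. The names xs, d_paired and d_scrambled are introduced here only for illustration, and the exact permutation drawn by sample() depends on the R version.

```r
# Sketch: permuting one member of each pair leaves the mean difference unchanged
set.seed(1)
x <- rnorm(16, 1, 1)
y <- rnorm(16, 1.5, 1)

set.seed(2)
xs <- sample(x)              # same values as x, scrambled pairing

d_paired    <- x - y         # differences with the correct pairing
d_scrambled <- xs - y        # differences with the scrambled pairing

# Both means equal mean(x) - mean(y), up to floating-point error
mean_same <- isTRUE(all.equal(mean(d_paired), mean(d_scrambled)))

# The spread of the differences, and hence the t statistic, generally differs
sd_paired    <- sd(d_paired)
sd_scrambled <- sd(d_scrambled)
```

So a different mean difference with the same df, as in the OP's output, points to different values going into the two calls, not just a different ordering.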

The behavior in the prior thread was due to the handling of missing data, which compromised the pairings.
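
As a rough illustration of how that can happen, here is a sketch with a made-up long-format data frame (the object long and its columns id, trt and resp are hypothetical, not the data from that thread). It assumes incomplete rows are dropped one at a time, which is what na.omit() does, and does not claim to reproduce the internals of the formula method.

```r
# Hypothetical long-format data: 4 subjects, two treatments, one missing response
long <- data.frame(
  id   = rep(1:4, times = 2),
  trt  = rep(c("A", "B"), each = 4),
  resp = c(10, 20, NA, 40,     # treatment A, subject 3 is missing
           11, 21, 31, 41)     # treatment B, complete
)

# Row-wise removal of incomplete rows, as na.omit() does
kept <- na.omit(long)
a <- kept[kept$trt == "A", ]
b <- kept[kept$trt == "B", ]

# The groups now differ in length, so position no longer identifies a subject
n_a <- nrow(a)   # 3
n_b <- nrow(b)   # 4

# Safer: keep only subjects observed under both treatments, matched on id
both  <- merge(a, b, by = "id", suffixes = c(".A", ".B"))
diffs <- both$resp.A - both$resp.B
```

Matching on the subject identifier, rather than on row position, keeps each difference within one subject.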

So, to the OP: check your working environment and how you pass the 'data' argument to the formula method. In any case, avoid using the formula method for paired t tests.
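
One way to make the pairing explicit before calling t.test() is to reshape to one row per subject. The following is only a sketch on simulated data: the data frame dat mimics the OP's column names (ID, trt, resp), but its values are made up, and wide and res are names chosen here for illustration.

```r
# Hypothetical data in the OP's layout: one row per subject per treatment
set.seed(42)
dat <- data.frame(
  ID   = rep(1:16, times = 2),
  trt  = rep(c(0, 1), each = 16),
  resp = rnorm(32, 0, 0.01)
)
dat <- dat[sample(nrow(dat)), ]    # deliberately shuffle the row order

# Reshape to one row per subject; columns become resp.0 and resp.1
wide <- reshape(dat, idvar = "ID", timevar = "trt", direction = "wide")

# Pairing is now by row of 'wide', independent of the row order in 'dat'
res <- t.test(wide$resp.0, wide$resp.1, paired = TRUE)
```

Because the two vectors come from the same rows of wide, the result does not depend on how dat happened to be sorted.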

Regards,

Marc