[R] Bug in t.test?

Thomas Lumley tlumley at u.washington.edu
Sat Aug 14 00:31:45 CEST 2010



Thanks for the clear example. However, if there is a bug it is only that t.test.formula doesn't throw an error when given the paired=TRUE option.

The documentation says "The formula interface is only applicable for the 2-sample tests.",  but there probably should be an explicit check -- I didn't see that in the documentation until I realized that t.test.formula couldn't work for paired tests because you don't tell it which observations are paired.

    -thomas


On Fri, 13 Aug 2010, Johannes W. Dietrich wrote:

> Hello all,
>
> due to unexplained differences between statistical results from collaborators 
> and our lab that arose in the same large proteomics dataset we reevaluated 
> the t.test() function. Here, we found a weird behaviour that is also 
> reproducible in the following small test dataset:
>
> Suppose, we have two vectors with numbers and some missing values that refer 
> to the same individuals and that should therefore be evaluated with a paired 
> t-test:
>
>>  testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01);
>>  testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA);
>
> Then
>
>>  print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided", 
>> na.action="na.pass"))
>
> and
>
>>  print(t.test(testdata.A, testdata.B, paired=TRUE, alternative="two.sided", 
>> na.action="na.exclude"))
>
> deliver the same p value (0.1162, identical to Excel's result).
>
> However, after combining the two vectors with
>
>>  testdata <- c(testdata.A, testdata.B);
>
> and defining a criterion vector with
>
>>  criterion <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1);
>
> (that is the type of data layout we have in our proteomics project) we get a 
> different p-value (0.01453) with
>
>>  print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided", 
>> na.action="na.exclude")) .
>
> The statement
>
>>  print(t.test(testdata ~ criterion, paired=TRUE, alternative="two.sided", 
>> na.action="na.pass"))
>
> however, delivers a p-value of 0.1162 again.
>
> With
>
>>  print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE, 
>> alternative="two.sided", na.action="na.exclude"))
>
> that imitates the first form, we get again a p value of 0.1162.
>
> What is the reason for the different p values? Should not all calls to t.test 
> that exlude missing values be equivalent and therefore deliver the same 
> results?
>
> Excel, StatView and KaleidaGraph all display a p-value of 0.1162.
>
> J. W. D.
> -- 
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> -- Dr. Johannes W. Dietrich, M.D.
> -- Laboratory XU44, Endocrine Research
> -- Medical Hospital I, Bergmannsheil University Hospitals
> -- Ruhr University of Bochum
> -- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany
> -- Phone: +49:234:302-6400, Fax: +49:234:302-6403
> -- eMail: "j.w.dietrich at medical-cybernetics.de"
> -- WWW: http://medical-cybernetics.de
> -- WWW: http://www.bergmannsheil.de
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle



More information about the R-help mailing list