[R] Bug in t.test?
Johannes W. Dietrich
j.w.dietrich at medizinische-kybernetik.de
Sat Aug 14 00:44:15 CEST 2010
Thank you for the fast reply! Although I have read the help page for
t.test over and over again I have obviously overlooked the relevant
sentence. The "workaround" that I have planned seems to be the
correct use.
Thanks again,
J. W. D.
At 15:31 Uhr -0700 13.08.2010, Thomas Lumley wrote:
>Thanks for the clear example. However, if there is a bug it is only
>that t.test.formula doesn't throw an error when given the
>paired=TRUE option.
>
>The documentation says "The formula interface is only applicable for
>the 2-sample tests.", but there probably should be an explicit
>check -- I didn't see that in the documentation until I realized
>that t.test.formula couldn't work for paired tests because you don't
>tell it which observations are paired.
>
> -thomas
>
>
>On Fri, 13 Aug 2010, Johannes W. Dietrich wrote:
>
>>Hello all,
>>
>>due to unexplained differences between statistical results from
>>collaborators and our lab that arose in the same large proteomics
>>dataset we reevaluated the t.test() function. Here, we found a
>>weird behaviour that is also reproducible in the following small
>>test dataset:
>>
>>Suppose, we have two vectors with numbers and some missing values
>>that refer to the same individuals and that should therefore be
>>evaluated with a paired t-test:
>>
>>> testdata.A <- c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01);
>>> testdata.B <- c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA);
>>
>>Then
>>
>>> print(t.test(testdata.A, testdata.B, paired=TRUE,
>>>alternative="two.sided", na.action="na.pass"))
>>
>>and
>>
>>> print(t.test(testdata.A, testdata.B, paired=TRUE,
>>>alternative="two.sided", na.action="na.exclude"))
>>
>>deliver the same p value (0.1162, identical to Excel's result).
>>
>>However, after combining the two vectors with
>>
>>> testdata <- c(testdata.A, testdata.B);
>>
>>and defining a criterion vector with
>>
>>> criterion <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1);
>>
>>(that is the type of data layout we have in our proteomics project)
>>we get a different p-value (0.01453) with
>>
>>> print(t.test(testdata ~ criterion, paired=TRUE,
>>>alternative="two.sided", na.action="na.exclude")) .
>>
>>The statement
>>
>>> print(t.test(testdata ~ criterion, paired=TRUE,
>>>alternative="two.sided", na.action="na.pass"))
>>
>>however, delivers a p-value of 0.1162 again.
>>
>>With
>>
>>> print(t.test(testdata[criterion==0], testdata[criterion==1],
>>>paired=TRUE, alternative="two.sided", na.action="na.exclude"))
>>
>>that imitates the first form, we get again a p value of 0.1162.
>>
>>What is the reason for the different p values? Should not all calls
>>to t.test that exlude missing values be equivalent and therefore
>>deliver the same results?
>>
>>Excel, StatView and KaleidaGraph all display a p-value of 0.1162.
>>
>>J. W. D.
>>--
>>-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
>>-- Dr. Johannes W. Dietrich, M.D.
>>-- Laboratory XU44, Endocrine Research
>>-- Medical Hospital I, Bergmannsheil University Hospitals
>>-- Ruhr University of Bochum
>>-- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany
>>-- Phone: +49:234:302-6400, Fax: +49:234:302-6403
>>-- eMail: "j.w.dietrich at medical-cybernetics.de"
>>-- WWW: http://medical-cybernetics.de
>>-- WWW: http://www.bergmannsheil.de
>>-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>>
>
>Thomas Lumley
>Professor of Biostatistics
>University of Washington, Seattle
--
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- Dr. Johannes W. Dietrich, M.D.
-- Laboratory XU44, Endocrine Research
-- Medical Hospital I, Bergmannsheil University Hospitals
-- Ruhr University of Bochum
-- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany
-- Phone: +49:234:302-6400, Fax: +49:234:302-6403
-- eMail: "j.w.dietrich at medical-cybernetics.de"
-- WWW: http://medical-cybernetics.de
-- WWW: http://www.bergmannsheil.de
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
More information about the R-help
mailing list