[R] negative P-values with shapiro.test

Martin Maechler maechler at stat.math.ethz.ch
Wed Jul 16 18:02:47 CEST 2008


>>>>> "MC" == Mark Cowley <m.cowley at garvan.org.au>
>>>>>     on Wed, 16 Jul 2008 15:32:30 +1000 writes:

    MC> Dear list,

    MC> I am analysing a set of quantitative proteomics data
    MC> from 16 patients which has a large numbers of missing
    MC> data, thus some proteins are only detected once, upto a
    MC> maximum of 16.  I want to test each protein for
    MC> normality by the Shapiro Wilk test (function
    MC> shapiro.test in package stats), which can only be
    MC> applied to data with at least 3 measurements, which is
    MC> fine. In the case where I have only 3 observations, and
    MC> two of those observations are identical, then the
    MC> shapiro.test produces negative P-values, which should
    MC> never happen.  This occurs for all of the situations
    MC> that I have tried for 3 values, where 2 are the same.


Yes. Since all such tests are location- and scale-invariant, you
can reproduce it with

shapiro.test(c(0,0,1))

The irony is that the original papers by Roydon and the R help
page all assert that the P-value for  n = 3  is exact !

OTOH, the paper [Roydon (1982), Appl.Stat 31, p115-124]
clearly states that 
	X(1) < X(2) < X(3) ... < X(n)

i.e., does not allow "ties" (two equal values).

If the exact formula in the paper were evaluated exactly 
(instead with a rounded value of about 6 digits),
the "exact P-value" would be exactly 0.

Now that would count as a bug in the paper I think.
More about this tomorrow or so.

Martin Maechler, ETH Zurich


    MC> Reproducible code below:
    MC> # these are the data points that raised the problem
    >> shapiro.test(c(-0.644, 0.0566, 0.0566))

    MC> Shapiro-Wilk normality test

    MC> data:  c(-0.644, 0.0566, 0.0566)
    MC> W = 0.75, p-value < 2.2e-16

    >> shapiro.test(c(-0.644, 0.0566, 0.0566))$p.value
    MC> [1] -7.69e-07
    MC> # note the verbose output shows a small, but positive P-value, but  
    MC> when you extract that P using $p.value, it becomes negative
    MC> # various other tests
    >> shapiro.test(c(1,1,2))$p.value
    MC> [1] -8.35e-07
    >> shapiro.test(c(-1,-1,2))$p.value
    MC> [1] -1.03e-06

    MC> cheers,

    MC> Mark

    >> sessionInfo()
    MC> R version 2.6.1 (2007-11-26)
    MC> i386-apple-darwin8.10.1

    MC> locale:
    MC> en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

    MC> attached base packages:
    MC> [1] tcltk     graphics  grDevices datasets  utils     stats      
    MC> methods   base

    MC> other attached packages:
    MC> [1] qvalue_1.12.0    Cairo_1.3-5      RSvgDevice_0.6.3  
    MC> SparseM_0.74     pwbc_0.1
    MC> [6] mjcdev_0.1       tigrmev_0.1      slfa_0.1          
    MC> sage_0.1         qtlreaper_0.1
    MC> [11] pajek_0.1        mjcstats_0.1     mjcspot_0.1       
    MC> mjcgraphics_0.1  mjcaffy_0.1
    MC> [16] haselst_0.1      geomi_0.1        geo_0.1           
    MC> genomics_0.1     cor_0.1
    MC> [21] bootstrap_0.1    blat_0.1         bitops_1.0-4      
    MC> mjcbase_0.1      gdata_2.3.1
    MC> [26] gtools_2.4.0

    MC> -----------------------------------------------------
    MC> Mark Cowley, BSc (Bioinformatics)(Hons)

    MC> Peter Wills Bioinformatics Centre
    MC> Garvan Institute of Medical Research, Sydney, Australia

    MC> ______________________________________________
    MC> R-help at r-project.org mailing list
    MC> https://stat.ethz.ch/mailman/listinfo/r-help
    MC> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    MC> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list