[R] about a p-value < 2.2e-16

Jiefei Wang szwjf08 using gmail.com
Fri Mar 19 15:52:29 CET 2021


After digging into the R source, it turns out that the argument `exact` has
nothing to do with numeric precision. It only affects which distribution is
used to compute the p-value. When `exact=TRUE`, the true (exact) null
distribution of the test statistic is used. Otherwise, a normal
approximation is used.
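
To make the distinction concrete, here is a minimal sketch that reuses the
simulated data from Spencer's message quoted below. The variable names
`p_exact`, `p_normal` and `p_default` are mine, and I am assuming the
defaults behave as the help page describes (no exact computation for
samples of 50 or more values).

```r
# Sketch: the same data, two ways of computing the p-value.
set.seed(1)
x <- rnorm(100)        # continuous draws, so ties are not expected
y <- rnorm(100, 2)

p_exact   <- wilcox.test(x, y, exact = TRUE)$p.value   # exact null distribution of W
p_normal  <- wilcox.test(x, y, exact = FALSE)$p.value  # normal approximation
p_default <- wilcox.test(x, y)$p.value                 # `exact` left unspecified

print(c(exact = p_exact, normal = p_normal, default = p_default))
```

Both results are full double-precision numbers; they differ because they
come from different reference distributions, not because one of them is
printed with more digits.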

I think the documentation needs to be improved here: you can compute the
exact p-value *only* when there are no ties in your data. If you have ties,
you will get the p-value from the normal approximation no matter what value
you pass to `exact`. This behavior should be documented, or a warning should
be given when `exact=TRUE` and ties are present.
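
A quick way to check for ties before asking for an exact p-value could look
like the sketch below. `has_ties` is a hypothetical helper of mine, not
something in R, and it assumes the two-sample case with `mu = 0`, where the
test ranks the pooled values of `x` and `y`. The behavior with ties is what
I see in the current source and may differ in other R versions.

```r
# Hypothetical helper (not part of R): TRUE if the pooled sample has repeated values.
has_ties <- function(x, y) anyDuplicated(c(x, y)) > 0

x <- c(1.1, 2.3, 2.3, 4.0)   # 2.3 appears twice, so there is a tie
y <- c(0.5, 3.2, 5.1, 6.7)

if (has_ties(x, y)) {
  message("Ties found: exact = TRUE may fall back to the normal approximation.")
}

# suppressWarnings() only keeps the output tidy; R may warn about ties here.
p_exact_req <- suppressWarnings(wilcox.test(x, y, exact = TRUE)$p.value)
p_normal    <- wilcox.test(x, y, exact = FALSE)$p.value
print(c(exact_requested = p_exact_req, normal = p_normal))
```

If the two printed values agree, the request for an exact p-value was not
honored for this data.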

FYI, if the exact p-value is required, the `pwilcox` function is used to
compute it. There are no details on how it computes the p-value, but its C
code seems to build the probability table, so I assume it computes the
exact p-value from the true distribution of the statistic, not a
permutation or Monte Carlo p-value.
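
As a rough check of that reading, the two-sided exact p-value can be
rebuilt by hand from `pwilcox` on a small sample without ties. This is a
sketch of how I understand the source, not an official recipe; `p_tail` and
`p_manual` are names I made up for illustration.

```r
set.seed(42)
x <- rnorm(8)
y <- rnorm(9, 1)
m <- length(x); n <- length(y)

tst <- wilcox.test(x, y, exact = TRUE)
W <- unname(tst$statistic)

# Two-sided p-value: double the smaller tail of the exact distribution, capped at 1.
p_tail <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)   # P(W' >= W)
} else {
  pwilcox(W, m, n)                           # P(W' <= W)
}
p_manual <- min(2 * p_tail, 1)

print(c(manual = p_manual, wilcox.test = tst$p.value))
```

The two numbers should agree, which is consistent with `wilcox.test` taking
its exact p-value straight from the distribution functions rather than from
any resampling.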

Best,
Jiefei



On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 using gmail.com> wrote:

> Hey,
>
> I just want to point out that the word "exact" has two meanings. It can
> mean the numerically accurate p-value, as Bogdan asked in his first email,
> or it can mean the p-value calculated from the exact distribution of the
> statistic (in this case, the U statistic). These two are actually
> unrelated, even though both are called "exact".
>
> Best,
> Jiefei
>
> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> spencer.graves using effectivedefense.org> wrote:
>
>>
>>
>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
>> > thanks a lot, Vivek ! in other words, assuming that we work with 1000
>> data
>> > points,
>> >
>> > shall we use EXACT = TRUE, it uses the normal approximation,
>> >
>> > while if EXACT=FALSE (for these large samples), it does not ?
>>
>>
>>        As David Winsemius noted, the documentation is not clear.
>> Consider the following:
>>
>>    > set.seed(1)
>>    > x <- rnorm(100)
>>    > y <- rnorm(100, 2)
>>    >
>>    > wilcox.test(x, y)$p.value
>>    [1] 1.172189e-25
>>    > wilcox.test(x, y, EXACT=TRUE)$p.value
>>    [1] 1.172189e-25
>>    > wilcox.test(x, y, exact=TRUE)$p.value
>>    [1] 4.123875e-32
>>    > wilcox.test(x, y, EXACT=FALSE)$p.value
>>    [1] 1.172189e-25
>>    > wilcox.test(x, y, exact=FALSE)$p.value
>>    [1] 1.172189e-25
>>
>> We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I
>> think, is the normal approximation, which is the same as exact=FALSE. I
>> think that with exact=TRUE, you get a permutation distribution, though
>> I'm not sure. You might try looking at "wilcox_test in package coin for
>> exact, asymptotic and Monte Carlo conditional p-values, including in the
>> presence of ties" to see if it is clearer.
>>
>> NOTE: R is case sensitive, so "EXACT" is a different variable from
>> "exact". It is interpreted as an optional argument, which is not
>> recognized and therefore ignored in this context.
>>           Hope this helps.
>>           Spencer
>>
>>
>> > On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind using gmail.com> wrote:
>> >
>> >> Hi Bogdan,
>> >>
>> >> You can also get the information from the link of the Wilcox.test
>> function
>> >> page.
>> >>
>> >> “By default (if exact is not specified), an exact p-value is computed
>> if
>> >> the samples contain less than 50 finite values and there are no ties.
>> >> Otherwise, a normal approximation is used.”
>> >>
>> >> For more:
>> >>
>> >>
>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
>> >>
>> >> Hope this helps!
>> >>
>> >> Best,
>> >>
>> >> VD
>> >>
>> >>
>> >> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa using gmail.com>
>> wrote:
>> >>
>> >>> Dear Peter, thanks a lot. yes, we can see a very precise p-value, and
>> that
>> >>> was the request from the journal.
>> >>>
>> >>> if I may ask another question please : what is the meaning of
>> "exact=TRUE"
>> >>> or "exact=FALSE" in wilcox.test ?
>> >>>
>> >>> i can see that the "numerically precise" p-values are different.
>> thanks a
>> >>> lot !
>> >>>
>> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> >>> tst$p.value
>> >>> [1] 8.535524e-25
>> >>>
>> >>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
>> >>> tst$p.value
>> >>> [1] 3.448211e-25
>> >>>
>> >>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
>> >>> peter.langfelder using gmail.com> wrote:
>> >>>
>> >>>> I think the answer is much simpler. The print method for hypothesis
>> >>>> tests (class htest) truncates the p-values. In the above example,
>> >>>> instead of using
>> >>>>
>> >>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> >>>>
>> >>>> and copying the output, just print the p-value:
>> >>>>
>> >>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> >>>> tst$p.value
>> >>>>
>> >>>> [1] 2.988368e-32
>> >>>>
>> >>>>
>> >>>> I think this value is what the journal asks for.
>> >>>>
>> >>>> HTH,
>> >>>>
>> >>>> Peter
>> >>>>
>> >>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
>> >>>> <spencer.graves using effectivedefense.org> wrote:
>> >>>>>         I would push back on that from two perspectives:
>> >>>>>
>> >>>>>
>> >>>>>               1.  I would study exactly what the journal said very
>> >>>>> carefully.  If they mandated "wilcox.test", that function has an
>> >>>>> argument called "exact".  If that's what they are asking, then using
>> >>>>> that argument gives the exact p-value, e.g.:
>> >>>>>
>> >>>>>
>> >>>>>   > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>> >>>>>
>> >>>>>           Wilcoxon rank sum exact test
>> >>>>>
>> >>>>> data:  rnorm(100) and rnorm(100, 2)
>> >>>>> W = 691, p-value < 2.2e-16
>> >>>>>
>> >>>>>
>> >>>>>               2.  If that's NOT what they are asking, then I'm not
>> >>>>> convinced what they are asking makes sense:  There is no such thing
>> >>>>> as an "exact p value" except to the extent that certain assumptions
>> >>>>> hold, and all models are wrong (but some are useful), as George Box
>> >>>>> famously said years ago.[1]  Truth only exists in mathematics, and
>> >>>>> that's because it's a fiction to start with ;-)
>> >>>>>
>> >>>>>
>> >>>>>         Hope this helps.
>> >>>>>         Spencer Graves
>> >>>>>
>> >>>>>
>> >>>>> [1]
>> >>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
>> >>>>>
>> >>>>>
>> >>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
>> >>>>>>    <
>> >>>>
>> https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16
>> >>>>
>> >>>>>> Dear all,
>> >>>>>>
>> >>>>>> i would appreciate having your advice on the following please :
>> >>>>>>
>> >>>>>> in R, the wilcox.test() provides "a p-value < 2.2e-16", when we
>> >>> compare
>> >>>>>> sets of 1000 genes expression (in the genomics field).
>> >>>>>>
>> >>>>>> however, the journal asks us to provide the exact p value ...
>> >>>>>>
>> >>>>>> would it be legitimate to write : "p-value = 0" ? thanks a lot,
>> >>>>>>
>> >>>>>> -- bogdan
>> >>>>>>
>> >>>>>>        [[alternative HTML version deleted]]
>> >>>>>>
>> >>>>>> ______________________________________________
>> >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>>> PLEASE do read the posting guide
>> >>>> http://www.R-project.org/posting-guide.html
>> >>>>>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >> --
>> >> ----------------------------------------------------------
>> >>
>> >> Vivek Das, PhD
>> >>
>>
>>
>>
>



