[R] about a p-value < 2.2e-16
Jiefei Wang
szwjf08 using gmail.com
Sat Mar 20 06:00:56 CET 2021
Hi Bogdan,
I think the journal is asking for the actual numeric value of the p-value;
it doesn't matter whether it comes from the exact distribution or from the
normal approximation. However, it does not make much sense to report such a
small p-value. If I were you, I would show the reviewers the exact p-value
they want and gently explain why you did not put it in your paper. If they
insist that the number must be in the paper, then go ahead and include it.
Best,
Jiefei
Bogdan Tanasa <tanasa using gmail.com> wrote on Sat, Mar 20, 2021 at 2:39 AM:
> Thank you Kevin, their wording is "Please note that the exact p value
> should be provided, when possible, etc"
>
> By "exact p-value" I believe they do indeed mean the actual number,
> and not that we should specify "exact=TRUE".
>
> As we are working with 1000 genes, if I specify "exact=TRUE" on my PC,
> it runs out of memory ...
>
> wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
>
> On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe <kevin.thorpe using utoronto.ca>
> wrote:
>
> > I have to ask: are you sure that by "exact p-value" the journal does not
> > simply mean that they don't want to see a p-value given as < 0.0001, for
> > example, and want the actual number instead?
> >
> > I cannot imagine they really meant exact as in the p-value from some
> > exact distribution.
> >
> > --
> > Kevin E. Thorpe
> > Head of Biostatistics, Applied Health Research Centre (AHRC)
> > Li Ka Shing Knowledge Institute of St. Michael's
> > Assistant Professor, Dalla Lana School of Public Health
> > University of Toronto
> > email: kevin.thorpe using utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
> >
> > > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tanasa using gmail.com> wrote:
> > >
> > > Dear all, thank you all for the comments and help.
> > >
> > > As far as I can see, with samples of 1000 records, only
> > > "exact=FALSE" allows the code to run:
> > >
> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
> > > [1] 7.304863e-231
> > >
> > > If I use "exact=TRUE", it runs out of memory on my PC with 64 GB of RAM:
> > >
> > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
> > > (the job is terminated by the OS)
> > >
> > > If you have any other suggestions, please let me know. Thanks a lot!
> > >
> > > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> > >
> > >> I **believe** -- if my old memory still serves -- that the "exact"
> > >> specification uses a home-grown version of the algorithm to calculate
> > >> exact, or close approximations to the exact, permutation distribution
> > >> originally developed by Cyrus Mehta, founder of StatXact software. Of
> > >> course, examining the C source code would determine this, but I don't
> > >> care to attempt this.
> > >>
> > >> If this is (no longer?) correct, please point this out.
> > >>
> > >> Best,
> > >>
> > >> Bert Gunter
> > >>
> > >> "The trouble with having an open mind is that people keep coming along
> > >> and sticking things into it."
> > >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> > >>
> > >>
> > >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwjf08 using gmail.com> wrote:
> > >>
> > >>> Hi Spencer,
> > >>>
> > >>> Thanks for your test results. I do not know the answer, as I haven't
> > >>> used wilcox.test for many years. I do not know whether it is possible
> > >>> to compute the exact distribution of the Wilcoxon rank sum statistic,
> > >>> but I think it is very likely, as the documentation for `Wilcoxon` says:
> > >>>
> > >>> This distribution is obtained as follows. Let x and y be two random,
> > >>> independent samples of size m and n. Then the Wilcoxon rank sum
> > >>> statistic is the number of all pairs (x[i], y[j]) for which y[j] is
> > >>> not greater than x[i]. This statistic takes values between 0 and
> > >>> m * n, and its mean and variance are m * n / 2 and
> > >>> m * n * (m + n + 1) / 12, respectively.
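
To make the definition above concrete, here is a minimal R sketch that counts the pairs directly and compares the count with the W statistic reported by wilcox.test. The sample sizes, the seed and all variable names are illustrative assumptions, not taken from the thread.

```r
# Illustrative data (assumed): continuous values, so ties are not expected.
set.seed(42)
x <- rnorm(8)
y <- rnorm(10, mean = 1)
m <- length(x)
n <- length(y)

# Number of pairs (x[i], y[j]) for which y[j] is not greater than x[i].
w_pairs <- sum(outer(x, y, ">="))

# W statistic as reported by wilcox.test.
w_test <- unname(wilcox.test(x, y)$statistic)

# Mean and variance of the statistic under the null hypothesis.
w_mean <- m * n / 2
w_var <- m * n * (m + n + 1) / 12

# Two-sided normal-approximation p-value without continuity correction.
z <- (w_pairs - w_mean) / sqrt(w_var)
p_normal <- 2 * pnorm(-abs(z))
```

Without ties, w_pairs and w_test should agree, and p_normal should match wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value.
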
> > >>>
> > >>> As a nice feature of a non-parametric statistic, it is usually
> > >>> distribution-free, so you can pick any distribution you like to
> > >>> compute the same statistic. I wonder if this is the case, but I might
> > >>> be wrong.
> > >>>
> > >>> Cheers,
> > >>> Jiefei
> > >>>
> > >>>
> > >>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
> > >>> spencer.graves using effectivedefense.org> wrote:
> > >>>
> > >>>>
> > >>>>
> > >>>> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> > >>>>> After digging into the R source, it turns out that the argument
> > >>>>> `exact` has nothing to do with numeric precision. It only affects
> > >>>>> the statistical model used to compute the p-value. When `exact=TRUE`,
> > >>>>> the true distribution of the statistic is used. Otherwise, a normal
> > >>>>> approximation is used.
> > >>>>>
> > >>>>> I think the documentation needs to be improved here: you can compute
> > >>>>> the exact p-value *only* when you do not have any ties in your data.
> > >>>>> If you have ties in your data, you will get the p-value from the
> > >>>>> normal approximation no matter what value you pass to `exact`. This
> > >>>>> behavior should be documented, or a warning should be given when
> > >>>>> `exact=TRUE` and ties are present.
> > >>>>>
> > >>>>> FYI, if the exact p-value is required, the `pwilcox` function is used
> > >>>>> to compute it. There are no details on how it computes the p-value,
> > >>>>> but its C code seems to compute the probability table, so I assume it
> > >>>>> computes the exact p-value from the true distribution of the
> > >>>>> statistic, not a permutation or MC p-value.
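
A minimal sketch of the behavior described above, using small made-up samples (x, y, x_ties and y_ties are illustrative assumptions). Without ties, the exact p-value can be reproduced from pwilcox; with ties, requesting exact=TRUE falls back to the normal approximation.

```r
# No ties: every x is smaller than every y, so W = 0.
x <- c(1.1, 2.3, 0.7, 3.4, 1.9)
y <- c(4.2, 5.1, 3.8, 6.0, 4.9)
p_exact <- wilcox.test(x, y, exact = TRUE)$p.value

# Two-sided exact p-value from the null distribution of W; doubling the
# lower tail is appropriate here because W lies below its mean m * n / 2.
p_manual <- min(2 * pwilcox(0, length(x), length(y)), 1)

# With ties: exact = TRUE cannot be honored, so both calls use the normal
# approximation. Warnings are suppressed only to keep the sketch quiet.
x_ties <- c(1, 2, 2, 3, 4)
y_ties <- c(2, 3, 5, 6, 7)
p_ties_exact  <- suppressWarnings(wilcox.test(x_ties, y_ties, exact = TRUE)$p.value)
p_ties_normal <- suppressWarnings(wilcox.test(x_ties, y_ties, exact = FALSE)$p.value)
```

Here p_exact and p_manual should both equal 2 / choose(10, 5), and the two p-values for the tied data should be identical.
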
> > >>>>
> > >>>>
> > >>>> My example shows that it does NOT use Monte Carlo, because
> > >>>> otherwise it uses some distribution. I believe the term "exact"
> > >>>> means that it uses the permutation distribution, though I could be
> > >>>> mistaken. If it's NOT a permutation distribution, I don't know what
> > >>>> it is.
> > >>>>
> > >>>>
> > >>>> Spencer
> > >>>>>
> > >>>>> Best,
> > >>>>> Jiefei
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 using gmail.com> wrote:
> > >>>>>
> > >>>>>> Hey,
> > >>>>>>
> > >>>>>> I just want to point out that the word "exact" has two meanings. It
> > >>>>>> can mean the numerically accurate p-value, as Bogdan asked in his
> > >>>>>> first email, or it can mean the p-value calculated from the exact
> > >>>>>> distribution of the statistic (in this case, the U statistic). These
> > >>>>>> two are actually unrelated, even though both are called "exact".
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Jiefei
> > >>>>>>
> > >>>>>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> > >>>>>> spencer.graves using effectivedefense.org> wrote:
> > >>>>>>
> > >>>>>>>
> > >>>>>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> > >>>>>>>> thanks a lot, Vivek ! in other words, assuming that we work with
> > >>>>>>>> 1000 data points,
> > >>>>>>>>
> > >>>>>>>> shall we use EXACT = TRUE, it uses the normal approximation,
> > >>>>>>>>
> > >>>>>>>> while if EXACT=FALSE (for these large samples), it does not ?
> > >>>>>>>
> > >>>>>>> As David Winsemius noted, the documentation is not clear.
> > >>>>>>> Consider the following:
> > >>>>>>>
> > >>>>>>> > set.seed(1)
> > >>>>>>> > x <- rnorm(100)
> > >>>>>>> > y <- rnorm(100, 2)
> > >>>>>>> >
> > >>>>>>> > wilcox.test(x, y)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> > wilcox.test(x, y)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> >
> > >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
> > >>>>>>> [1] 4.123875e-32
> > >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
> > >>>>>>> [1] 4.123875e-32
> > >>>>>>> >
> > >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
> > >>>>>>> [1] 1.172189e-25
> > >>>>>>> >
> > >>>>>>>
> > >>>>>>> We get two values here: 1.172189e-25 and 4.123875e-32. The first
> > >>>>>>> one, I think, is the normal approximation, which is the same as
> > >>>>>>> exact=FALSE. I think that with exact=TRUE, you get a permutation
> > >>>>>>> distribution, though I'm not sure. You might try looking at
> > >>>>>>> "wilcox_test in package coin for exact, asymptotic and Monte Carlo
> > >>>>>>> conditional p-values, including in the presence of ties" to see if
> > >>>>>>> it is clearer.
> > >>>>>>>
> > >>>>>>> NOTE: R is case sensitive, so "EXACT" is a different variable from
> > >>>>>>> "exact". It is interpreted as an optional argument, which is not
> > >>>>>>> recognized and therefore ignored in this context.
> > >>>>>>> Hope this helps.
> > >>>>>>> Spencer
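
The point about case sensitivity can be checked with a short sketch. It assumes the same simulated data as in the session above; the names p_default, p_upper and p_lower are hypothetical.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

# Default: with 100 values per sample, the normal approximation is used.
p_default <- wilcox.test(x, y)$p.value

# Misspelled argument: "EXACT" does not match "exact", so it is passed
# along through "..." and has no effect on the result.
p_upper <- wilcox.test(x, y, EXACT = TRUE)$p.value

# Correctly spelled argument: the exact distribution is used.
p_lower <- wilcox.test(x, y, exact = TRUE)$p.value

identical(p_default, p_upper)  # the misspelled argument changes nothing
p_lower != p_default           # the correctly spelled one does
```
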
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind using gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Bogdan,
> > >>>>>>>>>
> > >>>>>>>>> You can also get the information from the wilcox.test function
> > >>>>>>>>> help page.
> > >>>>>>>>>
> > >>>>>>>>> “By default (if exact is not specified), an exact p-value is
> > >>>>>>>>> computed if the samples contain less than 50 finite values and
> > >>>>>>>>> there are no ties. Otherwise, a normal approximation is used.”
> > >>>>>>>>>
> > >>>>>>>>> For more:
> > >>>>>>>>>
> > >>>>>>>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
> > >>>>>>>>> Hope this helps!
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> VD
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa using gmail.com> wrote:
> > >>>>>>>>>> Dear Peter, thanks a lot. Yes, we can see a very precise p-value,
> > >>>>>>>>>> and that was the request from the journal.
> > >>>>>>>>>>
> > >>>>>>>>>> If I may ask another question, please: what is the meaning of
> > >>>>>>>>>> "exact=TRUE" or "exact=FALSE" in wilcox.test?
> > >>>>>>>>>>
> > >>>>>>>>>> I can see that the "numerically precise" p-values are different.
> > >>>>>>>>>> Thanks a lot!
> > >>>>>>>>>>
> > >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> > >>>>>>>>>> tst$p.value
> > >>>>>>>>>> [1] 8.535524e-25
> > >>>>>>>>>>
> > >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> > >>>>>>>>>> tst$p.value
> > >>>>>>>>>> [1] 3.448211e-25
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
> > >>>>>>>>>> peter.langfelder using gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> I think the answer is much simpler. The print method for
> > >>>>>>>>>>> hypothesis tests (class htest) truncates the p-values. In the
> > >>>>>>>>>>> above example, instead of using
> > >>>>>>>>>>>
> > >>>>>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> > >>>>>>>>>>>
> > >>>>>>>>>>> and copying the output, just print the p-value:
> > >>>>>>>>>>>
> > >>>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> > >>>>>>>>>>> tst$p.value
> > >>>>>>>>>>>
> > >>>>>>>>>>> [1] 2.988368e-32
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> I think this value is what the journal asks for.
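
A hedged sketch of why the printed output shows "< 2.2e-16": the print method appears to pass the p-value through format.pval, whose default cutoff is .Machine$double.eps, while the value stored in the result is left untouched. The seed and the variable names are illustrative assumptions.

```r
set.seed(1)
tst <- wilcox.test(rnorm(100), rnorm(100, 2), exact = TRUE)

# The stored p-value is an ordinary double and is not truncated.
p <- tst$p.value

# Machine epsilon, about 2.2e-16, is the default cutoff in format.pval.
eps <- .Machine$double.eps

# Anything below the cutoff is rendered as "< <cutoff>" when formatted.
shown <- format.pval(p)

# Rendering of the stored value with seven significant digits, for reporting.
reported <- format(p, digits = 7)
```
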
> > >>>>>>>>>>>
> > >>>>>>>>>>> HTH,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Peter
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
> > >>>>>>>>>>> <spencer.graves using effectivedefense.org> wrote:
> > >>>>>>>>>>>> I would push back on that from two perspectives:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 1. I would study exactly what the journal said very
> > >>>>>>>>>>>> carefully. If they mandated "wilcox.test", that function has
> > >>>>>>>>>>>> an argument called "exact". If that's what they are asking,
> > >>>>>>>>>>>> then using that argument gives the exact p-value, e.g.:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Wilcoxon rank sum exact test
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> data: rnorm(100) and rnorm(100, 2)
> > >>>>>>>>>>>> W = 691, p-value < 2.2e-16
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 2. If that's NOT what they are asking, then I'm not
> > >>>>>>>>>>>> convinced what they are asking makes sense: There is no such
> > >>>>>>>>>>>> thing as an "exact p value" except to the extent that certain
> > >>>>>>>>>>>> assumptions hold, and all models are wrong (but some are
> > >>>>>>>>>>>> useful), as George Box famously said years ago.[1] Truth only
> > >>>>>>>>>>>> exists in mathematics, and that's because it's a fiction to
> > >>>>>>>>>>>> start with ;-)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Hope this helps.
> > >>>>>>>>>>>> Spencer Graves
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1]
> > >>>>>>>>>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
> > >>>>>>>>>>>>> <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
> > >>>>>>>>>>>>> Dear all,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I would appreciate your advice on the following:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> In R, wilcox.test() reports "p-value < 2.2e-16" when we compare
> > >>>>>>>>>>>>> sets of 1000 gene expression values (in the genomics field).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> However, the journal asks us to provide the exact p-value ...
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Would it be legitimate to write "p-value = 0"? Thanks a lot,
> > >>>>>>>>>>>>> -- bogdan
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [[alternative HTML version deleted]]
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ______________________________________________
> > >>>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>>>>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> > >>>>>>>>> --
> > >>>>>>>>> ----------------------------------------------------------
> > >>>>>>>>>
> > >>>>>>>>> Vivek Das, PhD
> > >>>>>>>>>