[R] about a p-value < 2.2e-16
Bogdan Tanasa
t@n@@@ @end|ng |rom gm@||@com
Fri Mar 19 19:39:08 CET 2021
Thank you, Kevin. Their wording is: "Please note that the exact p value
should be provided, when possible, etc."
By "exact p-value" I believe they do indeed mean the actual number,
and not that we should specify "exact=TRUE".
As we are working with 1000 genes, if I specify "exact=TRUE" on my PC,
it runs out of memory:
wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
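
For samples of this size, one way to report the actual number is to use the normal approximation and read the p-value from the returned object rather than from the printed summary. The following is only a minimal sketch: the seed and the names `x`, `y`, `res`, `p` and `p_text` are illustrative assumptions, not part of the original analysis.

```r
set.seed(1)                 # arbitrary seed, only so the sketch is reproducible
x <- rnorm(1000)            # stand-in for the first set of 1000 values
y <- rnorm(1000, 2)         # stand-in for the second set, shifted by 2

# exact = FALSE requests the normal approximation, which stays cheap
# in time and memory for samples this large
res <- wilcox.test(x, y, exact = FALSE)

# Take the number from the result object instead of the printed output
p <- res$p.value

# Format it in scientific notation for reporting
p_text <- format(p, digits = 3, scientific = TRUE)
print(p_text)
```

The reported value is then the p-value under the normal approximation, which is worth stating explicitly alongside the number.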
On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe <kevin.thorpe using utoronto.ca> wrote:
> I have to ask: are you sure the journal doesn't simply mean by "exact
> p-value" that they don't want to see a p-value given as < 0.0001, for
> example, and simply want the actual number?
>
> I cannot imagine they really meant exact as in the p-value from some exact
> distribution.
>
> --
> Kevin E. Thorpe
> Head of Biostatistics, Applied Health Research Centre (AHRC)
> Li Ka Shing Knowledge Institute of St. Michael's
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.thorpe using utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
>
> > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tanasa using gmail.com> wrote:
> >
> > Dear all, thank you all for your comments and help.
> >
> > As far as I can see, when we have samples of 1000 records, only
> > "exact=FALSE" allows the code to run:
> >
> > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
> > [1] 7.304863e-231
> >
> > If I use "exact=TRUE", it runs out of memory on my 64GB RAM PC:
> >
> > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
> > (the job is terminated by OS)
> >
> > If you have any other suggestions, please let me know. Thanks a lot!
> >
> > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> >> I **believe** -- if my old memory still serves -- that the "exact"
> >> specification uses a home-grown version of the algorithm to calculate
> >> exact, or close approximations to the exact, permutation distribution
> >> originally developed by Cyrus Mehta, founder of StatXact software. Of
> >> course, examining the C source code would determine this, but I don't
> >> care to attempt this.
> >>
> >> If this is (no longer?) correct, please point this out.
> >>
> >> Best,
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwjf08 using gmail.com> wrote:
> >>
> >>> Hi Spencer,
> >>>
> >>> Thanks for your test results. I do not know the answer, as I haven't
> >>> used wilcox.test for many years. I do not know if it is possible to
> >>> compute the exact distribution of the Wilcoxon rank sum statistic, but
> >>> I think it is very likely, as the documentation of `Wilcoxon` says:
> >>>
> >>> This distribution is obtained as follows. Let x and y be two random,
> >>> independent samples of size m and n. Then the Wilcoxon rank sum
> >>> statistic is the number of all pairs (x[i], y[j]) for which y[j] is
> >>> not greater than x[i]. This statistic takes values between 0 and
> >>> m * n, and its mean and variance are m * n / 2 and
> >>> m * n * (m + n + 1) / 12, respectively.
> >>>
> >>> As a nice feature of non-parametric statistics, it is usually
> >>> distribution-free, so you can pick any distribution you like to compute
> >>> the same statistic. I wonder if this is the case, but I might be wrong.
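
The quoted definition can be checked numerically. The sketch below counts the pairs directly, compares the count with the W statistic returned by wilcox.test, and compares the quoted mean and variance with the exact null distribution from `dwilcox`. The sample sizes, the seed, and all variable names are assumptions made for illustration only.

```r
set.seed(1)                      # arbitrary seed for reproducibility
m <- 8                           # illustrative sample sizes
n <- 10
x <- rnorm(m)
y <- rnorm(n, 1)

# Number of pairs (x[i], y[j]) for which y[j] is not greater than x[i]
w_pairs <- sum(outer(x, y, ">="))

# The statistic reported by wilcox.test (named "W")
w_test <- unname(wilcox.test(x, y)$statistic)

# Mean and variance from the quoted formulas
w_mean <- m * n / 2
w_var  <- m * n * (m + n + 1) / 12

# The same moments computed from the exact null distribution
k          <- 0:(m * n)
pk         <- dwilcox(k, m, n)
mean_exact <- sum(k * pk)
var_exact  <- sum((k - mean_exact)^2 * pk)

c(w_pairs = w_pairs, w_test = w_test)
c(w_mean = w_mean, mean_exact = mean_exact)
c(w_var = w_var, var_exact = var_exact)
```

Continuous data are used here so that ties do not occur; with ties the pair count and the rank-based statistic need not agree.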
> >>>
> >>> Cheers,
> >>> Jiefei
> >>>
> >>>
> >>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
> >>> spencer.graves using effectivedefense.org> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> >>>>> After digging into the R source, it turns out that the argument
> >>>>> `exact` has nothing to do with numeric precision. It only affects the
> >>>>> statistical model used to compute the p-value. When `exact=TRUE`, the
> >>>>> true distribution of the statistic will be used. Otherwise, a normal
> >>>>> approximation will be used.
> >>>>>
> >>>>> I think the documentation needs to be improved here: you can compute
> >>>>> the exact p-value *only* when you do not have any ties in your data.
> >>>>> If you have ties in your data, you will get the p-value from the
> >>>>> normal approximation no matter what value you put in `exact`. This
> >>>>> behavior should be documented, or a warning should be given when
> >>>>> `exact=TRUE` and ties are present.
> >>>>>
> >>>>> FYI, if the exact p-value is required, the `pwilcox` function will be
> >>>>> used to compute the p-value. There are no details on how it computes
> >>>>> the p-value, but its C code seems to compute the probability table, so
> >>>>> I assume it computes the exact p-value from the true distribution of
> >>>>> the statistic, not a permutation or MC p-value.
> >>>>
> >>>>
> >>>> My example shows that it does NOT use Monte Carlo, because
> >>>> otherwise repeated calls would not return identical p-values; it
> >>>> uses some fixed distribution. I believe the term "exact" means
> >>>> that it uses the permutation distribution, though I could be mistaken.
> >>>> If it's NOT a permutation distribution, I don't know what it is.
> >>>>
> >>>>
> >>>> Spencer
> >>>>>
> >>>>> Best,
> >>>>> Jiefei
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 using gmail.com> wrote:
> >>>>>
> >>>>>> Hey,
> >>>>>>
> >>>>>> I just want to point out that the word "exact" has two meanings. It
> >>>>>> can mean the numerically accurate p-value, as Bogdan asked in his
> >>>>>> first email, or it could mean the p-value calculated from the exact
> >>>>>> distribution of the statistic (in this case, the U statistic). These
> >>>>>> two are actually not related, even though they are both called "exact".
> >>>>>>
> >>>>>> Best,
> >>>>>> Jiefei
> >>>>>>
> >>>>>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> >>>>>> spencer.graves using effectivedefense.org> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> >>>>>>>> Thanks a lot, Vivek! In other words, assuming that we work with
> >>>>>>>> 1000 data points,
> >>>>>>>>
> >>>>>>>> if we use EXACT = TRUE, it uses the normal approximation,
> >>>>>>>>
> >>>>>>>> while if EXACT=FALSE (for these large samples), it does not?
> >>>>>>>
> >>>>>>> As David Winsemius noted, the documentation is not clear.
> >>>>>>> Consider the following:
> >>>>>>>
> >>>>>>> > set.seed(1)
> >>>>>>> > x <- rnorm(100)
> >>>>>>> > y <- rnorm(100, 2)
> >>>>>>>
> >>>>>>> > wilcox.test(x, y)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>>
> >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, EXACT=TRUE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
> >>>>>>> [1] 4.123875e-32
> >>>>>>> > wilcox.test(x, y, exact=TRUE)$p.value
> >>>>>>> [1] 4.123875e-32
> >>>>>>>
> >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, EXACT=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>> > wilcox.test(x, y, exact=FALSE)$p.value
> >>>>>>> [1] 1.172189e-25
> >>>>>>>
> >>>>>>> We get two values here: 1.172189e-25 and 4.123875e-32. The first
> >>>>>>> one, I think, is the normal approximation, which is the same as
> >>>>>>> exact=FALSE. I think that with exact=TRUE, you get a permutation
> >>>>>>> distribution, though I'm not sure.
> >>>>>>>
> >>>>>>> You might try looking at "wilcox_test in package coin for exact,
> >>>>>>> asymptotic and Monte Carlo conditional p-values, including in the
> >>>>>>> presence of ties" to see if it is clearer.
> >>>>>>>
> >>>>>>> NOTE: R is case sensitive, so "EXACT" is a different variable from
> >>>>>>> "exact". It is interpreted as an optional argument, which is not
> >>>>>>> recognized and therefore ignored in this context.
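
The case-sensitivity point can be checked directly. This is a small sketch that reuses the seed and data-generating calls from the transcript above; the names `p_default`, `p_upper` and `p_lower` are illustrative.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

# exact left unspecified: with 100 values per sample the default is
# the normal approximation
p_default <- wilcox.test(x, y)$p.value

# "EXACT" is not an argument of wilcox.test; it is absorbed by `...`
# and has no effect on the result
p_upper <- wilcox.test(x, y, EXACT = TRUE)$p.value

# "exact" is the real argument and requests the exact distribution
p_lower <- wilcox.test(x, y, exact = TRUE)$p.value

identical(p_default, p_upper)   # does the misspelled argument change anything?
p_lower != p_default            # does the real argument change the p-value?
```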
> >>>>>>> Hope this helps.
> >>>>>>> Spencer
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind using gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Bogdan,
> >>>>>>>>>
> >>>>>>>>> You can also get the information from the documentation page of
> >>>>>>>>> the wilcox.test function (linked below).
> >>>>>>>>>
> >>>>>>>>> “By default (if exact is not specified), an exact p-value is
> >>>>>>>>> computed if the samples contain less than 50 finite values and
> >>>>>>>>> there are no ties. Otherwise, a normal approximation is used.”
> >>>>>>>>>
> >>>>>>>>> For more:
> >>>>>>>>>
> >>>>>>>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
> >>>>>>>>> Hope this helps!
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> VD
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa using gmail.com> wrote:
> >>>>>>>>>> Dear Peter, thanks a lot. Yes, we can see a very precise p-value,
> >>>>>>>>>> and that was the request from the journal.
> >>>>>>>>>>
> >>>>>>>>>> If I may ask another question, please: what is the meaning of
> >>>>>>>>>> "exact=TRUE" or "exact=FALSE" in wilcox.test?
> >>>>>>>>>>
> >>>>>>>>>> I can see that the "numerically precise" p-values are different.
> >>>>>>>>>> Thanks a lot!
> >>>>>>>>>>
> >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>> tst$p.value
> >>>>>>>>>> [1] 8.535524e-25
> >>>>>>>>>>
> >>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> >>>>>>>>>> tst$p.value
> >>>>>>>>>> [1] 3.448211e-25
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
> >>>>>>>>>> peter.langfelder using gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I think the answer is much simpler. The print method for
> >>>>>>>>>>> hypothesis tests (class htest) truncates the p-values. In the
> >>>>>>>>>>> above example, instead of using
> >>>>>>>>>>>
> >>>>>>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>>>
> >>>>>>>>>>> and copying the output, just print the p-value:
> >>>>>>>>>>>
> >>>>>>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>>> tst$p.value
> >>>>>>>>>>>
> >>>>>>>>>>> [1] 2.988368e-32
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I think this value is what the journal asks for.
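
The difference between the printed summary and the stored value can be seen side by side. This is a minimal sketch, assuming default R print options; the seed and the names `tst`, `printed` and `p_line` are illustrative.

```r
set.seed(1)                     # arbitrary seed for reproducibility
x <- rnorm(100)
y <- rnorm(100, 2)

tst <- wilcox.test(x, y, exact = FALSE)

# Capture what the print method for class "htest" displays
printed <- capture.output(print(tst))
p_line  <- grep("p-value", printed, value = TRUE)
p_line                          # the summary line, with the p-value truncated

# The stored value keeps the full number
tst$p.value

# format.pval is the formatter that produces the "<" notation for
# values below its eps threshold
format.pval(tst$p.value)
```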
> >>>>>>>>>>>
> >>>>>>>>>>> HTH,
> >>>>>>>>>>>
> >>>>>>>>>>> Peter
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
> >>>>>>>>>>> <spencer.graves using effectivedefense.org> wrote:
> >>>>>>>>>>>> I would push back on that from two perspectives:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. I would study exactly what the journal said very
> >>>>>>>>>>>> carefully. If they mandated "wilcox.test", that function has
> >>>>>>>>>>>> an argument called "exact". If that's what they are asking,
> >>>>>>>>>>>> then using that argument gives the exact p-value, e.g.:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Wilcoxon rank sum exact test
> >>>>>>>>>>>>
> >>>>>>>>>>>> data: rnorm(100) and rnorm(100, 2)
> >>>>>>>>>>>> W = 691, p-value < 2.2e-16
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. If that's NOT what they are asking, then I'm not
> >>>>>>>>>>>> convinced what they are asking makes sense: There is no such
> >>>>>>>>>>>> thing as an "exact p value" except to the extent that certain
> >>>>>>>>>>>> assumptions hold, and all models are wrong (but some are
> >>>>>>>>>>>> useful), as George Box famously said years ago.[1] Truth only
> >>>>>>>>>>>> exists in mathematics, and that's because it's a fiction to
> >>>>>>>>>>>> start with ;-)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hope this helps.
> >>>>>>>>>>>> Spencer Graves
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
> >>>>>>>>>>>>> <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
> >>>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I would appreciate having your advice on the following, please:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In R, wilcox.test() provides "a p-value < 2.2e-16" when we
> >>>>>>>>>>>>> compare the expression of sets of 1000 genes (in the genomics
> >>>>>>>>>>>>> field).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> However, the journal asks us to provide the exact p value ...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Would it be legitimate to write: "p-value = 0"? Thanks a lot,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -- bogdan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ______________________________________________
> >>>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>>> --
> >>>>>>>>> ----------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> Vivek Das, PhD
> >>>>>>>>>