[Rd] inconsistency in pchisq (PR#7099)
maechler at stat.math.ethz.ch
maechler at stat.math.ethz.ch
Sat Jul 17 13:21:45 CEST 2004
>>>>> "Richard" == Richard Mott <Richard.Mott at well.ox.ac.uk>
>>>>> on Sat, 17 Jul 2004 11:12:23 +0100 (BST) writes:
Richard> Martin - I agree that the p-values are essentially
Richard> identical. i raised the issue becuase i can get
Richard> negative p-values with the ncp=0 version. It is
Richard> hard to reproduce this problem in the sense that
Richard> the value of the chi-squared statistic that causes
Richard> the phenomenom is identical to the one i sent you
Richard> to 5 sig figures - i don't know how to print the
Richard> full value in order to send it to you. Negative
Richard> small probabilities could of course be treated as
Richard> 0, but this creates problems when i take logs
sure, and much more of a problem, i.e., a clear bug.
--> so after all your bug report was well valid [-> CC'ed back to R-bugs]
To see it more extremely, try the following :
> curve(pchisq(x, df=1 , lower=FALSE), 65, 70, ylim=c(-1,4)*4e-16, col=2)
> curve(pchisq(x, df=1, ncp=0, lower=FALSE), 65, 70, add=TRUE)
The reason for this behavior is simply that internal pnchisq(*, lower=FALSE)
at the moment simply is equivalent to 1 - pnchisq(*, lower=TRUE),
and since the computer epsilon is 2e-16 it's no wonder that
cancellation swamps everything for these extreme abscissa values.
Note that we could easily change the case 'ncp=0' to use the
central chisq, but that's not the case for e.g. ncp = 0.001.
==> pchisq(*, ncp) definitely needs to be improved.
I have code to do it (using "Wiener germ approximations"), but not
finished testing it yet.
Richard> If you like i can send you the data and program
Richard> used to generate the problem - the chi-squared
Richard> value is generated from a call to glm()
Richard> Richard
>>>>>>> "Richard" == Richard Mott <rmott at well.ox.ac.uk> on
>>>>>>> Sat, 17 Jul 2004 01:42:18 +0100 writes:
>>
Richard> Martin - thanks - so is it always better to use
Richard> pchisq(67.60644,df=1,lower.tail=F) if indeed ncp=0
Richard> ?
>> Yes, (for the algorithms currently in use; and as I
>> said, these are destined to be improved).
>>
>> However I do wonder: In which situations would it matter
>> to have P = 2e-16 vs P = 3e-15 ??
>>
>> Very often everything depends on underlying model
>> assumptions, and such extreme tail probabilities are
>> typically extremely dependent, i.e. would vary heavily by
>> small changes in the underlying model.
>>
>> Regards, Martin
More information about the R-devel
mailing list