[Rd] pbinom( ) function (PR#8700)
Duncan Murdoch
murdoch at stats.uwo.ca
Wed Mar 22 17:38:52 CET 2006
On 3/22/2006 10:08 AM, Peter Dalgaard wrote:
> Duncan Murdoch <murdoch at stats.uwo.ca> writes:
>
>> On 3/22/2006 3:52 AM, maechler at stat.math.ethz.ch wrote:
>> >>>>>> "cspark" == cspark <cspark at clemson.edu>
>> >>>>>> on Wed, 22 Mar 2006 05:52:13 +0100 (CET) writes:
>> >
>> > cspark> Full_Name: Chanseok Park
>> > cspark> Version: R 2.2.1
>> > cspark> OS: RedHat EL4
>> > cspark> Submission from: (NULL) (130.127.112.89)
>> >
>> > cspark> pbinom(any negative value, size, prob) should be
>> > cspark> zero. But I got the following results. I mean, if
>> > cspark> a negative value is close to zero, then pbinom()
>> > cspark> calculates pbinom(0, size, prob).
>> >
>> > >> pbinom( -2.220446e-22, 3,.1)
>> > [1] 0.729
>> > >> pbinom( -2.220446e-8, 3,.1)
>> > [1] 0.729
>> > >> pbinom( -2.220446e-7, 3,.1)
>> > [1] 0
>> >
>> > Yes, all the [dp]* functions which are discrete with mass on the
>> > integers only do *round* their 'x' to integers.
>> >
>> > I could well argue that the current behavior is *not* a bug,
>> > since we do treat "x close to integer" as integer, and hence
>> > pbinom(eps, size, prob) with eps "very close to 0" should give
>> > pbinom(0, size, prob)
>> > as it now does.
>> >
>> > However, for esthetic reasons,
>> > I agree that we should test for "< 0" first (and give 0 then) and only
>> > round otherwise. I'll change this for R-devel (i.e. R 2.3.0 in
>> > about a month).
>> >
>> > cspark> dbinom() also behaves similarly.
>> >
>> > yes, similarly, but differently.
>> > I have changed it (for R-devel) as well, to behave the same way as
>> > the other d*() functions, e.g., dpois() and dnbinom(), do.
>>
>> Martin, your description makes it sound as though dbinom(0.3, size,
>> prob) would give the same answer as dbinom(0, size, prob), whereas it
>> actually gives 0 with a warning, as documented in ?dbinom. The d*
>> functions only round near-integers to integers, where "near" appears
>> to mean within 1e-7. The p* functions round near-integers to integers
>> and truncate all others to the integer below.
>
> Well, the p-functions are constant on the intervals between
> integers... (Or did you refer to the lack of a warning? One point
> could be that c.d.f.s extend naturally to non-integers, whereas
> densities don't really extend, since they are defined with respect
> to counting measure on the integers.)
Not quite: they're constant on intervals (n - 1e-7, n + 1 - 1e-7), for
integers n. Since Martin's change, this is not true for n=0.
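For instance (a sketch, assuming the 1e-7 tolerance; 0.972 is just
pbinom(1, 3, 0.1)):
   pbinom(1, 3, 0.1)         # 0.972
   pbinom(2 - 1e-6, 3, 0.1)  # same interval as 1: still 0.972
   pbinom(-1e-8, 3, 0.1)     # used to round to 0 (giving 0.729); now 0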
I wasn't complaining about the behaviour here; I was just clarifying
Martin's description of it when he said that "all the [dp]* functions
which are discrete with mass on the integers only do *round* their 'x'
to integers".
>
>> I suppose the reason for this behaviour is to protect against rounding
>> error giving nonsense results; I'm not sure that's a great idea, but if
>> we do it, should we really be handling 0 differently?
>
> Most of these round-near-integer issues were spurred by real
> programming problems. It is somewhat hard to come up with a problem
> that leads you to generate a binomial variate with "floating point
> noise", but I'm quite sure that we'll be reminded if we try to change
> it... (One potential issue is back-calculation of counts from relative
> frequencies.)
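Indeed, a back-calculated count need not be an exact integer, so the
d* functions would reject it without the rounding. A sketch (whether
f * n comes back as exactly 3 is platform-dependent):
   n <- 49
   f <- 3/n               # observed relative frequency
   f * n                  # close to, but possibly not exactly, 3
   dbinom(f * n, n, 0.5)  # works anyway, thanks to the 1e-7 rounding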
Again, I wasn't suggesting we change the general +/- 1e-7 behaviour
(though it should be documented to avoid bug reports like this one), but
I'm worried about having zero as a special case. This will break
relations such as
dbinom(x, n, 0.5) == dbinom(n-x, n, 0.5)
(in the case where x is n+epsilon or -epsilon, for small enough
epsilon). Is it really desirable to break the symmetry like this?
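Concretely, a sketch (eps is below the 1e-7 tolerance, and 0.125 is
dbinom(3, 3, 0.5)):
   eps <- 1e-9
   dbinom(3 + eps, 3, 0.5)  # rounds to 3: 0.125
   dbinom(-eps, 3, 0.5)     # used to round to 0, also 0.125; now 0
so the two sides of the identity no longer agree.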
Duncan Murdoch