[Rd] NA warnings for r<distr>() {aka "patch for random.c"}
Martin Maechler
maechler at stat.math.ethz.ch
Tue Mar 11 18:07:35 CET 2008
>>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
>>>>> on Tue, 11 Mar 2008 13:19:46 +0800 writes:
BAT> G'day Martin and others,
BAT> On Mon, 10 Mar 2008 12:06:01 +0100
BAT> Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
>> >>>>> on Fri, 7 Mar 2008 23:54:06 +0800 writes:
>>
BAT> After all,
>>
>> >> 1:2 + Inf
BAT> [1] Inf Inf
>>
BAT> does not create any warning either.
>>
>> but it doesn't give NA either.
>> A bit more relevant (and I'm sure you meant implicitly)
BAT> You give me more credit than I deserve. :)
BAT> I was guided by what rexp() was doing when I chose that example, i.e.
BAT> (potentially) warning about NAs being created when actually +/- Infs
BAT> were created. In this thread I was already once or twice tempted to
BAT> comment on the appropriateness of the warning message but for various
BAT> reasonx always refrained from doing so.
>> BTW, I've also found that I
>> rnorm(n, mu=Inf) |--> NA was not ok, and changed that to
>> rnorm(n, mu=Inf) |--> Inf
>>
>> More feedback is welcome,
>> but please now "start" with R-devel rev >= 44715
BAT> While I agree with this change, I think it opens a whole can of worms,
not a big can, though ....
BAT> since it invites a rethink about all distributions. At the moment
BAT> we have:
BAT> <snip>
BAT> R version 2.7.0 Under development (unstable) (2008-03-10 r44730)
BAT> <snip>
>> set.seed(1) ; exp(rnorm(2))
BAT> [1] 0.5344838 1.2015872
>> set.seed(1) ; rlnorm(2)
BAT> [1] 0.5344838 1.2015872
>> set.seed(1) ; exp(rnorm(2, mean=c(-Inf,Inf)))
BAT> [1] 0 Inf
>> set.seed(1) ; rlnorm(2, mean=c(-Inf,Inf))
BAT> [1] NaN NaN
BAT> Warning message:
BAT> In rlnorm(2, mean = c(-Inf, Inf)) : NAs produced
BAT> The first two lines give identical results, as one could reasonably
BAT> expect.
Yes, but I don't think a user should *rely* on the way R
generates these random number; though I agree for the very
specific case of rlnorm(..).
BAT> But the other two do not and I would argue that a user could
BAT> reasonably expect that the commands in these two lines should lead to
BAT> identical results.
They now do.
BAT> Likewise, coming back to the one-parameter distribution used in this
BAT> thread for illustration purposes, the *exp() functions in R are
BAT> parameterised by the rate and an exponential random variable with rate
BAT> r has mean (1/r) and variance (1/r^2). Thus, I would argue that
BAT> rexp(2, Inf) should return 0's and not NaN's, since in the limit
BAT> a rate of Inf seems to refer to a variable with mean 0 and variance 0,
BAT> i.e. the constant 0. And it would also be "more consistent" with the
BAT> behaviour of rexp() when the rate parameter is "large but finite".
yes. And that's *the* (only) case where 'Inf' parameters or 'scale = 0'
situations should work: when it is a natural limit behavior.
BAT> Then one can argue about rgeom() when p=0. But I guess in that case
BAT> the limit is a "random" variable with "infinite mean" and "infinite
BAT> variance" and hence it is o.k. to return NaNs and not Infs. After all,
BAT> your comments in rnorm.c seem to indicate that you think that reporting
BAT> +/- Inf back is only o.k. if the mean is +/- Inf but the variance is
BAT> finite.
BAT> I did not look at (or think about) extreme cases for any other
BAT> distributions, but further discussion/feedback would indeed be helpful
BAT> it seems. :)
I looked at a few more, basically all continuous
src/nmath/r<foo>.c ones.
BAT> And the attached patch would address the behaviour of rlnorm() and
BAT> rexp() that I have raised above. With these changes, on my machine,
BAT> "make check FORCE=FORCE" succeeds.
I don't think your change to .../R/distn.R was good,
but the others I have more or less committed together with a few
more similar ones.
BAT> BTW, I was surprised to realise that the *exp() functions in the
BAT> underlying C code use the opposite parameterisation from the
BAT> corresponding functions at R level. Perhaps it would be worthwhile to
BAT> point this out in section 6.7.1 of the Writing R extension manual? In
BAT> particular since the manual states:
BAT> Note that these argument sequences are (apart from the names and that
BAT> @code{rnorm} has no @var{n}) exactly the same as the corresponding
BAT> @R{} functions of the same name, so the documentation of the @R{}
BAT> functions can be used.
BAT> Well, as I noticed the hard way, for *exp() the documentation of the
BAT> corresponding R functions cannot be used. ;-)
We often also gratefully accept patches for the documentation
...
BAT> Thanks for you patience.
;-)
Regards,
Martin
More information about the R-devel
mailing list