[Rd] NA warnings for r<distr>() {aka "patch for random.c"}

Tue Mar 11 06:19:46 CET 2008

G'day Martin and others,

On Mon, 10 Mar 2008 12:06:01 +0100
Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> >>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
> >>>>>     on Fri, 7 Mar 2008 23:54:06 +0800 writes:
>
>     BAT> After all,
> 
>     >> 1:2 + Inf
>     BAT> [1] Inf Inf
> 
>     BAT> does not create any warning either.  
> 
> but it doesn't give NA either.
> A bit more relevant (and I'm sure you meant implicitly)

You give me more credit than I deserve. :)
I was guided by what rexp() was doing when I chose that example, i.e.
(potentially) warning about NAs being created when actually +/- Infs
were created. In this thread I was already once or twice tempted to
comment on the appropriateness of the warning message but for various
reasonx always refrained from doing so.  

> BTW, I've also found that I  
>      rnorm(n, mu=Inf) |--> NA  was not ok, and changed that to
>      rnorm(n, mu=Inf) |--> Inf
> 
> More feedback is welcome,
> but please now "start" with R-devel rev >= 44715

While I agree with this change, I think it opens a whole can of worms,
since it invites a rethink about all distributions.  At the moment
we have:

<snip>
R version 2.7.0 Under development (unstable) (2008-03-10 r44730)
<snip>
> set.seed(1) ; exp(rnorm(2))
[1] 0.5344838 1.2015872
> set.seed(1) ; rlnorm(2)
[1] 0.5344838 1.2015872
> set.seed(1) ; exp(rnorm(2, mean=c(-Inf,Inf)))
[1]   0 Inf
> set.seed(1) ; rlnorm(2, mean=c(-Inf,Inf))
[1] NaN NaN
Warning message:
In rlnorm(2, mean = c(-Inf, Inf)) : NAs produced

The first two lines give identical results, as one could reasonably
expect.  But the other two do not and I would argue that a user could
reasonably expect that the commands in these two lines should lead to
identical results.

Likewise, coming back to the one-parameter distribution used in this
thread for illustration purposes, the *exp() functions in R are
parameterised by the rate and an exponential random variable with rate
r has mean (1/r) and variance (1/r^2).  Thus, I would argue that
rexp(2, Inf) should return 0's and not NaN's, since in the limit
a rate of Inf seems to refer to a variable with mean 0 and variance 0,
i.e. the constant 0.  And it would also be "more consistent" with the
behaviour of rexp() when the rate parameter is "large but finite".

Then one can argue about rgeom() when p=0.  But I guess in that case
the limit is a "random" variable with "infinite mean" and "infinite
variance" and hence it is o.k. to return NaNs and not Infs.  After all,
your comments in rnorm.c seem to indicate that you think that reporting
+/- Inf back is only o.k. if the mean is +/- Inf but the variance is
finite.

I did not look at (or think about) extreme cases for any other
distributions, but further discussion/feedback would indeed be helpful
it seems. :)

And the attached patch would address the behaviour of rlnorm() and
rexp() that I have raised above.  With these changes, on my machine,
"make check FORCE=FORCE" succeeds.

BTW, I was surprised to realise that the *exp() functions in the
underlying C code use the opposite parameterisation from the
corresponding functions at R level.  Perhaps it would be worthwhile to
point this out in section 6.7.1 of the Writing R extension manual?  In
particular since the manual states:

  Note that these argument sequences are (apart from the names and that
  @code{rnorm} has no @var{n}) exactly the same as the corresponding
  @R{} functions of the same name, so the documentation of the @R{}
  functions can be used.

Well, as I noticed the hard way, for *exp() the documentation of the
corresponding R functions cannot be used. ;-)

Thanks for you patience.

Cheers,

	Berwin

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R-patch
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20080311/3aa1f5d2/attachment.pl