[Rd] NA warnings for r<distr>() {aka "patch for random.c"}

Martin Maechler maechler at stat.math.ethz.ch
Mon Mar 10 12:06:01 CET 2008


>>>>> "BAT" == Berwin A Turlach <statba at nus.edu.sg>
>>>>>     on Fri, 7 Mar 2008 23:54:06 +0800 writes:

    BAT> G'day Martin (and "listeners"), On Fri, 7 Mar 2008
    BAT> 15:01:26 +0100 Martin Maechler
    BAT> <maechler at stat.math.ethz.ch> wrote:

    BAT> [...]
    >> >> If you feel like finding another elegant patch...
    >> 
    BAT> Well, elegance is in the eye of the beholder. :-)
    >> 
    BAT> I attach two patches.  One that adds warning messages
    BAT> at the other places where NAs can be generated.
    >> 
    >> ok. The result is very simple ``hence elegant''.
    >> 
    >> But actually, part of the changed behavior may be
    >> considered undesirable:
    >> 
    >> rnorm(2, mean = NA)
    >> 
    >> which gives two NaN's would now produce a warning, where
    >> I could argue that 'arithmetic with NAs should give NAs
    >> without a warning' since 1:2 + NA also gives NAs without
    >> a warning.
    >> 
    >> So we could argue that a warning should *only* be
    >> produced in a case where the parameters of the
    >> distribution are not NA.
    >> 
    >> What do others (particularly R-core !) think?

    BAT> I can agree with that point of view.  But then, should
    BAT> a warning not be issued only if one of the parameters
    BAT> of the distribution is NA, or should it also not be
    BAT> issued if any non-finite parameter is encountered?
    BAT> After all,

    >> 1:2 + Inf
    BAT> [1] Inf Inf

    BAT> does not create any warning either.  

but it doesn't give NA either.
A bit more relevant (and I'm sure you meant implicitly)

  > Inf/Inf
  [1] NaN

However, I think we shouldn't go as far.

    BAT> In that case, a patch as the attached should do, it
    BAT> checks all parameters for finiteness and then checks
    BAT> whether the generated number is not finite.

    BAT> Thus, with the patch applied I see the following
    BAT> behaviour:

    >> rnorm(2, mean=NA)
    BAT> [1] NaN NaN
    >> rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
    BAT> [1] 1.897874 NaN NaN NaN NaN
    >> rbinom(2, size=20, p=1.2)
    BAT> [1] NaN NaN Warning message: In rbinom(2, size = 20, p
    BAT> = 1.2) : NAs produced
    >> rexp(2, rate=-2)
    BAT> [1] NaN NaN Warning message: In rexp(2, rate = -2) :
    BAT> NAs produced

I've committed a bit a different patch for now (svn rev 44715),
because I felt we were getting too sophisticated.

The current (since about 20 minutes) situation is to 
warn when the *result* is NA or NaN and not to warn in any other
cases.
This maybe not be perfect, but at least is very consistent and
in particular consistent to what the warning itself "says".

BTW, I've also found that I  
     rnorm(n, mu=Inf) |--> NA  was not ok, and changed that to
     rnorm(n, mu=Inf) |--> Inf

More feedback is welcome,
but please now "start" with R-devel rev >= 44715

Martin



    BAT> Without the patch:

    >> rnorm(2, mean=NA)
    BAT> [1] NaN NaN
    >> rnorm(5, mean=c(0,Inf, -Inf, NA, NaN))
    BAT> [1] -0.1719657 NaN NaN NaN NaN
    >> rbinom(2, size=20, p=1.2)
    BAT> [1] NaN NaN
    >> rexp(2, rate=-2)
    BAT> [1] NaN NaN Warning message: In rexp(2, rate = -2) :
    BAT> NAs produced

    BAT> On my machine, "make check FORCE=FORCE" passes with
    BAT> this patch.



    BAT> [...]
    >> For now, I will ignore this second patch.
    >> 
    >> - it does bloat the code slightly (as you conceded)

    BAT> Fair enough. :) I also proved my point that more
    BAT> complicated code is harder to maintain.  In the two
    BAT> parameter case I was testing twice na for being one
    BAT> instead of testing na and nb.......

    BAT> [...]
    >> but most importantly:
    >> 
    >> - we have no idea if the speedup (when <Simple> is TRUE)
    >> is in the order of 10%, 1% or 0.1%
    >> 
    >> My guess would be '0.1%' for rnorm(), and that would
    >> definitely not warrant the extra check.

    BAT> I would not bet against this.  Probably even with all
    BAT> the additional checks for finiteness of parameters
    BAT> there would be not much speed difference.  The question
    BAT> might be whether you want to replace the (new)
    BAT> R_FINITE()'s rather by ISNA()'s (or something else).
    BAT> One could also discuss in which order the checks should
    BAT> be made (first generated number then parameters of
    BAT> distribution or vice versa).  But I will leave this to
    BAT> R-core to decide. :)

    >> >> Thank you Berwin, for your contribution!
    >> 
    >> and thanks again!

    BAT> Still my pleasure. :)

    BAT> Cheers,

    BAT> 	Berwin Index: src/main/random.c
    BAT> ===================================================================
    BAT> --- src/main/random.c (revision 44693) +++
    BAT> src/main/random.c (working copy) @@ -44,7 +44,7 @@ for
    BAT> (i = 0; i < n; i++) { ai = a[i % na]; x[i] = f(ai); -
    BAT> if (!R_FINITE(x[i])) naflag = 1; + if (!R_FINITE(x[i])
    BAT> && R_FINITE(ai)) naflag = 1; } return(naflag); } @@
    BAT> -81,6 +81,7 @@ if (na < 1) { for (i = 0; i < n; i++)
    BAT> REAL(x)[i] = NA_REAL; + warning(_("NAs produced")); }
    BAT> else { PROTECT(a = coerceVector(CADR(args), REALSXP));
    BAT> @@ -116,14 +117,14 @@ ai = a[i % na]; bi = b[i % nb];
    BAT> x[i] = f(ai, bi); - if (!R_FINITE(x[i])) naflag = 1; +
    BAT> if (!R_FINITE(x[i]) && R_FINITE(ai) && R_FINITE(bi))
    BAT> naflag = 1; } return(naflag); }
 
    BAT>  #define RAND2(num,name) \ case num: \ - random2(name,
    BAT> REAL(a), na, REAL(b), nb, REAL(x), n); \ + naflag =
    BAT> random2(name, REAL(a), na, REAL(b), nb, REAL(x), n); \
    BAT> break
 
    BAT>  /* "do_random2" - random sampling from 2 parameter
    BAT> families. */ @@ -155,6 +156,7 @@ if (na < 1 || nb < 1)
    BAT> { for (i = 0; i < n; i++) REAL(x)[i] = NA_REAL; +
    BAT> warning(_("NAs produced")); } else { PROTECT(a =
    BAT> coerceVector(CADR(args), REALSXP)); @@ -200,14 +202,14
    BAT> @@ bi = b[i % nb]; ci = c[i % nc]; x[i] = f(ai, bi,
    BAT> ci); - if (!R_FINITE(x[i])) naflag = TRUE; + if
    BAT> (!R_FINITE(x[i]) && R_FINITE(ai) && R_FINITE(bi) &&
    BAT> R_FINITE(ci)) naflag = TRUE; } return(naflag); }
 
    BAT>  #define RAND3(num,name) \ case num: \ - random3(name,
    BAT> REAL(a), na, REAL(b), nb, REAL(c), nc, REAL(x), n); \ +
    BAT> naflag = random3(name, REAL(a), na, REAL(b), nb,
    BAT> REAL(c), nc, REAL(x), n); \ break
 
 
    BAT> @@ -244,6 +246,7 @@ if (na < 1 || nb < 1 || nc < 1) {
    BAT> for (i = 0; i < n; i++) REAL(x)[i] = NA_REAL; +
    BAT> warning(_("NAs produced")); } else { PROTECT(a =
    BAT> coerceVector(a, REALSXP));
    BAT> ______________________________________________
    BAT> R-devel at r-project.org mailing list
    BAT> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list