[Rd] 0.5 != integrate(dnorm,0,20000) = 0

John Nolan jpnolan at american.edu
Tue Dec 7 14:38:47 CET 2010


I have wrestled with this problem before.  I think correcting
the warning to "absolute error ~<= 0" is a good idea, and printing 
a warning if subdivisions==1 is also helpful.  Also, including
a simple example like the one that started this thread on the
help page for integrate might make the issue more clear to users.

But min.subdivisions is probably not.  On the example with dnorm( ),
I doubt 3 subdivisions would work.  The problem isn't that
we aren't sudividing enough, the problem is that the integrand
is 0 (in double precision) on most of the region and the
algorithm isn't designed to handle this.  There is no way to
determine how many subdivisions are necessary to get a reasonable
answer without a detailed analysis of the integrand.

I've gotten useful results with integrands that are monotonic on
the tail with a "self-triming integration" routine
like the following:

>right.trimmed.integrate <- function( f, lower, upper, epsilon=1e-100, min.width=1e-10, ... ) {
+ # trim the region of integration on the right until f(x) > epsilon
+ 
+ a <- lower; b <- upper
+ while ( (b-a>min.width) && (f(b)<epsilon) ) { b <- (a+b)/2 }
+ 
+ return( integrate(f,a,b,...) ) }

> right.trimmed.integrate( dnorm, 0, 20000 )  # test
0.5 with absolute error < 9.2e-05

This can be adapted to left trim or (left and right) trim, abs(f(x)-c)>epsilon,
etc.  Setting the tolerances epsilon and min.width is an issue,
but an explicit discussion of these values could encourage people to
think about the problem in their specific case.  And of course, none
of this guarantees a correct answer, especially if someone tries this
on non-monotonic integrands with complicated 0 sets.  One could write 
a somewhat more user-friendly version where the user has to specify 
some property (or set of properties) of the integrand, e.g. "right-tail 
decreasing to 0", etc. and have the algorithm try to do smart
trimming based on this.  But perhaps this getting too involved.

In the end, there is no general solution because any solution
depends on the specific nature of the integrand.  Clearer messages,
warnings in suspicious cases like subdivisions==1, and a simple
example explaining what the issue is in the help page would help
some people.

John
 
 ...........................................................................

 John P. Nolan
 Math/Stat Department
 227 Gray Hall
 American University
 4400 Massachusetts Avenue, NW
 Washington, DC 20016-8050

 jpnolan at american.edu
 202.885.3140 voice
 202.885.3155 fax
 http://academic2.american.edu/~jpnolan
 ...........................................................................

-----r-devel-bounces at r-project.org wrote: ----- 
To: r-devel at r-project.org, Prof Brian Ripley <ripley at stats.ox.ac.uk>
From: Martin Maechler 
Sent by: r-devel-bounces at r-project.org
Date: 12/07/2010 03:29AM
Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0

>>>>> Prof Brian Ripley <ripley at stats.ox.ac.uk>
>>>>>     on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:

    > On Mon, 6 Dec 2010, Spencer Graves wrote:
    >> Hello:
    >> 
    >> 
    >> The example "integrate(dnorm,0,20000)" says it "fails on many systems". 
    >> I just got 0 from it, when I should have gotten either an error or something 
    >> close to 0.5.  I got this with R 2.12.0 under both Windows Vista_x64 and 
    >> Linux (Fedora 13);  see the results from Windows below.  I thought you might 
    >> want to know.

    > Well, isn't that exactly what the help page says happens?  That 
    > example is part of a section entitled

    > ## integrate can fail if misused

    > and is part of the illustration of

    > If the function is
    > approximately constant (in particular, zero) over nearly all its
    > range it is possible that the result and error estimate may be
    > seriously wrong.

yes, of course, 
and the issue has been known for ``ages''  ..
..........
..........
but it seems that too many useRs are not reading the help
page carefully, but only browse it quickly.
I think we (R developers) have to live with this fact
and should consider adapting to it a bit more, particularly in
this case (see below)

    >> 
    >> Thanks for all your work in creating and maintaining R.
    >> 
    >> 
    >> Best Wishes,
    >> Spencer Graves
    >> ###############################

    >> 
    >> integrate(dnorm,0,20000) ## fails on many systems

    >> 0 with absolute error < 0

and this is particularly unsatisfactory for another reason:

"absolute error < 0"   
is *always* incorrect, so I think we should change *some*thing.

We could just use "<=" (and probably should in any case, or  
"< ~= x" which would convey ``is less than about x'' which I
think is all we can say),
but could consider giving a different message when the integral
evaluates to 0 or, rather actually,
only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
as the latter indicates that the algorithm treated the integrand
f(.) as if f() was a linear function.

But in my quick experiments, even for linear (incl. constant)
functions, the 'abs.error' returned is never 0.

If we want to be cautious,
such a warning could be made explicitly suppressable by an argument
      .warn.if.doubtful = TRUE

An additional possibility I'd like to try, is a new argument
   'min.subdivisions = 3' which specifies the *minimal* number
of subdivisions to be used in addition to the already present
   'subdivisions = 100' (= the maximum number of subintervals.)

Martin Maechler,
ETH Zurich

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


More information about the R-devel mailing list