[Rd] Suggested change to integrate.Rd

Spencer Graves spencer.graves at structuremonitoring.com
Wed Dec 8 23:30:17 CET 2010


That sounds like a great idea to me:  This should give the Core R team 
more time to worry about the code by delegating maintenance of the help 
files to a larger group.


Spencer


On 12/8/2010 2:22 PM, John Nolan wrote:
> Well, you can't idiot-proof things, but you can give clear descriptions and
> warnings.
> To take things to the extreme, one can eliminate all help files.  If a user
> really wants
> to understand things, they can read the source code, right?
>
> This is a general question for r-dev: who are the help files aimed at? If
> the
> answer is experts only, then don't put any more effort into help files.
> But if you
> want more users to be able to do more things, then more explanation will
> help.
>
> Perhaps there should be a "documentation team" (r-doc?) that intersects
> r-dev, but
> focuses on documentation?
>
> John,  American U
>
>
>
>
> From:	"Ravi Varadhan"<rvaradhan at jhmi.edu>
> To:	"'John Nolan'"<jpnolan at american.edu>,
>              <spencer.graves at structuremonitoring.com>
> Cc:	<r-devel at r-project.org>
> Date:	12/08/2010 10:43 AM
> Subject:	RE: [Rd] Suggested change to integrate.Rd (was: Re: 0.5 !=
>              integrate(dnorm, 0, 20000) = 0)
>
>
>
> Hi,
>
> My honest and (not so) humble opinion is that no amount of clear and
> explicit warning can totally prevent the inappropriate use of any tool.
> Users will continue to use the tools, without doing the necessary
> background
> work to figure out whether the that tool is the appropriate one for their
> particular problem.  If things can go so horribly wrong in such a simple
> case, imagine all the snares and traps present in complex, high-dimensional
> integration.  Even the best cubature rules or the MCMC methods can give
> wrong results.  Even worse, how in heaven's name can we be sure that the
> answer is any good?  The simple and best solution is to understand your
> integrand as best as you can.  I realize that this may be viewed as being
> too pedantic, but unfortunately, it is also the best advice.
>
> Best,
> Ravi.
> -------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology School of Medicine Johns
> Hopkins University
>
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
>
>
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org]
> On Behalf Of John Nolan
> Sent: Tuesday, December 07, 2010 11:09 PM
> To: spencer.graves at structuremonitoring.com
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Suggested change to integrate.Rd (was: Re: 0.5 !=
> integrate(dnorm, 0, 20000) = 0)
>
> R developers understand intimately how things work, and terse
> descriptions are sufficient.  However, most typical R users
> would benefit from clearer documentation.  In multiple places
> I've found the R documentation to be correct and understandable
> AFTER I've figured a function out.
>
> And to be fair, this problem with integrate( ) isn't really R's
> fault: the QUADPACK routines that R uses are very good algorithms,
> but neither they nor any other package can handle all cases.
>
> I would support reasonable changes in the documentation for
> integrate( ).   Just saying it "gives wrong answer without
> warning on many systems" seems misleading (it works fine in
> many cases) and it doesn't help a user understand how to use
> integrate( ) correctly/carefully.  IMO a simple example like
> this one w/ dnorm would catch peoples attention and a couple
> lines of explanation/warning would then make more sense.
>
> John Nolan, American U
>
>
> -----Spencer Graves<spencer.graves at structuremonitoring.com>  wrote: -----
> To: John Nolan<jpnolan at american.edu>
> From: Spencer Graves<spencer.graves at structuremonitoring.com>
> Date: 12/07/2010 07:58PM
> Cc: pchausse at uwaterloo.ca, r-devel at r-project.org
> Subject: Suggested change to integrate.Rd (was: Re: [Rd] 0.5 !=
> integrate(dnorm,0,20000) = 0)
>
>         What do you think about changing the verbiage with that example
> in "integrate.Rd" from "fails on many systems" to something like
> "gives wrong answer without warning on many systems"?
>
>
>         If I had write access to the core R code, I'd change this
> myself:  I'm probably not the only user who might think that saying
> something "fails" suggest it gives an error message.  Many contributions
> on this thread make it clear that it will never be possible to write an
> integrate function that won't give a "wrong answer without warning" in
> some cases.
>
>
>         Thanks,
>         Spencer
>
>
> #############################
> On 12/7/2010 7:02 AM, John Nolan wrote:
>> Putting in Inf for the upper bound does not work in general:
>> all 3 of the following should give 0.5
>>
>>> integrate( dnorm, 0, Inf )
>> 0.5 with absolute error<   4.7e-05
>>
>>> integrate( dnorm, 0, Inf, sd=100000 )
>> Error in integrate(dnorm, 0, Inf, sd = 1e+05) :
>>     the integral is probably divergent
>>
>>> integrate( dnorm, 0, Inf, sd=10000000 )
>> 5.570087e-05 with absolute error<   0.00010
>>
>> Numerical quadrature methods look at a finite number of
>> points, and you can find examples that will confuse any
>> algorithm.  Rather than hope a general method will solve
>> all problems, users should look at their integrand and
>> pick an appropriate region of integration.
>>
>> John Nolan, American U.
>>
>>
>> -----r-devel-bounces at r-project.org wrote: -----
>> To: r-devel at r-project.org
>> From: Pierre Chausse
>> Sent by: r-devel-bounces at r-project.org
>> Date: 12/07/2010 09:46AM
>> Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
>>
>>     The warning about "absolute error == 0" would not be sufficient
>> because if you do
>>    >   integrate(dnorm, 0, 5000)
>> 2.326323e-06 with absolute error<   4.6e-06
>>
>> We get reasonable absolute error and wrong answer. For very high upper
>> bound, it seems more stable to use "Inf". In that case, another
>> .External is used which seems to be optimized for high or low bounds:
>>
>>    >   integrate(dnorm, 0,Inf)
>> 0.5 with absolute error<   4.7e-05
>>
>>
>> On 10-12-07 8:38 AM, John Nolan wrote:
>>> I have wrestled with this problem before.  I think correcting
>>> the warning to "absolute error ~<= 0" is a good idea, and printing
>>> a warning if subdivisions==1 is also helpful.  Also, including
>>> a simple example like the one that started this thread on the
>>> help page for integrate might make the issue more clear to users.
>>>
>>> But min.subdivisions is probably not.  On the example with dnorm( ),
>>> I doubt 3 subdivisions would work.  The problem isn't that
>>> we aren't sudividing enough, the problem is that the integrand
>>> is 0 (in double precision) on most of the region and the
>>> algorithm isn't designed to handle this.  There is no way to
>>> determine how many subdivisions are necessary to get a reasonable
>>> answer without a detailed analysis of the integrand.
>>>
>>> I've gotten useful results with integrands that are monotonic on
>>> the tail with a "self-triming integration" routine
>>> like the following:
>>>
>>>> right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100,
> min.width=1e-10, ... ) {
>>> + # trim the region of integration on the right until f(x)>    epsilon
>>> +
>>> + a<- lower; b<- upper
>>> + while ( (b-a>min.width)&&    (f(b)<epsilon) ) { b<- (a+b)/2 }
>>> +
>>> + return( integrate(f,a,b,...) ) }
>>>
>>>> right.trimmed.integrate( dnorm, 0, 20000 )  # test
>>> 0.5 with absolute error<    9.2e-05
>>>
>>> This can be adapted to left trim or (left and right) trim,
> abs(f(x)-c)>epsilon,
>>> etc.  Setting the tolerances epsilon and min.width is an issue,
>>> but an explicit discussion of these values could encourage people to
>>> think about the problem in their specific case.  And of course, none
>>> of this guarantees a correct answer, especially if someone tries this
>>> on non-monotonic integrands with complicated 0 sets.  One could write
>>> a somewhat more user-friendly version where the user has to specify
>>> some property (or set of properties) of the integrand, e.g. "right-tail
>>> decreasing to 0", etc. and have the algorithm try to do smart
>>> trimming based on this.  But perhaps this getting too involved.
>>>
>>> In the end, there is no general solution because any solution
>>> depends on the specific nature of the integrand.  Clearer messages,
>>> warnings in suspicious cases like subdivisions==1, and a simple
>>> example explaining what the issue is in the help page would help
>>> some people.
>>>
>>> John
>>>
>>>
> ...........................................................................
>>>     John P. Nolan
>>>     Math/Stat Department
>>>     227 Gray Hall
>>>     American University
>>>     4400 Massachusetts Avenue, NW
>>>     Washington, DC 20016-8050
>>>
>>>     jpnolan at american.edu
>>>     202.885.3140 voice
>>>     202.885.3155 fax
>>>     http://academic2.american.edu/~jpnolan
>>>
> ...........................................................................
>>> -----r-devel-bounces at r-project.org wrote: -----
>>> To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk>
>>> From: Martin Maechler
>>> Sent by: r-devel-bounces at r-project.org
>>> Date: 12/07/2010 03:29AM
>>> Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
>>>
>>>>>>>> Prof Brian Ripley<ripley at stats.ox.ac.uk>
>>>>>>>>        on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
>>>        >    On Mon, 6 Dec 2010, Spencer Graves wrote:
>>>        >>    Hello:
>>>        >>
>>>        >>
>>>        >>    The example "integrate(dnorm,0,20000)" says it "fails on many
> systems".
>>>        >>    I just got 0 from it, when I should have gotten either an
> error or something
>>>        >>    close to 0.5.  I got this with R 2.12.0 under both Windows
> Vista_x64 and
>>>        >>    Linux (Fedora 13);  see the results from Windows below.  I
> thought you might
>>>        >>    want to know.
>>>
>>>        >    Well, isn't that exactly what the help page says happens?
> That
>>>        >    example is part of a section entitled
>>>
>>>        >    ## integrate can fail if misused
>>>
>>>        >    and is part of the illustration of
>>>
>>>        >    If the function is
>>>        >    approximately constant (in particular, zero) over nearly all
> its
>>>        >    range it is possible that the result and error estimate may be
>>>        >    seriously wrong.
>>>
>>> yes, of course,
>>> and the issue has been known for ``ages''  ..
>>> ..........
>>> ..........
>>> but it seems that too many useRs are not reading the help
>>> page carefully, but only browse it quickly.
>>> I think we (R developers) have to live with this fact
>>> and should consider adapting to it a bit more, particularly in
>>> this case (see below)
>>>
>>>        >>
>>>        >>    Thanks for all your work in creating and maintaining R.
>>>        >>
>>>        >>
>>>        >>    Best Wishes,
>>>        >>    Spencer Graves
>>>        >>    ###############################
>>>
>>>        >>
>>>        >>    integrate(dnorm,0,20000) ## fails on many systems
>>>
>>>        >>    0 with absolute error<    0
>>>
>>> and this is particularly unsatisfactory for another reason:
>>>
>>> "absolute error<    0"
>>> is *always* incorrect, so I think we should change *some*thing.
>>>
>>> We could just use "<=" (and probably should in any case, or
>>> "<    ~= x" which would convey ``is less than about x'' which I
>>> think is all we can say),
>>> but could consider giving a different message when the integral
>>> evaluates to 0 or, rather actually,
>>> only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
>>> as the latter indicates that the algorithm treated the integrand
>>> f(.) as if f() was a linear function.
>>>
>>> But in my quick experiments, even for linear (incl. constant)
>>> functions, the 'abs.error' returned is never 0.
>>>
>>> If we want to be cautious,
>>> such a warning could be made explicitly suppressable by an argument
>>>          .warn.if.doubtful = TRUE
>>>
>>> An additional possibility I'd like to try, is a new argument
>>>       'min.subdivisions = 3' which specifies the *minimal* number
>>> of subdivisions to be used in addition to the already present
>>>       'subdivisions = 100' (= the maximum number of subintervals.)
>>>
>>> Martin Maechler,
>>> ETH Zurich
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


-- 
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567



More information about the R-devel mailing list