[Rd] Suggested change to integrate.Rd
Spencer Graves
spencer.graves at structuremonitoring.com
Wed Dec 8 23:30:17 CET 2010
That sounds like a great idea to me: This should give the Core R team
more time to worry about the code by delegating maintenance of the help
files to a larger group.
Spencer
On 12/8/2010 2:22 PM, John Nolan wrote:
> Well, you can't idiot-proof things, but you can give clear descriptions and
> warnings.
> To take things to the extreme, one can eliminate all help files. If a user
> really wants
> to understand things, they can read the source code, right?
>
> This is a general question for r-dev: who are the help files aimed at? If
> the
> answer is experts only, then don't put any more effort into help files.
> But if you
> want more users to be able to do more things, then more explanation will
> help.
>
> Perhaps there should be a "documentation team" (r-doc?) that intersects
> r-dev, but
> focuses on documentation?
>
> John, American U
>
>
>
>
> From: "Ravi Varadhan"<rvaradhan at jhmi.edu>
> To: "'John Nolan'"<jpnolan at american.edu>,
> <spencer.graves at structuremonitoring.com>
> Cc: <r-devel at r-project.org>
> Date: 12/08/2010 10:43 AM
> Subject: RE: [Rd] Suggested change to integrate.Rd (was: Re: 0.5 !=
> integrate(dnorm, 0, 20000) = 0)
>
>
>
> Hi,
>
> My honest and (not so) humble opinion is that no amount of clear and
> explicit warning can totally prevent the inappropriate use of any tool.
> Users will continue to use the tools, without doing the necessary
> background
> work to figure out whether the that tool is the appropriate one for their
> particular problem. If things can go so horribly wrong in such a simple
> case, imagine all the snares and traps present in complex, high-dimensional
> integration. Even the best cubature rules or the MCMC methods can give
> wrong results. Even worse, how in heaven's name can we be sure that the
> answer is any good? The simple and best solution is to understand your
> integrand as best as you can. I realize that this may be viewed as being
> too pedantic, but unfortunately, it is also the best advice.
>
> Best,
> Ravi.
> -------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology School of Medicine Johns
> Hopkins University
>
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
>
>
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org]
> On Behalf Of John Nolan
> Sent: Tuesday, December 07, 2010 11:09 PM
> To: spencer.graves at structuremonitoring.com
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Suggested change to integrate.Rd (was: Re: 0.5 !=
> integrate(dnorm, 0, 20000) = 0)
>
> R developers understand intimately how things work, and terse
> descriptions are sufficient. However, most typical R users
> would benefit from clearer documentation. In multiple places
> I've found the R documentation to be correct and understandable
> AFTER I've figured a function out.
>
> And to be fair, this problem with integrate( ) isn't really R's
> fault: the QUADPACK routines that R uses are very good algorithms,
> but neither they nor any other package can handle all cases.
>
> I would support reasonable changes in the documentation for
> integrate( ). Just saying it "gives wrong answer without
> warning on many systems" seems misleading (it works fine in
> many cases) and it doesn't help a user understand how to use
> integrate( ) correctly/carefully. IMO a simple example like
> this one w/ dnorm would catch peoples attention and a couple
> lines of explanation/warning would then make more sense.
>
> John Nolan, American U
>
>
> -----Spencer Graves<spencer.graves at structuremonitoring.com> wrote: -----
> To: John Nolan<jpnolan at american.edu>
> From: Spencer Graves<spencer.graves at structuremonitoring.com>
> Date: 12/07/2010 07:58PM
> Cc: pchausse at uwaterloo.ca, r-devel at r-project.org
> Subject: Suggested change to integrate.Rd (was: Re: [Rd] 0.5 !=
> integrate(dnorm,0,20000) = 0)
>
> What do you think about changing the verbiage with that example
> in "integrate.Rd" from "fails on many systems" to something like
> "gives wrong answer without warning on many systems"?
>
>
> If I had write access to the core R code, I'd change this
> myself: I'm probably not the only user who might think that saying
> something "fails" suggest it gives an error message. Many contributions
> on this thread make it clear that it will never be possible to write an
> integrate function that won't give a "wrong answer without warning" in
> some cases.
>
>
> Thanks,
> Spencer
>
>
> #############################
> On 12/7/2010 7:02 AM, John Nolan wrote:
>> Putting in Inf for the upper bound does not work in general:
>> all 3 of the following should give 0.5
>>
>>> integrate( dnorm, 0, Inf )
>> 0.5 with absolute error< 4.7e-05
>>
>>> integrate( dnorm, 0, Inf, sd=100000 )
>> Error in integrate(dnorm, 0, Inf, sd = 1e+05) :
>> the integral is probably divergent
>>
>>> integrate( dnorm, 0, Inf, sd=10000000 )
>> 5.570087e-05 with absolute error< 0.00010
>>
>> Numerical quadrature methods look at a finite number of
>> points, and you can find examples that will confuse any
>> algorithm. Rather than hope a general method will solve
>> all problems, users should look at their integrand and
>> pick an appropriate region of integration.
>>
>> John Nolan, American U.
>>
>>
>> -----r-devel-bounces at r-project.org wrote: -----
>> To: r-devel at r-project.org
>> From: Pierre Chausse
>> Sent by: r-devel-bounces at r-project.org
>> Date: 12/07/2010 09:46AM
>> Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
>>
>> The warning about "absolute error == 0" would not be sufficient
>> because if you do
>> > integrate(dnorm, 0, 5000)
>> 2.326323e-06 with absolute error< 4.6e-06
>>
>> We get reasonable absolute error and wrong answer. For very high upper
>> bound, it seems more stable to use "Inf". In that case, another
>> .External is used which seems to be optimized for high or low bounds:
>>
>> > integrate(dnorm, 0,Inf)
>> 0.5 with absolute error< 4.7e-05
>>
>>
>> On 10-12-07 8:38 AM, John Nolan wrote:
>>> I have wrestled with this problem before. I think correcting
>>> the warning to "absolute error ~<= 0" is a good idea, and printing
>>> a warning if subdivisions==1 is also helpful. Also, including
>>> a simple example like the one that started this thread on the
>>> help page for integrate might make the issue more clear to users.
>>>
>>> But min.subdivisions is probably not. On the example with dnorm( ),
>>> I doubt 3 subdivisions would work. The problem isn't that
>>> we aren't sudividing enough, the problem is that the integrand
>>> is 0 (in double precision) on most of the region and the
>>> algorithm isn't designed to handle this. There is no way to
>>> determine how many subdivisions are necessary to get a reasonable
>>> answer without a detailed analysis of the integrand.
>>>
>>> I've gotten useful results with integrands that are monotonic on
>>> the tail with a "self-triming integration" routine
>>> like the following:
>>>
>>>> right.trimmed.integrate<- function( f, lower, upper, epsilon=1e-100,
> min.width=1e-10, ... ) {
>>> + # trim the region of integration on the right until f(x)> epsilon
>>> +
>>> + a<- lower; b<- upper
>>> + while ( (b-a>min.width)&& (f(b)<epsilon) ) { b<- (a+b)/2 }
>>> +
>>> + return( integrate(f,a,b,...) ) }
>>>
>>>> right.trimmed.integrate( dnorm, 0, 20000 ) # test
>>> 0.5 with absolute error< 9.2e-05
>>>
>>> This can be adapted to left trim or (left and right) trim,
> abs(f(x)-c)>epsilon,
>>> etc. Setting the tolerances epsilon and min.width is an issue,
>>> but an explicit discussion of these values could encourage people to
>>> think about the problem in their specific case. And of course, none
>>> of this guarantees a correct answer, especially if someone tries this
>>> on non-monotonic integrands with complicated 0 sets. One could write
>>> a somewhat more user-friendly version where the user has to specify
>>> some property (or set of properties) of the integrand, e.g. "right-tail
>>> decreasing to 0", etc. and have the algorithm try to do smart
>>> trimming based on this. But perhaps this getting too involved.
>>>
>>> In the end, there is no general solution because any solution
>>> depends on the specific nature of the integrand. Clearer messages,
>>> warnings in suspicious cases like subdivisions==1, and a simple
>>> example explaining what the issue is in the help page would help
>>> some people.
>>>
>>> John
>>>
>>>
> ...........................................................................
>>> John P. Nolan
>>> Math/Stat Department
>>> 227 Gray Hall
>>> American University
>>> 4400 Massachusetts Avenue, NW
>>> Washington, DC 20016-8050
>>>
>>> jpnolan at american.edu
>>> 202.885.3140 voice
>>> 202.885.3155 fax
>>> http://academic2.american.edu/~jpnolan
>>>
> ...........................................................................
>>> -----r-devel-bounces at r-project.org wrote: -----
>>> To: r-devel at r-project.org, Prof Brian Ripley<ripley at stats.ox.ac.uk>
>>> From: Martin Maechler
>>> Sent by: r-devel-bounces at r-project.org
>>> Date: 12/07/2010 03:29AM
>>> Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
>>>
>>>>>>>> Prof Brian Ripley<ripley at stats.ox.ac.uk>
>>>>>>>> on Tue, 7 Dec 2010 07:41:16 +0000 (GMT) writes:
>>> > On Mon, 6 Dec 2010, Spencer Graves wrote:
>>> >> Hello:
>>> >>
>>> >>
>>> >> The example "integrate(dnorm,0,20000)" says it "fails on many
> systems".
>>> >> I just got 0 from it, when I should have gotten either an
> error or something
>>> >> close to 0.5. I got this with R 2.12.0 under both Windows
> Vista_x64 and
>>> >> Linux (Fedora 13); see the results from Windows below. I
> thought you might
>>> >> want to know.
>>>
>>> > Well, isn't that exactly what the help page says happens?
> That
>>> > example is part of a section entitled
>>>
>>> > ## integrate can fail if misused
>>>
>>> > and is part of the illustration of
>>>
>>> > If the function is
>>> > approximately constant (in particular, zero) over nearly all
> its
>>> > range it is possible that the result and error estimate may be
>>> > seriously wrong.
>>>
>>> yes, of course,
>>> and the issue has been known for ``ages'' ..
>>> ..........
>>> ..........
>>> but it seems that too many useRs are not reading the help
>>> page carefully, but only browse it quickly.
>>> I think we (R developers) have to live with this fact
>>> and should consider adapting to it a bit more, particularly in
>>> this case (see below)
>>>
>>> >>
>>> >> Thanks for all your work in creating and maintaining R.
>>> >>
>>> >>
>>> >> Best Wishes,
>>> >> Spencer Graves
>>> >> ###############################
>>>
>>> >>
>>> >> integrate(dnorm,0,20000) ## fails on many systems
>>>
>>> >> 0 with absolute error< 0
>>>
>>> and this is particularly unsatisfactory for another reason:
>>>
>>> "absolute error< 0"
>>> is *always* incorrect, so I think we should change *some*thing.
>>>
>>> We could just use "<=" (and probably should in any case, or
>>> "< ~= x" which would convey ``is less than about x'' which I
>>> think is all we can say),
>>> but could consider giving a different message when the integral
>>> evaluates to 0 or, rather actually,
>>> only when the error bound ('abs.error') is 0 *and* 'subdivisions == 1'
>>> as the latter indicates that the algorithm treated the integrand
>>> f(.) as if f() was a linear function.
>>>
>>> But in my quick experiments, even for linear (incl. constant)
>>> functions, the 'abs.error' returned is never 0.
>>>
>>> If we want to be cautious,
>>> such a warning could be made explicitly suppressable by an argument
>>> .warn.if.doubtful = TRUE
>>>
>>> An additional possibility I'd like to try, is a new argument
>>> 'min.subdivisions = 3' which specifies the *minimal* number
>>> of subdivisions to be used in addition to the already present
>>> 'subdivisions = 100' (= the maximum number of subintervals.)
>>>
>>> Martin Maechler,
>>> ETH Zurich
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
More information about the R-devel
mailing list