# [R] integrate

Spencer Graves spencer.graves at pdf.com
Fri Jun 23 19:47:01 CEST 2006

```<comment inline>

Thomas Lumley wrote:
> On Fri, 23 Jun 2006, Rogério Rosa da Silva wrote:
>
>> Dear All,
>>
>> My doubt about how to integrate a simple kernel density estimation
>> goes on.
>>
>> I have seen the recent posts on integrate density estimation, which seem
>> similar to my question. However, I haven't found a solution.
>>
>> I have made two simple kernel density estimation by:
>>
>>    kde.1 <-density(x, bw=sd(x), kernel="gaussian")\$y     # x<-
>> c(2,3,5,12)
>>    kde.2 <-density(y, bw=sd(y), kernel="gaussian")\$y     # y<-
>> c(4,2,4,11)
>>
>> Now I would like to integrate the difference in the estimated density
>> values, i.e.:
>>
>>    diff.kde <- abs (kde.1- kde.2)
>>
>> How can I integrate diff.kde over -Inf to Inf ?
>
> Well, the answer is zero.
>
> Computationally this is a bit tricky.  You can turn the density
> estimates into functions with approxfun()
> x<-rexp(100)
> kde<-density(x)
>  f<-approxfun(kde\$x,kde\$y,rule=2)
> integrate(f,-1,10)
> 1.000936 with absolute error < 3.3e-05
>
> But if you want to integrate over -Inf to Inf you need the function to
> specify the values outside the range of the data.  The only value that
> will work over the range -Inf to Inf is zero

This may be true for standard splines but not if the segment at the
end is log-linear.  For example, consider data with range(x) = c(x1,
x.n) with (n-2) knots at x[i], i = 2:(n-1) with x[i]<x[i+1].  Consider a
function f(x) modeled with exp(a1+b1*x) over [x[1], x[2]].  Then if b1>0,

integral{exp(a1+b1*x) from -Inf to x[2]}
= exp(a1+b1*x[2])/b1.

Do you know if this kind of thing has been studied?  I am NOT
familiar with the literature on splines, but it would seem to me that
this kind of thing could be quite valuable for building fast
approximations to various mixture distributions that are otherwise quite
difficult and expensive to compute.  It seems to me that this kind of
thing could be used in multilevel modeling, with multidimensional
smoothing splines fit to data obtained from potentially expensive
evaluations of the unconditional likelihood, and the marginal likelihood
could then be computed and optimized from the spline.  Then an estimate
of the uncertainty could then be converted into another set of points at
which to evaluate the likelihood and the process repeated until
convergence.  I have problems like this that I can't solve right now,
and it seems to me that this might lead to fast, accurate solutions.

Spencer Graves

>> f<-approxfun(kde\$x,kde\$y,yleft=0,yright=0)
>> integrate(f,-1,10)
> 1.00072 with absolute error < 1.5e-05
>> integrate(f,-Inf,Inf)
> 1.000811 with absolute error < 2.3e-05
>
>
>     -thomas
>
>
> ------------------------------------------------------------------------
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help