[R] integrate

Fri Jun 23 19:47:01 CEST 2006

<comment inline>

Thomas Lumley wrote:
> On Fri, 23 Jun 2006, Rogério Rosa da Silva wrote:
> 
>> Dear All,
>>
>> My doubt about how to integrate a simple kernel density estimation 
>> goes on.
>>
>> I have seen the recent posts on integrate density estimation, which seem
>> similar to my question. However, I haven't found a solution.
>>
>> I have made two simple kernel density estimation by:
>>
>>    kde.1 <-density(x, bw=sd(x), kernel="gaussian")$y     # x<- 
>> c(2,3,5,12)
>>    kde.2 <-density(y, bw=sd(y), kernel="gaussian")$y     # y<- 
>> c(4,2,4,11)
>>
>> Now I would like to integrate the difference in the estimated density
>> values, i.e.:
>>
>>    diff.kde <- abs (kde.1- kde.2)
>>
>> How can I integrate diff.kde over -Inf to Inf ?
> 
> Well, the answer is zero.
> 
> Computationally this is a bit tricky.  You can turn the density 
> estimates into functions with approxfun()
> x<-rexp(100)
> kde<-density(x)
>  f<-approxfun(kde$x,kde$y,rule=2)
> integrate(f,-1,10)
> 1.000936 with absolute error < 3.3e-05
> 
> But if you want to integrate over -Inf to Inf you need the function to 
> specify the values outside the range of the data.  The only value that 
> will work over the range -Inf to Inf is zero

	  This may be true for standard splines but not if the segment at the 
end is log-linear.  For example, consider data with range(x) = c(x1, 
x.n) with (n-2) knots at x[i], i = 2:(n-1) with x[i]<x[i+1].  Consider a 
function f(x) modeled with exp(a1+b1*x) over [x[1], x[2]].  Then if b1>0,

	  integral{exp(a1+b1*x) from -Inf to x[2]}
	  = exp(a1+b1*x[2])/b1.

	  Do you know if this kind of thing has been studied?  I am NOT 
familiar with the literature on splines, but it would seem to me that 
this kind of thing could be quite valuable for building fast 
approximations to various mixture distributions that are otherwise quite 
difficult and expensive to compute.  It seems to me that this kind of 
thing could be used in multilevel modeling, with multidimensional 
smoothing splines fit to data obtained from potentially expensive 
evaluations of the unconditional likelihood, and the marginal likelihood 
could then be computed and optimized from the spline.  Then an estimate 
of the uncertainty could then be converted into another set of points at 
which to evaluate the likelihood and the process repeated until 
convergence.  I have problems like this that I can't solve right now, 
and it seems to me that this might lead to fast, accurate solutions.

	  Comments?
	  Spencer Graves

>> f<-approxfun(kde$x,kde$y,yleft=0,yright=0)
>> integrate(f,-1,10)
> 1.00072 with absolute error < 1.5e-05
>> integrate(f,-Inf,Inf)
> 1.000811 with absolute error < 2.3e-05
> 
> 
>     -thomas
> 
> 
> ------------------------------------------------------------------------
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html