[R] how to get the CDF of a density() estimation?

Thu Dec 11 15:11:06 CET 2008

On Thu, 11 Dec 2008 14:28:31 +0100, Viktor Nagy wrote:

VN> Hi,
VN> 
VN> I've estimated a simple kernel density of a univariate variable with
VN> density(), but after I would like to find out the CDF at specific
VN> values.
VN> How can I do it?
VN> 

Answer 1.
Use approxfun() to interpolate the output of density() and then 
use integrate(). The following lines show a *crude* coding of this
idea:

R> x <- rnorm(200)
R> pdf <- density(x)
R> f <- approxfun(pdf$x, pdf$y, yleft=0, yright=0)
R> cdf <- integrate(f, -Inf, 2)  # replace '2' by any other value;
R>                               # the estimate itself is cdf$value
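
A slightly more reusable variant of the same idea, as a sketch only: it
wraps the interpolated density in a function that returns the CDF at
several values at once. It integrates over the finite grid returned by
density() rather than from -Inf, since integrate() can be unreliable
over an infinite range when the integrand is zero over most of it. The
name kde_cdf is made up for this illustration and is not part of R.

```r
set.seed(1)                       # only for reproducibility
x   <- rnorm(200)
pdf <- density(x)
f   <- approxfun(pdf$x, pdf$y, yleft = 0, yright = 0)

# kde_cdf is a hypothetical helper name, not part of base R
kde_cdf <- function(q) {
  lo <- min(pdf$x)                # left end of the density() grid
  sapply(q, function(qi) {
    if (qi <= lo) return(0)       # no estimated mass below the grid
    hi <- min(qi, max(pdf$x))     # never integrate past the grid
    integrate(f, lo, hi, subdivisions = 2000L,
              stop.on.error = FALSE)$value
  })
}

kde_cdf(c(-1, 0, 2))
```

The result is still subject to the interpolation and quadrature error of
the crude approach, so values near 1 may be slightly off.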

Answer 2.
Do not integrate the estimated density, since this is not the most
efficient estimate of the underlying CDF. Instead, smooth the empirical
distribution function, using a smaller bandwidth of the kernel. The
optimal bandwidth for kernel density estimation is of order O(n^{-1/5}),
while for CDF estimation it is O(n^{-1/3}), if n denotes the sample size.

In practical terms you can still use density(), as indicated above, but
selecting a suitably smaller bandwidth compared to the one used for
density estimation.
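
Smoothing the empirical distribution function directly can be sketched
as follows, assuming a Gaussian kernel: the estimate at a point t is the
average of pnorm((t - x_i)/h) over the sample, which is what integrating
a Gaussian kernel density estimate with bandwidth h gives exactly. The
name smooth_cdf and the constant in the bandwidth are illustrative
assumptions; only the n^{-1/3} rate comes from the remark above.

```r
set.seed(1)                       # only for reproducibility
x <- rnorm(200)
n <- length(x)

# Illustrative bandwidth: only the n^(-1/3) rate is taken from the text;
# the constant sd(x) is an arbitrary choice, not an optimal one.
h <- sd(x) * n^(-1/3)

# smooth_cdf is a hypothetical helper name, not part of base R
smooth_cdf <- function(q, data = x, bw = h) {
  sapply(q, function(qi) mean(pnorm((qi - data) / bw)))
}

smooth_cdf(c(-1, 0, 2))
ecdf(x)(c(-1, 0, 2))              # unsmoothed empirical CDF, for comparison
```

No numerical integration is needed here, so the estimate is monotone and
stays within (0, 1) by construction.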

Best wishes

Adelchi Azzalini
-- 
Adelchi Azzalini  <azzalini at stat.unipd.it>
Dipart.Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147,  http://azzalini.stat.unipd.it/