[R] Problem with R "density" function

Andrews, Chris chrisaa at med.umich.edu
Wed May 14 15:04:29 CEST 2014


Try adding plots, e.g. 

set.seed(20140514)
x <- rnorm(100)
hist(x, prob=TRUE, ylim=c(0,10))       # histogram on the density scale
dd <- density(x, n=10001, bw=0.001)    # fine evaluation grid
lines(dd, col=2, type="s")
dd <- density(x, n=101, bw=0.001)      # coarse evaluation grid
lines(dd, col=3, type="s")

The density estimate you produce with bw=0.001 is very irregular (many sharp, narrow peaks). You should expect to need many intervals (i.e., a large n) in your Riemann sum to get an accurate estimate of the area under it.
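
To make the point about n concrete, here is a minimal sketch (not part of the original messages) that repeats the rectangle-rule sum for several values of n with bw fixed at 0.001, and prints how wide one grid step is relative to the bandwidth. The helper name riemann_area is made up for illustration, and the seed is arbitrary.

```r
set.seed(20140514)   # arbitrary seed, for reproducibility only
x <- rnorm(10000)

# Hypothetical helper: rectangle-rule (Riemann) approximation of the
# area under a density estimate evaluated on an equally spaced grid.
riemann_area <- function(dd) sum(dd$y) * (dd$x[2] - dd$x[1])

bw <- 0.001
for (n in c(101, 1001, 10001)) {
  dd   <- density(x, kernel = "epanechnikov", n = n, bw = bw)
  step <- dd$x[2] - dd$x[1]          # spacing of the evaluation grid
  cat(sprintf("n = %5d  step/bw = %7.2f  area = %.4f\n",
              n, step / bw, riemann_area(dd)))
}
```

When the step is many times larger than bw, the grid is too coarse to resolve the individual kernels and the sum is unreliable; the area only approaches 1 once the step is comparable to or smaller than bw. ?density also notes that the estimate is computed by FFT on a regular grid of at least 512 points (n is rounded up to a power of 2 when it exceeds 512) and then linearly interpolated onto the requested points, so a larger n also refines the grid on which the kernel itself is discretized.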

Chris


-----Original Message-----
From: Martyn Byng [mailto:martyn.byng at nag.co.uk] 
Sent: Wednesday, May 14, 2014 5:58 AM
To: DHIMAN BHADRA; r-help at r-project.org
Subject: Re: [R] Problem with R "density" function

Hi,

Have you tried changing the bandwidth rather than the number of points? The default bandwidth gives ...

x <- rnorm(10000)
dd <- density(x,kernel="epanechnikov",n=101)
sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 1.001014
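
As a rough illustration of why the default works here (a sketch, not part of the original reply; the seed is arbitrary), one can inspect the bandwidth that density() actually selects and compare it with the spacing of the 101 evaluation points. By default density() uses bw = "nrd0", i.e. the value returned by bw.nrd0().

```r
set.seed(1)          # arbitrary seed, for reproducibility only
x <- rnorm(10000)

dd   <- density(x, kernel = "epanechnikov", n = 101)  # default bw = "nrd0"
step <- dd$x[2] - dd$x[1]                             # grid spacing

print(dd$bw)              # bandwidth actually used
print(bw.nrd0(x))         # rule-of-thumb selector behind the default
print(step / dd$bw)       # grid step in units of the bandwidth
print(sum(dd$y) * step)   # rectangle-rule area under the estimate
```

For a sample like this the default bandwidth is far larger than 0.001 and wider than one grid step, so the estimate is smooth on the scale of the grid and even 101 points give an area close to 1.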

Martyn
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of DHIMAN BHADRA
Sent: 14 May 2014 10:36
To: r-help at r-project.org
Subject: [R] Problem with R "density" function

Hello,
My friend has the following issue with R. I will be glad to receive any response.
Thanks,
Dhiman Bhadra

Hello everyone,

I am trying to use the 'density' function available with the base package of R to estimate the density of a data set for subsequent use. I just noticed that even with 1000 evaluation points (n), the numerical integral of the estimated density using the Epanechnikov kernel is far from 1. I wonder if I am doing something wrong, or whether there is a bug:

> x=rnorm(10000)
> dd=density(x,kernel="epanechnikov",n=101,bw=0.001)
> sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 5.7245

> dd=density(x,kernel="epanechnikov",n=1001,bw=0.001)
> sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 2.870922

> dd=density(x,kernel="epanechnikov",n=10001,bw=0.001)
> sum(dd$y)*(dd$x[2]-dd$x[1])
[1] 0.9989762

So unless I use around 10000 or more evaluation points (n), the integral is wrong: there seems to be a scaling factor creeping in. Am I missing something?


Best regards,
Apratim Guha

__________________________________________________________________________
Dr. Apratim Guha
Associate Professor, Production & Quantitative Methods Area, IIM Ahmedabad,
Vastrapur, Ahmedabad 380015, INDIA. Phone: (91) 79 6632 4803
Secretary: Ms. Sujatha Jayprakash: (91) 79 6632 4911


