[R] overlap histogram and density

(Ted Harding) ted.harding at wlandres.net
Thu Nov 11 21:12:31 CET 2010


[OOPS!! I accidentally reproduced my second example below
 as my third example. Now corrected. See below.]

On 11-Nov-10 20:02:29, Ted Harding wrote:
 
 On 11-Nov-10 18:39:34, Roslina Zakaria wrote:
> Hi,
> Does anybody encounter the same problem when we overlap histogram
> and density that the density line seem to shift to the right a
> little bit?
> 
> If you do have the same problem, what should we do to correct that?
> Thank you.
> 
> par(mar=c(4,4,2,1.2),oma=c(0,0,0,0))
> hist(datobs,prob=TRUE,
>      main ="Volume of a catchment from four stations",
>      col="yellowgreen", cex.axis=1, xlab="rainfall",
>      ylab="Relative frequency", ylim= c(0,.003), xlim=c(0,1200))
> 
> lines(density(dd), lwd=3,col="red")
> 
>#legend("topright",c("observed","generated"),
>#       lty=c(0,1),fill=c("blue",""),bty="n")
> 
> legend("topright", legend = c("observed","generated"),
> col = c("yellowgreen", "red"), pch=c(15,NA), lty = c(0, 1), 
> lwd=c(0,3),bty="n", pt.cex=2)
> box()
> 
> Thank you.
 
In theory that is not a problem. The density() function
estimates a smooth density whose integral over each of the
intervals in the histogram should be close to the probability
of that interval, and the proportion of the data expected in
that interval is also its probability.

In practice, the extent to which you observe what you describe
(or a displacement to the left) will depend on how your data
are distributed within the intervals, and on the precision
with which density() happens to estimate the true density.
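
As a rough numerical check of the above, one can integrate the
estimate returned by density() over each histogram interval and
compare the result with the observed proportion in that interval.
The following is only a sketch: the simulated data, the choice of
breaks and the names cdf, Fhat, p_obs and p_kde are made up for
illustration and have nothing to do with your own data.

```r
set.seed(54321)
X  <- exp(rnorm(1000, sd = 0.4))   # illustrative log-Normal sample
dd <- density(X)                   # default kernel and bandwidth

# Intervals of width 0.5, extended far enough to cover all the data
brk <- seq(0, ceiling(max(X) * 2) / 2, by = 0.5)

# Observed proportion of the data in each interval (no plot drawn)
H     <- hist(X, breaks = brk, plot = FALSE)
p_obs <- H$counts / sum(H$counts)

# Cumulative integral of the estimate on its own grid (trapezoidal rule)
cdf  <- c(0, cumsum(diff(dd$x) * (head(dd$y, -1) + tail(dd$y, -1)) / 2))
Fhat <- approxfun(dd$x, cdf, yleft = 0, yright = max(cdf))

# Integral of the estimated density over each interval
p_kde <- diff(Fhat(brk))

round(cbind(lower = head(brk, -1), p_obs = p_obs, p_kde = p_kde), 3)
```

The two columns should agree closely but not exactly, since the
kernel smoothing moves a little probability across the interval
boundaries.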

The following 3 cases, all using the same data sampled from a
log-Normal distribution, illustrate the different impressions
that one might get, depending on the details of the histogram.
Note that there is no overall effect of "displacement to the
right" in any histogram, while the extent to which one observes it
varies according to the histogram. Without knowledge of your
data it is not possible to comment further on the extent to
which you have observed it yourself!

set.seed(54321)
N  <- 1000
X  <- exp(rnorm(N,sd=0.4))
dd <- density(X)

## A coarse histogram
H  <- hist(X,prob=TRUE,
           xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.5*(0:8))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)
 
## A finer histogram
H  <- hist(X,prob=TRUE,
           xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.25*(0:16))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)
 
## A still finer histogram
H  <- hist(X,prob=TRUE,
## OOPS!!  xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.25*(0:16))
           xlim=c(-0.5,4),ylim=c(0,max(dd$y)),breaks=0.20*(0:20))
dx <- unique(diff(H$breaks))
lines(dd$x,dd$y)
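
To compare the three impressions directly, the same plots can be
drawn side by side on one device. This is just a sketch of one way
to do it: the names upper, widths and counts are made up here, and
the right-hand end of the breaks is taken from the data rather
than fixed at 4.

```r
set.seed(54321)
X  <- exp(rnorm(1000, sd = 0.4))
dd <- density(X)

upper  <- ceiling(max(X))        # right-hand end of the last interval
widths <- c(0.5, 0.25, 0.2)      # coarse, finer, still finer

op <- par(mfrow = c(1, 3))       # three panels in one row
counts <- lapply(widths, function(w) {
  brk <- seq(0, upper, by = w)
  H <- hist(X, prob = TRUE, breaks = brk,
            xlim = c(-0.5, upper), ylim = c(0, max(dd$y)),
            main = paste("interval width", w))
  lines(dd$x, dd$y)              # same density estimate on every panel
  H$counts                       # keep the counts for inspection
})
par(op)                          # restore the previous layout
```

The density curve is identical in all three panels; only the
histogram underneath it changes.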
 
 
 Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Nov-10                                       Time: 20:12:27
------------------------------ XFMail ------------------------------


