[R] density of hist(freq = FALSE) inversely affected by data magnitude

William Dunlap wdunlap at tibco.com
Wed Jan 23 17:51:10 CET 2013


I think it is the freq=TRUE (prob=FALSE) version of hist() that takes
a fair bit of work to interpret when the bins have unequal sizes.  E.g.,
in the following the bins are sized so that each contains
an equal number of observations.  The resulting flat
frequency plot is hard for me to interpret, because bar height
no longer reflects how concentrated the data are.  The density plot
is easy: the area of each bar is the fraction of observations in that bin.

  > x <- rnorm(1000, sd=50)
  > hist(x, breaks=quantile(x,(0:10)/10), prob=TRUE)
  > hist(x, breaks=quantile(x,(0:10)/10), prob=FALSE)
  Warning message:
  In plot.histogram(r, freq = freq1, col = col, border = border, angle = angle,  :
    the AREAS in the plot are wrong -- rather use freq=FALSE
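
A rough sketch of how one could check this numerically, without plotting.
The seed and the names h and widths are only for illustration; they are
not part of the example above.

```r
set.seed(1)                        # assumed seed, only for reproducibility
x <- rnorm(1000, sd = 50)
h <- hist(x, breaks = quantile(x, (0:10)/10), plot = FALSE)

widths <- diff(h$breaks)           # unequal bin widths, narrow near the centre
h$counts                           # equal-count bins, hence the flat frequency plot
h$density * widths                 # fraction of the sample in each bin (bar areas)
sum(h$density * widths)            # bar areas of the density plot add up to 1
```

The counts carry no information about the shape of the distribution here;
the density does, because it divides each bin's share of the sample by the
bin width.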

Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com


> -----Original Message-----
> From: J Toll [mailto:jctoll at gmail.com]
> Sent: Tuesday, January 22, 2013 5:32 PM
> To: William Dunlap
> Cc: r-help
> Subject: Re: [R] density of hist(freq = FALSE) inversely affected by data magnitude
> 
> Bill,
> 
> Thank you.  I got it.  It can take a fair amount of work to
> interpret the density, especially with odd or irregular bin sizes.
> 
> Thanks again,
> 
> James
> 
> 
> 
> On Tue, Jan 22, 2013 at 5:33 PM, William Dunlap <wdunlap at tibco.com> wrote:
> > The probability density function is not unitless - it is the derivative of the
> > [cumulative] probability distribution function so it has units delta-probability-mass
> > over delta-x.  It must integrate to 1 (over all possible x).  hist(freq=FALSE,x)
> > or hist(prob=TRUE,x) displays an estimate of the density function and the following
> > example shows how the scale matches what you get from the presumed
> > population density function.
> >
> >> f
> > function (n, sd)
> > {
> >     x <- rnorm(n, sd = sd)
> >     hist(x, freq = FALSE) # estimated density
> >     s <- seq(min(x), max(x), len = 129)
> >     lines(s, dnorm(s, sd = sd), col = "red") # overlay expected density for this sample
> > }
> >> f(1e6, sd=1)
> >> f(100, sd=1)
> >> f(100, sd=0.0001)
> >> f(1e6, sd=0.0001)
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap at tibco.com
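
The point in the quoted message about units can also be checked without
plotting.  This is a minimal sketch, assuming a scale factor k = 1e-4 to
mirror the sd=0.0001 calls; the names z, k, h1 and h2 are hypothetical.

```r
set.seed(1)                        # assumed seed, only for reproducibility
z  <- rnorm(1000)                  # data on the original scale, sd = 1
k  <- 1e-4                         # hypothetical rescaling of the x units
h1 <- hist(z, plot = FALSE)
h2 <- hist(k * z, breaks = k * h1$breaks, plot = FALSE)  # same bins, rescaled

identical(h1$counts, h2$counts)          # same observations fall in each bin
all.equal(h2$density * k, h1$density)    # heights grow by 1/k as x shrinks by k
sum(h2$density * diff(h2$breaks))        # total area is still 1
```

Shrinking the x units by k makes every bin k times narrower, so the
density (probability per unit x) has to be 1/k times taller for the areas
to keep summing to 1.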

