[R] Histogram

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Tue May 25 13:15:20 CEST 2004


On 25-May-04 Cristian Pattaro wrote:
> I have a surprising problem with the representation of
> frequencies in a histogram.
> 
> Consider, for example, the R code:
> 
> b<-rnorm(2000,3.5,0.3)
> hist(b,freq=F)
> 
> When I plotted the histogram, I expected that values in
> the y-axis (the probability) varied between 0 and 1.
> Instead, they varied within the range 0-1.3.
> 
> Have you got any suggestion for obtaining a correct graph
> with probability within the range 0-1?

It depends on the widths of the bins, since what is plotted
in the histogram when freq=F is vertically scaled so that

  sum over bins of  h*(width of bin) = 1

where h is the height of the histogram bar according to the
vertical scale. In other words, hist plots a per-bin estimate
of the probability density in the sense of "amount of probability
per bin divided by width of bin". If your bin widths are narrow
(and your SD above is 0.3, so you will get quite narrow bins,
0.2 in this case), you may well get values exceeding 1.
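
The scaling can be checked numerically. The following is only a sketch: the seed is an arbitrary choice to make the run reproducible (it is not part of the original example), and plot = FALSE is used so that hist returns the object without drawing anything.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
b <- rnorm(2000, 3.5, 0.3)         # same call as in the question
h <- hist(b, plot = FALSE)         # compute the histogram without drawing it
widths <- diff(h$breaks)           # width of each bin
area <- sum(h$density * widths)    # sum over bins of height * width
area                               # equals 1 up to rounding error
max(h$density)                     # tallest bar; may exceed 1 for narrow bins
```

The heights are densities, so only the total area is constrained to be 1, not the individual bar heights.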

Exactly the same holds for the density of the normal distribution
itself: (1/(sqrt(2*pi)*sigma))*exp(-0.5* ... ), where values of
sigma below 1/sqrt(2*pi) (about 0.4) give density > 1 near the mean.
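
For the values in the question this is easy to confirm with dnorm. The snippet below is a small illustration only; the variable names sigma and peak are mine.

```r
sigma <- 0.3                                # SD used in the question
peak <- dnorm(3.5, mean = 3.5, sd = sigma)  # normal density at its mean
peak                                        # about 1.33, i.e. greater than 1
1 / sqrt(2 * pi)                            # about 0.399: the peak exceeds 1
                                            # whenever sigma is below this
```

This agrees with the tallest histogram bar being at about 1.3.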

If you need the actual value of the probabilities in the bins
(i.e. n_i/N) then you can force it by constructing a new hist
object on the lines of

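  h <- hist(b, freq=FALSE)            # draws the density-scaled histogram
  h$counts <- h$counts/sum(h$counts)  # replace counts by proportions n_i/N
  plot(h)                             # equal-width bins: plots h$counts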

When I do this with your example above, the original gives a
y-axis labelled from 0 to 1.2 with the tallest bar at about 1.3,
whereas "plot(h)" gives exactly the same graph but with the y-axis
labelled from 0 to 0.25 and the tallest bar at 0.2625, so that
the bar heights now show the probabilities.
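
The two scalings are related by probability = density * (bin width). A short sketch of that check follows; again the seed is an arbitrary assumption and nothing is drawn.

```r
set.seed(1)                           # arbitrary seed, for reproducibility only
b <- rnorm(2000, 3.5, 0.3)
h <- hist(b, plot = FALSE)            # histogram object, not drawn
p <- h$counts / sum(h$counts)         # per-bin probabilities n_i/N
all.equal(p, h$density * diff(h$breaks))  # TRUE: density times width
sum(p)                                # the probabilities sum to 1
```

With bins of width 0.2, a bar of density about 1.3 therefore corresponds to a probability of about 0.26.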

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 25-May-04                                       Time: 12:15:20
------------------------------ XFMail ------------------------------