[R] Getting the values out of histogram (lattice)

Rolf Turner rolf.turner at xtra.co.nz
Thu Sep 1 02:09:05 CEST 2011


I'm not entirely sure that I understand what your problem is.
A reproducible example would probably have helped.

However I conjecture that the problem boils down to confusing
"probability" with "probability *density*".

Percentages are the (estimated) bin probabilities times 100.
The percentage for the i-th bin is 100*n_i/n where n_i is the count
for the i-th bin and n is the sum of the n_i.

The percentages sum to 100 (equivalent to probabilities summing to 1).
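
As a quick check of the percentage arithmetic, here is a minimal sketch using base hist(). The data are simulated and the seed is arbitrary; neither comes from the original post.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
x <- rnorm(1000)                       # made-up data, not the poster's

h <- hist(x, plot = FALSE)             # default breaks, no plotting
pct <- 100 * h$counts / sum(h$counts)  # 100 * n_i / n for each bin

sum(pct)                               # 100, up to floating point rounding
```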

The *densities* in contrast *integrate* to 1.

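The density value for the i-th bin is n_i/(n * w_i), i.e. the bin probability
n_i/n divided by the bin width w_i, so that the bar *areas* sum to 1.  (If the
breaks have been set sensibly, the w_i all have the same value, i.e. the bin
widths are all the same.)  Note that a density value can exceed 1 whenever
w_i < 1; that is not an error.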
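
To make the density arithmetic concrete, the sketch below takes the density of bin i to be n_i/(n * w_i) and compares that with what hist() reports. The data are simulated on a narrow range (an assumption chosen purely so that the bins come out narrower than 1); with such bins the density values exceed 1 while still integrating to 1.

```r
set.seed(1)                               # arbitrary seed
x <- runif(90000, 0, 0.5)                 # made-up data on a narrow range

h <- hist(x, breaks = 100, plot = FALSE)  # 100 is only a suggestion to hist()
w <- diff(h$breaks)                       # bin widths w_i
n <- sum(h$counts)                        # total number of points

dens <- h$counts / (n * w)                # n_i / (n * w_i), computed by hand

all.equal(dens, h$density)                # TRUE: same values as hist() reports
max(h$density) > 1                        # TRUE here, because the bins are narrow
sum(h$density * w)                        # 1: the bar areas sum to 1
```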

Does this answer your question?  (In an example that I tried the percentages
and the density values are --- not surprisingly!!! --- completely 
consistent.)

You are correct in observing that it is difficult to dig out the
"histogram values" (the bar heights) when using lattice.  You can actually
get at them using lattice:::hist.constructor(), but it's not for the
fainthearted.
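
A gentler route, sketched here under some assumptions, is to choose the breaks yourself, hand the same breaks to both lattice's histogram() and base hist(), and read the bar heights from the hist() result. The data, the break spacing and the name "heights" are made up for illustration. As far as I know the heights should agree with the lattice bars, since lattice bins the data via hist() internally, but that is worth checking against your own plot.

```r
set.seed(1)                                 # arbitrary seed
x <- rnorm(1000)                            # made-up data

# Explicit, equal-width breaks that cover the whole range of x.
brk <- seq(floor(min(x)), ceiling(max(x)), by = 0.25)

h <- hist(x, breaks = brk, plot = FALSE)    # same binning, no plot
heights <- data.frame(mid     = h$mids,
                      percent = 100 * h$counts / sum(h$counts),
                      density = h$density)

# The matching lattice plot, drawn only if lattice is installed.
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::histogram(~ x, breaks = brk, type = "density")
  # print(p) would draw it
}
```

The "percent" and "density" columns then correspond to type = "percent" and type = "density" respectively.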

     cheers,

         Rolf Turner

P. S. You really should be absolutely certain that you know what you're
talking about before accusing a package of giving ``wrong answers''.

         R. T.

On 01/09/11 01:50, Monica Pisica wrote:
>
>
> Hi,
>
>   
>
> I have a relatively big dataset and I want to construct
> some histograms using the histogram function in lattice. One thing I am
> interested in is to look at differences between density and percent. I know I can
> use the hist function but it seems that this function gives sometimes some
> wrong answers and the density is actually a percent since it is calculated as counts in the bin divided by the total no. of points. Let me explain.
>
>   
>
> If I let the hist function to decide the breaks, or I use
> a small number, or one of the pre-determined methods to select breaks then
> everything seems to be in order. But if I decide to use – for example – 100 as
> a breaks (I have over 90000 data points so the number of breaks is not
> necessarily too large I would think) the density for the first bin is over 1,
> although for all the other breaks the density is actually a percent since it is
> the count for that bin divided by the total no. of points I have. So …. Here it
> is something wrong or most probably I am doing something wrong.
>
>   
>
> If I use the function histogram from lattice it is
> obvious that there is a difference between the percent param and the density
> param. I looked at the function code and I didn't understand it – to be honest.
> It seems it calls inside the hist function, or a slightly modify variant of
> hist. Reading about the object trellis I saw I can access different info about
> the graph it generates but nothing about the actual data that goes into
> defining the histogram. How can I access the data from it?
>
>   
>
> I am not sure if my problem is platform specific – it should
> not be – but I have Rx64 2.13.1 on windows machine, in case it counts.
>
>   
>
> I appreciate your help, thanks,


