# [R] Weighted Histogram

Neil E. Klepeis nklepeis at uclink4.berkeley.edu
Sun Sep 17 21:40:11 CEST 2000

```
Greetings,

I'm having trouble finding a simple way to calculate a weighted
histogram where there may be zero raw counts in a given interval.

Given equal-length vectors of data 'data' and weights 'w', and a vector
of 'breaks' (the interval boundaries) for the histogram, I calculate a
weighted histogram as follows (see MASS's 'truehist' for an unweighted
histogram):

bin <- cut(data, breaks, include.lowest = TRUE)
wsums <- sapply(split(w, f = bin), sum)
prob <- wsums/(diff(breaks) * sum(w))

where 'bin' is the data recoded as a factor of intervals, 'wsums' is the
sum of weights within each level of 'bin', and 'prob' is the vector of
probability densities (bar heights) used to plot the true histogram.
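
# Worked check (a sketch with made-up data, not from the original
# message): the same three lines on a small sample in which every
# interval is occupied. Because each weight sum is divided by its bin
# width and by the total weight, 'prob' holds density heights, so the
# bar areas prob * diff(breaks) should add up to 1.
data <- c(0.5, 1.5, 2.5, 3.0, 4.5, 5.0, 7.2, 9.9)
w <- c(1, 2, 0.5, 1, 3, 1, 2, 0.5)
breaks <- c(0, 2, 4, 6, 8, 10)
bin <- cut(data, breaks, include.lowest = TRUE)
wsums <- sapply(split(w, f = bin), sum)   # 3, 1.5, 4, 2, 0.5 per interval
prob <- wsums/(diff(breaks) * sum(w))     # sum(w) is 11 here
sum(prob * diff(breaks))                  # total bar area, 1 up to rounding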

But if there are no data points in a certain interval (i.e., zero
frequency in an interval), the 'wsums' vector does not have the same
length as 'diff(breaks)', so the division above no longer lines up
interval by interval.  An unweighted histogram is no trouble, because
'tabulate' takes the number of bins (here, the total number of
intervals) as an argument and returns a vector that includes the
zero-frequency intervals:

counts <- tabulate(bin, length(levels(bin)))
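
# One possible vectorized workaround (a sketch with made-up data, not
# from the original message): tapply() returns one cell for every level
# of the factor 'bin' and puts NA in the cells of empty intervals, so
# replacing NA with 0 gives a weighted analogue of tabulate() whose
# length always equals length(levels(bin)). The interval (4,6] below
# is deliberately empty.
data <- c(0.5, 1.5, 2.5, 3.0, 7.2, 9.9)
w <- c(1, 2, 0.5, 1, 2, 0.5)
breaks <- c(0, 2, 4, 6, 8, 10)
bin <- cut(data, breaks, include.lowest = TRUE)
wsums <- tapply(w, bin, sum)      # NA where an interval has no data
wsums[is.na(wsums)] <- 0          # empty intervals get zero weight
wsums <- as.vector(wsums)         # plain numeric vector, one per interval
prob <- wsums/(diff(breaks) * sum(w))
# With unit weights the same construction reproduces the tabulate() counts
unit <- tapply(rep(1, length(data)), bin, sum)
unit[is.na(unit)] <- 0
all.equal(as.vector(unit), tabulate(bin, length(levels(bin))))   # TRUE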

Does anyone know of a simple (vectorized) way in R to calculate a
weighted histogram where there may be zero counts in any number of
intervals?  [If not, I suppose I could loop through each interval...]

Thanks,
Neil

--
___________________________________________________________
Neil E. Klepeis, School of Public Health, UC Berkeley, USA
http://eetd.lbl.gov/ied/era/exposuremodeling/