[R] estimating quantiles from binned data

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Sep 14 22:39:57 CEST 2003


On 14 Sep 2003, Russell Senior wrote:

> >>>>> "Spencer" == Spencer Graves <spencer.graves at pdf.com> writes:
> 
> Russell> Suppose I have a set of binned data, counts exceeding a
> Russell> series of arbitrary thresholds, a total N, a minimum and
> Russell> maximum, those sorts of things.  Is there a "standard" method
> Russell> for estimating arbitrary quantiles from this?  My initial
> Russell> thought is that the counts and min/max give me solutions at
> Russell> various points along the empirical cdf.  As the data are
> Russell> roughly log-normal, I thought maybe I could use piece-wise
> Russell> log-normal distributions between these points to estimate the
> Russell> arbitrary quantiles I am interested in.  Are there "better
> Russell> thought out" methods than this?  Thanks!
> 
> Spencer> Have you considered making a normal probability plot?  
> 
> This is probably not practical, given I have on the order of 7000 sets
> of binned data to evaluate.  I have prior knowledge of the data
> involved (i.e. when we have an actual sample rather than just bin
> counts), and though it isn't perfect, log normal usually isn't too bad
> particularly in comparison to a standard normal distribution.  I also
> want to match at the points where we "know" the quantiles precisely
> (i.e. at bin boundaries).
> 
> Spencer> The image of a mixture of lognormals would suggest limits on
> Spencer> the accuracy of such interpolation.
> 
> Oh, there are "limits", no doubt.  I guess the main point of my query
> is to evaluate whether there are better, more theoretically sound
> methods than the one I extracted from my hat.

The usual procedure is to linearly interpolate the ecdf between the bin
boundaries, and unless the intervals are widely spaced, that seems as
good as anything else.  An intermediate position is to use a cubic
spline interpolation, constrained to be monotonic (not sure you can do
that in R via canned procedures).
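
As a rough sketch of both suggestions, and not a definitive recipe: the
bin boundaries and cumulative counts below are made up for illustration,
and the names breaks, cum_n, q_lin, F_spl and q_spl are hypothetical.
The linear version uses approxfun() on the known (probability, boundary)
pairs.  The monotone spline version assumes a later release of R in
which splinefun() accepts method = "monoH.FC" (or "hyman"), and inverts
the fitted cdf numerically with uniroot().

```r
# Made-up binned data: min, interior thresholds, max, and the number of
# observations at or below each boundary (the last value is N).
breaks <- c(0.5, 1, 2, 5, 10, 50)
cum_n  <- c(0, 120, 410, 830, 960, 1000)
p      <- cum_n / max(cum_n)          # ecdf values at the boundaries

# Linear interpolation of the ecdf, read in the inverse direction so
# that it returns quantiles.  Assumes no empty bins (p strictly increasing).
q_lin <- approxfun(p, breaks)

# Monotone cubic interpolation of the ecdf (Fritsch-Carlson variant).
F_spl <- splinefun(breaks, p, method = "monoH.FC")

# Invert the spline cdf by root finding, one probability at a time.
# Intended for probabilities strictly between 0 and 1.
q_spl <- function(prob) {
  sapply(prob, function(u) {
    uniroot(function(x) F_spl(x) - u, interval = range(breaks),
            tol = 1e-9)$root
  })
}

probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
print(cbind(prob = probs, linear = q_lin(probs), spline = q_spl(probs)))
```

Both versions reproduce the bin boundaries exactly at the known
probabilities and differ only in how they fill in between them.  For
roughly log-normal data, one could interpolate against log(breaks)
instead and exponentiate the result.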

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



