[R] Median of streaming data
r.turner at auckland.ac.nz
Wed Sep 24 08:43:34 CEST 2014
On 24/09/14 17:31, Mohan Radhakrishnan wrote:
> I have streaming data (1 TB) that can't fit in memory. Is there a
> way for me to find the median of these streaming integers assuming I can
> fit only a small part in memory? This is about the statistical approach to
> find the median of a large number of values when I can inspect only a part
> of them due to memory constraints.
You cannot, I'm pretty sure, calculate the exact median recursively, i.e.
by updating a running value one observation at a time. However, there are
"approximate" recursive median algorithms that provide an estimate of
location with the same asymptotic properties as the median:
* U. Holst, Recursive estimators of location. Commun. Statist. Theory
Meth., vol. 16, 1987, pp. 2201--2226.
* Murray A. Cameron and T. Rolf Turner, Recursive location and scale
estimators, Commun. Statist. Theory Meth., vol. 22, 1993,
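
To make the idea of an "approximate" recursive estimate concrete, here is a
minimal sketch in Python of one simple stochastic-approximation scheme: nudge
the current estimate up or down by a shrinking step, depending on whether the
new observation lies above or below it. It is offered as an illustration only
and is not claimed to be the estimator studied in either paper cited above.
The names recursive_median, step_scale and demo_stream are hypothetical, and
step_scale is an assumed tuning constant that needs to be roughly on the scale
of the spread of the data.

```python
import random


def recursive_median(stream, step_scale=1.0):
    """Approximate the median of an iterable using constant memory.

    Sign-based stochastic-approximation update (an illustrative sketch):
        m_n = m_{n-1} + (step_scale / n) * sign(x_n - m_{n-1})
    Returns None for an empty stream.
    """
    estimate = None
    n = 0
    for x in stream:
        n += 1
        if estimate is None:
            # Start from the first observation.
            estimate = float(x)
            continue
        step = step_scale / n  # step size shrinks as more data arrive
        if x > estimate:
            estimate += step
        elif x < estimate:
            estimate -= step
        # If x equals the estimate, leave it unchanged.
    return estimate


def demo_stream(n, seed=0, centre=50.0, spread=10.0):
    """Hypothetical test stream: n Gaussian values generated one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(centre, spread)
```

For example, recursive_median(demo_stream(200000), step_scale=25.0) reads the
values one at a time and never holds more than the current estimate and a
counter. If step_scale is too small relative to the spread of the data, the
estimate approaches the median only slowly; if it is too large, the early
estimates are noisy.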
Technical Editor ANZJS