[R] [RsR] Median of streaming data
Matias Salibian-Barrera
matias at stat.ubc.ca
Wed Sep 24 23:15:13 CEST 2014
Martin,
There's also the work of a former PhD student in our Dept:
http://arxiv.org/pdf/1007.1032.pdf
Matias
On 24/09/2014 1:16 AM, Martin Maechler wrote:
>>>>>> Rolf Turner <r.turner at auckland.ac.nz>
>>>>>> on Wed, 24 Sep 2014 18:43:34 +1200 writes:
> > On 24/09/14 17:31, Mohan Radhakrishnan wrote:
> >> Hi,
> >>
> >> I have streaming data (1 TB) that can't fit in memory. Is
> >> there a way for me to find the median of these streaming
> >> integers, assuming I can fit only a small part in memory?
> >> I am asking about a statistical approach to finding the
> >> median of a large number of values when memory constraints
> >> let me inspect only part of them at a time.
>
> > You cannot, I'm pretty sure, calculate the median
> > recursively. However there are "approximate" recursive
> > median algorithms which provide an estimate of location
> > that has the same asymptotic properties as the median.
>
> > See:
>
> > * U. Holst, Recursive estimators of location,
> > Commun. Statist. Theory Meth., vol. 16, 1987,
> > pp. 2201--2226.
>
> > and
>
> > * Murray A. Cameron and T. Rolf Turner, Recursive location
> > and scale estimators, Commun. Statist. Theory Meth.,
> > vol. 22, 1993, pp. 2503--2515.
>
> This is really interesting to me, thank you, Rolf!
>
> OTOH,
>
> 1) Has your proposal ever been implemented in R?
> I'd be happy to add it to the robustX
> (http://cran.ch.r-project.org/web/packages/robustX) or even
> robustbase (http://cran.ch.r-project.org/web/packages/robustbase) package.
>
> 2) Would anybody know of more recent research on the subject?
> (I quickly "googled around" and found research geared more
> toward the time series situation, which is more involved anyway.)
>
> --> Hence CC'ing the experts' list R-SIG-robust
>
>
> Martin Maechler, ETH Zurich
>
>
> > cheers,
> > Rolf Turner
>
> > --
> > Rolf Turner Technical Editor ANZJS
>
> _______________________________________________
> R-SIG-Robust at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-robust
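Rolf's remark about approximate recursive median algorithms can be made concrete with a small sketch. The Python code below is not the Holst or Cameron and Turner estimator (those papers are not reproduced here); it is the classic sign-based stochastic-approximation (Robbins-Monro) recursion m_n = m_{n-1} + (c/n) * sign(x_n - m_{n-1}), which keeps only the current estimate and a counter in memory. The names RecursiveMedian and stream_median and the gain parameter c are hypothetical, chosen for illustration. The gain is an assumption the user must supply: it should be on the scale of the data (ideally near 1/(2 f(m)), where f is the density at the median m), and a poor choice can slow convergence considerably.

```python
from typing import Iterable, Optional


class RecursiveMedian:
    """Hypothetical O(1)-memory median estimate via stochastic approximation.

    Sketch of the Robbins-Monro sign recursion (not the Holst or
    Cameron-Turner estimators):
        m_n = m_{n-1} + (gain / n) * sign(x_n - m_{n-1})
    Only the current estimate and the observation count are stored.
    """

    def __init__(self, gain: float = 1.0) -> None:
        if gain <= 0:
            raise ValueError("gain must be positive")
        self.gain = gain  # assumed step constant, on the scale of the data
        self.n = 0
        self.estimate: Optional[float] = None

    def update(self, x: float) -> float:
        """Fold one observation into the running estimate and return it."""
        self.n += 1
        if self.estimate is None:
            # Start the recursion at the first observation.
            self.estimate = float(x)
        else:
            diff = x - self.estimate
            sign = (diff > 0) - (diff < 0)  # -1, 0 (tie) or +1
            # Step size gain/n shrinks, so the estimate settles down.
            self.estimate += (self.gain / self.n) * sign
        return self.estimate


def stream_median(values: Iterable[float], gain: float = 1.0) -> float:
    """Run the recursion over an iterable without storing the values."""
    est = RecursiveMedian(gain)
    for x in values:
        est.update(x)
    if est.estimate is None:
        raise ValueError("empty stream")
    return est.estimate


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    # A generator stands in for reading a huge file chunk by chunk.
    stream = (rng.gauss(0.0, 1.0) for _ in range(100_000))
    print(stream_median(stream, gain=2.0))
```

For a 1 TB input, values would be read from disk in chunks and passed to update() one at a time, so memory use does not grow with the stream length. The result is an estimate of the median, not the exact sample median.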
More information about the R-help mailing list