[R] Median of streaming data
Martyn Byng
martyn.byng at nag.co.uk
Wed Sep 24 12:29:33 CEST 2014
Something else that might be of interest ...
Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in high speed data streams Proceedings of the 19th International Conference on Scientific and Statistical Database Management IEEE Computer Society 29
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Maechler
Sent: 24 September 2014 09:17
To: Rolf Turner
Cc: R-help at r-project.org; R-SIG-robust at r-project.org
Subject: Re: [R] Median of streaming data
>>>>> Rolf Turner <r.turner at auckland.ac.nz>
>>>>> on Wed, 24 Sep 2014 18:43:34 +1200 writes:
> On 24/09/14 17:31, Mohan Radhakrishnan wrote:
>> Hi,
>>
>> I have streaming data(1 TB) that can't fit in memory. Is
>> there a way for me to find the median of these streaming
>> integers assuming I can fit only a small part in memory ?
>> This is about the statistical approach to find the median
>> of a large number of values when I can inspect only a
>> part of them due to memory constraints.
> You cannot, I'm pretty sure, calculate the median
> recursively. However there are "approximate" recursive
> median algorithms which provide an estimate of location
> that has the same asymptotic properties as the median.
> See:
> * U. Holst, Recursive estimators of location.
> Commun. Statist. Theory Meth., vol. 16, 1987,
> pp. 2201--2226.
> and
> * Murray A. Cameron and T. Rolf Turner, Recursive location
> and scale estimators, Commun. Statist. Theory Meth.,
> vol. 22, 1993, pp. 2503--2515.
This is really interesting to me, thank you, Rolf!
OTOH,
1) has your proposal ever been provided in R?
I'd be happy to add it to the robustX
(http://cran.ch.r-project.org/web/packages/robustX) or even
robustbase (http://cran.ch.r-project.org/web/packages/robustbase) package.
2) Would anybody know of more recent research on the subject?
(I quickly "googled around" and found research more geared
for the time series situation which is more involved anyway)
--> Hence CC'ing the experts' list R-SIG-robust
Martin Maechler, ETH Zurich
> cheers,
> Rolf Turner
> --
> Rolf Turner Technical Editor ANZJS
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
________________________________________________________________________
This e-mail has been scanned for all viruses by Star.\ _...{{dropped:3}}
More information about the R-help
mailing list