[R] Median of streaming data
Martin Maechler
maechler at stat.math.ethz.ch
Fri Sep 26 11:48:49 CEST 2014
>>>>> Rolf Turner <r.turner at auckland.ac.nz>
>>>>> on Thu, 25 Sep 2014 11:44:38 +1200 writes:
> On 24/09/14 20:16, Martin Maechler wrote: <SNIP>
>> 1) has your proposal ever been provided in R? I'd be
>> happy to add it to the robustX
>> (http://cran.ch.r-project.org/web/packages/robustX) or
>> even robustbase
>> (http://cran.ch.r-project.org/web/packages/robustbase)
>> package.
> <SNIP>
> I have coded up the algorithm from the Cameron and Turner
> paper. Dunno if it gives exactly the same results as my
> (Splus?) code from lo these many years ago (the code that
> is lost in the mists of time), but it *seems* to work.
excellent, thank you, Rolf!
> It is not designed to work with actual "streaming" data
> --- I don't know how to do that. It takes a complete data
> vector as input. Someone who knows about streaming data
> should be able to adapt it pretty easily. Said he, the
> proverbial optimist.
I agree; that should not be hard.
One way is to replace 'y[ind]' by 'getY(ind)' everywhere in the code
and let 'getY' be an argument to rlas() provided by the user.
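
A minimal sketch of that accessor idea follows. It is not the rlas()
code: 'stream_median', 'n0' and 'gain' are hypothetical names, and the
update rule is a plain stochastic-approximation median, swapped in only
to show how 'getY(ind)' can stand in for 'y[ind]'.

```r
## Hypothetical illustration only; NOT the Cameron and Turner algorithm.
## The data are reached solely through the user-supplied getY(ind).
stream_median <- function(getY, n, n0 = 5L, gain = 1) {
    stopifnot(is.function(getY), n0 >= 1L, n >= n0)
    ## start from the exact median of the first n0 observations
    m <- median(getY(seq_len(n0)))
    if (n > n0) for (i in (n0 + 1L):n) {
        ## move the estimate towards observation i with a shrinking step
        m <- m + (gain / i) * sign(getY(i) - m)
    }
    m
}

## With an in-memory vector, getY is just an indexing closure:
y <- c(2, 4, 6, 8, 10, 12)
est <- stream_median(function(ind) y[ind], n = length(y))
```

For real streaming input, 'getY' would instead fetch observation 'ind'
from a connection or a database, and the function body would stay the same.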
> The function code and a help file are attached. These
> files have had their names changed to end in ".txt" so
> that they will get through the mailing list processor
> without being stripped. With a bit of luck.
;-)
It did work indeed.
I've added them to 'robustX' -- on R-forge,
including a plot() method and a little more flexibility.
--> https://r-forge.r-project.org/R/?group_id=59
Thank you for all the other pointers to literature (but none to
software), some of it quite recent.
One old idea that was not directly mentioned, I think, is the
"Remedian" of Rousseeuw and Bassett:
Peter J. Rousseeuw and Gilbert W. Bassett, Jr. (1990)
The Remedian: A Robust Averaging Method for Large Data Sets
Journal of the American Statistical Association, Vol. 85, No. 409, pp. 97-104
[URL: http://www.jstor.org/stable/2289530]
which is also easy to implement and I plan to add to robustbase
(as I'd want to use the C code already in robustbase) as a
"reference" estimator.
Personally, I think there is quite some room for research and
implementation, not least because the literature seems to
always be a bit incomplete {one "school" not knowing about, or
at least not citing works of the other "school", etc...}
Martin
--
Martin Maechler, ETH Zurich
> If they *don't* get through, anyone who is interested
> should contact me and I will send them to you "privately".
> cheers,
> Rolf
> --
> Rolf Turner Technical Editor ANZJS
>
> [external: rlas.R, plain text]
> [external: rlas.Rd, plain text]