# [R] Median of streaming data

Rolf Turner r.turner at auckland.ac.nz
Sat Sep 27 00:53:08 CEST 2014

On 26/09/14 21:48, Martin Maechler wrote:
>>>>>> Rolf Turner <r.turner at auckland.ac.nz>

<SNIP>

>
>      > I have coded up the algorithm from the Cameron and Turner
>      > paper.  Dunno if it gives exactly the same results as my
>      > (Splus?) code from lo these many years ago (the code that
>      > is lost in the mists of time), but it *seems* to work.
>
> excellent, thank you, Rolf!
>
>      > It is not designed to work with actual "streaming" data
>      > --- I don't know how to do that.  It takes a complete data
>      > vector as input.  Someone who knows about streaming data
>      > should be able to adapt it pretty easily.  Said he, the
>      > proverbial optimist.
>
> I agree; that should not be hard.
> One way is to replace   'y[ind]' by   'getY(ind)' everywhere in the code
> and let 'getY' be an argument to rlas() provided by the user.
>
>      > The function code and a help file are attached.  These
>      > files have had their names changed to end in ".txt" so
>      > that they will get through the mailing list processor
>      > without being stripped.  With a bit of luck.
> ;-)
>
> It did work indeed.
> I've added them to  'robustX' -- on R-forge,
> including a plot() method and some little more flexibility.
>
>    --> https://r-forge.r-project.org/R/?group_id=59
>

<SNIP>

Since I posted my previous email I have found some typos in the

I also realized that the name "rlas" sounds a bit too much like a random
number generator, according to R's conventions.  So I have decided that
"lasr" would be a better name.

I hope this change of horses in midstream doesn't mess things up too
much for you.  If it does, of course feel free to stick with "rlas".

I have attached revised *.R and *.Rd files.  Not sure that the *.Rd file
will get through as-is.  Please let me know if it doesn't.

cheers,

Rolf

--
Rolf Turner
Technical Editor ANZJS
-------------- next part --------------
lasr <- function(y,b=0.2,mfn=function(n){0.1*n^(-0.25)},
nstart=30,scon=NULL) {
# Initialize:
y0    <- y[1:nstart]
alpha <- median(y0)
s     <- if(is.null(scon)) mean(abs(y0-alpha)) else scon
mu    <- mfn(nstart)/s
eps   <- s/nstart^b
kount <- sum(abs(alpha-y0) < eps)
g     <- kount/(eps*nstart)
ny    <- length(y)
n     <- nstart+1
locn  <- numeric(ny)
locn[1:(nstart-1)] <- NA
locn[nstart] <- alpha
scale  <- numeric(ny)
scale[1:(nstart-1)] <- if(is.null(scon)) NA else scon
scale[nstart] <- s

# Calculate recursively:
while(n <= ny) {
s     <- if(is.null(scon)) ((n-1)*s + abs(y[n]-alpha))/n else scon
mu    <- mfn(n)/s
eye   <- if(abs(alpha - y[n]) < s/n^b) 1 else 0
g     <- (1-1/n)*g + n^(b-1)*eye/s
a     <- max(mu,g)
alpha <- alpha + sign(y[n]-alpha)/(a*n)
locn[n]  <- alpha
scale[n] <- s
n <- n+1
}
list(locn=locn,scale=scale)
}
-------------- next part --------------
\name{lasr}
\alias{lasr}
\title{
Recursive location and scale.
}
\description{
Calculate an estimate of location, asymptotically
equivalent to the median, and an estimate of scale
equal to the \bold{MEAN} absolute deviation.  Both
done recursively.
}
\usage{
lasr(y, b = 0.2, mfn = function(n){0.1 * n^(-0.25)},
nstart = 30, scon = NULL)
}
\arguments{
\item{y}{
A numeric vector of i.i.d. data whose location and scale
parameters are to be estimated.
}
\item{b}{
A tuning parameter (default value equal to that used by
Holst, 1987).
}
\item{mfn}{
Function of the index of the data which must be positive
and tend to 0 as the index tends to infinity.  The default
function is that used by Holst, 1987.
}
\item{nstart}{
Starting values for the algorithm are formed from the first
\code{nstart} values of \code{y}.  The default value is that
used in Cameron and Turner, 1993.
}
\item{scon}{
A constant value for the scale parameter \code{s}. If this
is provided then the algorithm becomes equivalent to the
algorithm of Holst, 1987.
}
}
\value{
A list with entries
\item{locn}{The successive recursive estimates of location.  The
first \code{nstart - 1} of these are \code{NA}.}
\item{scale}{The successive recursive estimates of scale. If
\code{scon} is specified these are all equal to \code{scon}. Otherwise
the first \code{nstart - 1} values are \code{NA}.}
}

\section{Remark}{
The \bold{mean} absolute deviation is used as an estimate of scale
(rather than the more expected \bold{median} absolute deviation) simply
because the former can be calculated recursively.
}

\references{
Cameron, Murray A. and Turner, T. Rolf (1993). Recursive location and
scale estimators. \emph{Commun. Statist. --- Theory Meth.} \bold{22}
(9) 2503--2515.

Holst, U. (1987). Recursive estimators of location.
\emph{Commun. Statist. --- Theory Meth.} \bold{16} (8) 2201--2226.
}
\author{
\email{r.turner at auckland.ac.nz}
\url{http://www.stat.auckland.ac.nz/~rolf}
}
\examples{
set.seed(42)
y  <- rnorm(10000)
z1 <- lasr(y)
z2 <- lasr(y,scon=1)
z3 <- lasr(y,scon=100)
OP <- par(mfrow=c(3,1))
plot(z1$locn,type="l") abline(h=median(y),col="red") plot(z2$locn,type="l")
abline(h=median(y),col="red")
plot(z3\$locn,type="l")
abline(h=median(y),col="red")
par(OP)
}

\keyword{ univar }
\keyword{ robust }