[R] Median of streaming data
Rolf Turner
r.turner at auckland.ac.nz
Sat Sep 27 00:53:08 CEST 2014
On 26/09/14 21:48, Martin Maechler wrote:
>>>>>> Rolf Turner <r.turner at auckland.ac.nz>
<SNIP>
>
> > I have coded up the algorithm from the Cameron and Turner
> > paper. Dunno if it gives exactly the same results as my
> > (Splus?) code from lo these many years ago (the code that
> > is lost in the mists of time), but it *seems* to work.
>
> excellent, thank you, Rolf!
>
> > It is not designed to work with actual "streaming" data
> > --- I don't know how to do that. It takes a complete data
> > vector as input. Someone who knows about streaming data
> > should be able to adapt it pretty easily. Said he, the
> > proverbial optimist.
>
> I agree; that should not be hard.
> One way is to replace 'y[ind]' by 'getY(ind)' everywhere in the code
> and let 'getY' be an argument to rlas() provided by the user.
>
> > The function code and a help file are attached. These
> > files have had their names changed to end in ".txt" so
> > that they will get through the mailing list processor
> > without being stripped. With a bit of luck.
> ;-)
>
> It did work indeed.
> I've added them to 'robustX' -- on R-forge,
> including a plot() method and some little more flexibility.
>
> --> https://r-forge.r-project.org/R/?group_id=59
>
<SNIP>
Since I posted my previous email I have found some typos in the
documentation and made some adjustments to the code.
I also realized that the name "rlas" sounds a bit too much like a random
number generator, according to R's conventions. So I have decided that
"lasr" would be a better name.
I hope this change of horses in midstream doesn't mess things up too
much for you. If it does, of course feel free to stick with "rlas".
I have attached revised *.R and *.Rd files. Not sure that the *.Rd file
will get through as-is. Please let me know if it doesn't.
cheers,
Rolf
--
Rolf Turner
Technical Editor ANZJS
-------------- next part --------------
lasr <- function(y,b=0.2,mfn=function(n){0.1*n^(-0.25)},
nstart=30,scon=NULL) {
# Initialize:
y0 <- y[1:nstart]
alpha <- median(y0)
s <- if(is.null(scon)) mean(abs(y0-alpha)) else scon
mu <- mfn(nstart)/s
eps <- s/nstart^b
kount <- sum(abs(alpha-y0) < eps)
g <- kount/(eps*nstart)
ny <- length(y)
n <- nstart+1
locn <- numeric(ny)
locn[1:(nstart-1)] <- NA
locn[nstart] <- alpha
scale <- numeric(ny)
scale[1:(nstart-1)] <- if(is.null(scon)) NA else scon
scale[nstart] <- s
# Calculate recursively:
while(n <= ny) {
s <- if(is.null(scon)) ((n-1)*s + abs(y[n]-alpha))/n else scon
mu <- mfn(n)/s
eye <- if(abs(alpha - y[n]) < s/n^b) 1 else 0
g <- (1-1/n)*g + n^(b-1)*eye/s
a <- max(mu,g)
alpha <- alpha + sign(y[n]-alpha)/(a*n)
locn[n] <- alpha
scale[n] <- s
n <- n+1
}
list(locn=locn,scale=scale)
}
-------------- next part --------------
\name{lasr}
\alias{lasr}
\title{
Recursive location and scale.
}
\description{
Calculate an estimate of location, asymptotically
equivalent to the median, and an estimate of scale
equal to the \bold{MEAN} absolute deviation. Both
done recursively.
}
\usage{
lasr(y, b = 0.2, mfn = function(n){0.1 * n^(-0.25)},
nstart = 30, scon = NULL)
}
\arguments{
\item{y}{
A numeric vector of i.i.d. data whose location and scale
parameters are to be estimated.
}
\item{b}{
A tuning parameter (default value equal to that used by
Holst, 1987).
}
\item{mfn}{
Function of the index of the data which must be positive
and tend to 0 as the index tends to infinity. The default
function is that used by Holst, 1987.
}
\item{nstart}{
Starting values for the algorithm are formed from the first
\code{nstart} values of \code{y}. The default value is that
used in Cameron and Turner, 1993.
}
\item{scon}{
A constant value for the scale parameter \code{s}. If this
is provided then the algorithm becomes equivalent to the
algorithm of Holst, 1987.
}
}
\value{
A list with entries
\item{locn}{The successive recursive estimates of location. The
first \code{nstart - 1} of these are \code{NA}.}
\item{scale}{The successive recursive estimates of scale. If
\code{scon} is specified these are all equal to \code{scon}. Otherwise
the first \code{nstart - 1} values are \code{NA}.}
}
\section{Remark}{
The \bold{mean} absolute deviation is used as an estimate of scale
(rather than the more expected \bold{median} absolute deviation) simply
because the former can be calculated recursively.
}
\references{
Cameron, Murray A. and Turner, T. Rolf (1993). Recursive location and
scale estimators. \emph{Commun. Statist. --- Theory Meth.} \bold{22}
(9) 2503--2515.
Holst, U. (1987). Recursive estimators of location.
\emph{Commun. Statist. --- Theory Meth.} \bold{16} (8) 2201--2226.
}
\author{
\email{r.turner at auckland.ac.nz}
\url{http://www.stat.auckland.ac.nz/~rolf}
}
\examples{
set.seed(42)
y <- rnorm(10000)
z1 <- lasr(y)
z2 <- lasr(y,scon=1)
z3 <- lasr(y,scon=100)
OP <- par(mfrow=c(3,1))
plot(z1$locn,type="l")
abline(h=median(y),col="red")
plot(z2$locn,type="l")
abline(h=median(y),col="red")
plot(z3$locn,type="l")
abline(h=median(y),col="red")
par(OP)
}
\keyword{ univar }
\keyword{ robust }
More information about the R-help
mailing list