[R] Weighted median
Markus Jantti
markus.jantti at iki.fi
Wed Feb 6 23:34:32 CET 2002
> Is there a weighted median function out there similar to weighted.mean()
> but for medians? If not, I'll try implement or port it myself.
>
Hi -- I have written the following functions fo my own use but am unhappy
with them. I first define a weighted.quantile function and then weighted.median
that uses the first one.
The reason I am unhappy with my version is that
I wanted to define a weighted.quantile function that reproduced the
R quantile function for the unweighted case (R:s quantile does some
smart interpolating), but I did not manage to do so. The below version
works, but if you can write a weighted version of the quantile function, let me
know.
Markus
# weighted quantiles. various versions, none good.
weighted.quantile <-
function(x, w, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE) {
# n <- length(x)
# check:
# - if w exists (length(w)>0) => if not, then w <- rep(1,length(x))
# - if w is the weight or the inclusion probability
# + a test sum(w) > length(x)??????
if (length(x) != length(w))
stop("Weights and variable vectors are of unequal length!")
if (na.rm)
{ valid.obs <- !is.na(x) & !is.na(w)
x <- x[valid.obs]
w <- w[valid.obs]}
else if (any(is.na(x)) | any(is.na(w)))
stop("Missing values and NaN's not allowed if `na.rm' is FALSE")
if (any((p.ok <- !is.na(probs)) & (probs < 0 | probs > 1)))
stop("probs outside [0,1]")
if (na.p <- any(!p.ok)) {
o.pr <- probs
probs <- probs[p.ok]
}
np <- length(probs)
if (missing(w))
w <- rep(1, length(x))
if (na.rm) {
w <- w[i <- !is.na(x)]
x <- x[i]
}
# check if weights are probs or unit correspondences
ind <- order(x)
cumw <- cumsum(w[ind])/sum(w)
x <- x[ind]
med <- numeric(0)
for(i in 1:length(probs)) {
if(probs[i] > 0 & probs[i] < 1) med[i] <- max(x[cumw<=probs[i]])
else if (probs[i] == 0) med[i] <- min(x)
else if (probs[i] == 1) med[i] <- max(x)
}
# the treatment of names
med
}
weighted.median <- function(x,w, na.rm = FALSE) {
if (missing(w))
w <- rep(1, length(x))
if (na.rm) {
w <- w[i <- !is.na(x)]
x <- x[i]
}
weighted.quantile(x,w,.5)
}
#
> The need for a weighted median came from the following optimization
> problem:
>
> x* = arg_x min (a|x| + sum_{k=1}^n |x - b_k|)
>
> where
>
> a : is a *positive* real scalar
> x : is a real scalar
> n : is an integer
> b_k: are negative and positive scalars
>
> which is a "median with a shift towards zero". This can be rewritten as
>
> x* = arg_x min (sum_{k=0}^n w_k|x - b_k|)
>
> where w = (a, 1, 1, ..., 1) and b_0 = 0, which is a problem that could be
> solved by a weighted median function. If someone has a better solution
> (note that 'a' does *not* have to be an integer) to this specific
> optimization problem I would be very happy to hear about it.
>
> Thanks
>
> Henrik Bengtsson
>
> Dept. of Mathematical Statistics @ Centre for Mathematical Sciences
> Lund Institute of Technology/Lund University, Sweden (+2h UTC)
> Office: P316, +46 46 222 9611 (phone), +46 46 222 4623 (fax)
> h b @ m a t h s . l t h . s e
> http://www.maths.lth.se/matstat/staff/hb/
>
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
Markus Jantti
Statistics Finland and University of Tampere
markus.jantti at iki.fi
http://www.iki.fi/~mjantti
More information about the R-help
mailing list