[R] Weighted median

Markus Jantti markus.jantti at iki.fi
Wed Feb 6 23:34:32 CET 2002


> Is there a weighted median function out there similar to weighted.mean()
> but for medians? If not, I'll try implement or port it myself.
> 

Hi -- I have written the following functions fo my own use but am unhappy
with them. I first define a weighted.quantile function and then weighted.median
that uses the first one. 

The reason I am unhappy with my version is that 
I wanted to define a weighted.quantile function that reproduced the
R quantile function for the unweighted case (R:s quantile does some
smart interpolating), but I did not manage to do so. The below version
works, but if you can write a weighted version of the quantile function, let me
know.

Markus


# weighted quantiles. various versions, none good.

weighted.quantile <-
  function(x, w, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE) {
    # n <- length(x)
    # check:
    # - if w exists (length(w)>0) => if not, then w <- rep(1,length(x))
    # - if w is the weight or the inclusion probability
    #   + a test sum(w) > length(x)??????
    if (length(x) != length(w))
      stop("Weights and variable vectors are of unequal length!")
    if (na.rm)
      { valid.obs <- !is.na(x) & !is.na(w)
        x <- x[valid.obs]
        w <- w[valid.obs]}
    else if (any(is.na(x)) | any(is.na(w))) 
      stop("Missing values and NaN's not allowed if `na.rm' is FALSE")
    if (any((p.ok <- !is.na(probs)) & (probs < 0 | probs > 1))) 
      stop("probs outside [0,1]")
    if (na.p <- any(!p.ok)) {
      o.pr <- probs
      probs <- probs[p.ok]
    }
    np <- length(probs)
    if (missing(w)) 
      w <- rep(1, length(x))
    if (na.rm) {
      w <- w[i <- !is.na(x)]
      x <- x[i]
    }
    # check if weights are probs or unit correspondences
    ind <- order(x)
    cumw <- cumsum(w[ind])/sum(w)
    x <- x[ind]
    med <- numeric(0)
    for(i in 1:length(probs)) {
      if(probs[i] > 0 & probs[i] < 1) med[i] <- max(x[cumw<=probs[i]])
      else if (probs[i] == 0) med[i] <- min(x)
      else if (probs[i] == 1) med[i] <- max(x)
    }
    # the treatment of names
    med
  }

weighted.median <- function(x,w, na.rm = FALSE) {
  if (missing(w)) 
    w <- rep(1, length(x))
  if (na.rm) {
    w <- w[i <- !is.na(x)]
    x <- x[i]
    }
  weighted.quantile(x,w,.5)
}
#

> The need for a weighted median came from the following optimization
> problem:
> 
>   x* = arg_x min (a|x| + sum_{k=1}^n |x - b_k|)
> 
> where
> 
>   a  : is a *positive* real scalar
>   x  : is a real scalar
>   n  : is an integer
>   b_k: are negative and positive scalars
> 
> which is a "median with a shift towards zero". This can be rewritten as
> 
>   x* = arg_x min (sum_{k=0}^n w_k|x - b_k|)
> 
> where w = (a, 1, 1, ..., 1) and b_0 = 0, which is a problem that could be
> solved by a weighted median function. If someone has a better solution
> (note that 'a' does *not* have to be an integer) to this specific
> optimization problem I would be very happy to hear about it.
> 
> Thanks
> 
> Henrik Bengtsson
> 
> Dept. of Mathematical Statistics @ Centre for Mathematical Sciences
> Lund Institute of Technology/Lund University, Sweden (+2h UTC)
> Office: P316, +46 46 222 9611 (phone), +46 46 222 4623 (fax)
> h b @ m a t h s . l t h . s e
> http://www.maths.lth.se/matstat/staff/hb/
> 
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


--
Markus Jantti
Statistics Finland and University of Tampere
markus.jantti at iki.fi
http://www.iki.fi/~mjantti




More information about the R-help mailing list