[Rd] Inconsistency in median()

Martin Maechler maechler at stat.math.ethz.ch
Tue May 4 17:57:05 CEST 2021


>>>>> Gustavo Zapata Wainberg 
>>>>>     on Mon, 3 May 2021 20:48:49 +0200 writes:

    > Hi!

    > I'm writing this post because there is an inconsistency
    > when median() is computed for vectors of even or odd
    > length. For odd-length vectors, attributes (such as labels
    > added with Hmisc) are kept after running median(), but this
    > is not the case if the length is even; in this latter case
    > the attributes are lost.

    > I know that this is due to median() using mean() to obtain
    > the result when the length is even, and mean() always
    > drops the attributes of its argument.

Yes, and this has been the design of median() forever:

If n := length(x) is odd, the median is "the middle" observation,
   and should equal x[j] for j = (n+1)/2,
   and hence is, e.g., well defined for an ordered factor.

When n is even, however, median() must be the mean of "the two middle"
observations, which is, e.g., not even *defined* for an ordered factor.

We *could* talk of the so-called lo-median and hi-median
(terms probably coined by John W. Tukey) because (IIRC) these
are equal to each other and to the median for odd n, but
are equal to x[j] and x[j+1], with j = n/2, for even n, *and* are
still "of the same kind" as x[] itself.
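For concreteness, here is a minimal R sketch of what lo-/hi-median options could look like. The function name median_lohi() is hypothetical (it is not part of base R); it merely borrows the 'low'/'high' argument names from stats::mad(). For odd n all variants pick x[(n+1)/2]; for even n the lo-median is the (n/2)-th and the hi-median the (n/2 + 1)-th order statistic, so both are elements of x.

```r
## Hypothetical helper, NOT part of base R: median with optional
## lo-/hi-median, reusing the 'low'/'high' argument names of stats::mad().
median_lohi <- function(x, na.rm = FALSE, low = FALSE, high = FALSE) {
  if (low && high) stop("'low' and 'high' cannot be both TRUE")
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(x[NA_integer_])          # an NA of the same kind as x
  }
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])
  if (n %% 2L == 1L || low || high) {
    ## odd n:  j = (n+1)/2, where lo-, hi- and ordinary median coincide
    ## even n: j = n/2 (lo-median) or j = n/2 + 1 (hi-median)
    j <- if (high) n %/% 2L + 1L else (n + 1L) %/% 2L
    sort(x)[j]                      # an element of x, subset via `[`
  } else {
    ## even n, ordinary median: mean of the two middle observations
    mean(sort(x)[n %/% 2L + 0:1])
  }
}

x <- c(a = 3, b = 1, c = 4, d = 2)
median_lohi(x)                # 2.5, unnamed (computed by mean())
median_lohi(x, low = TRUE)    # 2, named "d"
median_lohi(x, high = TRUE)   # 3, named "a"

f <- factor(c("lo", "hi", "hi", "mid"),
            levels = c("lo", "mid", "hi"), ordered = TRUE)
median_lohi(f, low = TRUE)    # mid
median_lohi(f, high = TRUE)   # hi
```

Because the lo- and hi-median are obtained by subsetting the sorted x, they work for anything that sort() and `[` handle, including ordered factors; the ordinary even-n case still goes through mean() and therefore still drops attributes.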

Interestingly, for mad() (the median absolute deviation from the median),
we *do* allow specifying the logicals 'low' and 'high',
but only for the "outer" median in MAD's definition, not the
inner one.

## From <Rsrc>/src/library/stats/R/mad.R :

mad <- function(x, center = median(x), constant = 1.4826,
                na.rm = FALSE, low = FALSE, high = FALSE)
{
    if(na.rm)
	x <- x[!is.na(x)]
    n <- length(x)
    constant *
        if((low || high) && n%%2 == 0) {
            if(low && high) stop("'low' and 'high' cannot be both TRUE")
            n2 <- n %/% 2 + as.integer(high)
            sort(abs(x - center), partial = n2)[n2]
        }
        else median(abs(x - center))
}
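A small even-length example (n = 6, made-up data) illustrating that 'low' and 'high' only change the outer median, while the center stays the ordinary median(x) = 5.5. The numbers in the comments were worked out by hand from the definition above.

```r
x <- c(1, 2, 4, 7, 11, 16)
center <- stats::median(x)      # inner median: (4 + 7)/2 = 5.5, never lo/hi
dev <- sort(abs(x - center))    # 1.5 1.5 3.5 4.5 5.5 10.5
stats::mad(x)                   # 1.4826 * (3.5 + 4.5)/2 = 1.4826 * 4
stats::mad(x, low = TRUE)       # 1.4826 * dev[3] = 1.4826 * 3.5
stats::mad(x, high = TRUE)      # 1.4826 * dev[4] = 1.4826 * 4.5
```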




    > Don't you think that attributes should be kept in both
    > cases? 

Well, not all attributes can be kept.
Note that for *named* vectors x,  x[j] can (and does) keep the name,
but there's definitely no sensible name to give to (x[j] + x[j+1])/2
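A quick illustration with a made-up named vector: a single element keeps its name, while R's arithmetic on two named elements simply copies the name of the first operand, which is not a sensible name for an average of two different observations.

```r
x <- c(a = 3, b = 1, c = 4, d = 2)
s <- sort(x)          # names are sorted along with the values: b d a c
s[2]                  # a single element keeps its name ("d")
(s[2] + s[3]) / 2     # value 2.5, but it just inherits the name "d"
                      # from the first operand, ignoring "a"
median(x)             # 2.5, returned without a name
```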

I'm willing to collaborate with someone on extending
median.default() to make the hi-median and lo-median
available to the user.
Both of these always return x[j] for some j and hence keep
all (sensible!) attributes, provided the `[` method for the
corresponding class has been defined correctly. I've encountered
quite a few cases where people created vector-like classes but
did not provide a "correct" subsetting method (typically you
should make sure that both a `[[` and a `[` method work!).

Best regards,
Martin

Martin Maechler
ETH Zurich  and  R Core team

    > And, going further, shouldn't mean() keep
    > attributes as well? I have looked in R's Bugzilla and I
    > didn't find an entry related to this issue.

    > Please let me know if you think this issue should
    > be posted in R's Bugzilla.

    > Here is an example with code.

    > rndvar <- rnorm(n = 100)

    > Hmisc::label(rndvar) <- "A label for RNDVAR"

    > str(median(rndvar[-c(1,2)]))

    > Returns: "num 0.0368"

    > str(median(rndvar[-1]))

    > Returns: 'labelled' num 0.0322
    >  - attr(*, "label")= chr "A label for RNDVAR"

    > Thanks in advance!

    > Gustavo Zapata-Wainberg


    > ______________________________________________
    > R-devel using r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel


