[Rd] Inconsistency in median()
Gustavo Zapata Wainberg
Wed May 5 16:28:17 CEST 2021
Hi, thanks Dr. Mächler for your prompt response!
I agree with your explanations about this issue. But I was thinking of
something like adding an argument to median() and mean() that could keep
the attributes of the variables if set to TRUE.
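
To make the idea concrete, here is a minimal sketch of what I have in mind. The wrapper name median_keep_attr() and the argument keep.attr are made up for illustration and are not part of base R or stats; which attributes are safe to copy is also an assumption on my part.

```r
# Hypothetical wrapper; 'median_keep_attr' and 'keep.attr' are made-up names,
# not part of base R or the stats package.
median_keep_attr <- function(x, keep.attr = FALSE, ...)
{
    m <- median(x, ...)
    if (keep.attr) {
        a <- attributes(x)
        # Skip attributes that describe the shape of x and make no sense
        # for a length-one result.
        for (nm in setdiff(names(a), c("names", "dim", "dimnames")))
            attr(m, nm) <- a[[nm]]
    }
    m
}

x <- structure(c(1, 2, 3, 4), label = "A label for x")
attributes(median(x))                                  # NULL
attr(median_keep_attr(x, keep.attr = TRUE), "label")   # "A label for x"
```

The same pattern could presumably wrap mean() as well.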
Thanks again.
Best regards
On Tue, 4 May 2021 at 17:57, Martin Maechler wrote:
> Gustavo Zapata Wainberg
> on Mon, 3 May 2021 20:48:49 +0200 writes:
>
> > Hi!
>
> > I'm writing this post because median() behaves inconsistently
> > for vectors of even and odd length. For odd-length vectors,
> > attributes (such as labels added with Hmisc) are kept after
> > running median(), but for even-length vectors the attributes
> > are lost.
>
> > I know that this is due to median() using mean() to obtain
> > the result when the vector has even length, and mean() always
> > strips attributes from vectors.
>
> Yes, and this has been the design of median() forever:
>
> If n := length(x) is odd, the median is "the middle" observation,
> and should be equal to x[j] for j = (n+1)/2,
> and hence, e.g., is well defined for an ordered factor.
>
> When n is even, however, median() must be the mean of "the two
> middle" observations, which is, e.g., not even *defined* for an
> ordered factor.
>
> We *could* talk of the so-called lo-median or hi-median
> (terms probably coined by John W. Tukey) because (IIRC) these
> are equal to each other and to the median for odd n, but
> are equal to x[j] and x[j+1], j = n/2, for even n *and* are
> still "of the same kind" as x[] itself.
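
A rough sketch of the lo-/hi-median idea described above, for illustration only. The function names lo_median() and hi_median() are assumptions, not existing R functions, and NA and zero-length handling is left out.

```r
# Hypothetical helpers; neither function exists in base R or stats.
# Both return an element of x itself, so `[` decides which attributes survive.
lo_median <- function(x) {
    n <- length(x)
    x[order(x)][(n + 1L) %/% 2L]   # x[j], j = n/2 for even n
}
hi_median <- function(x) {
    n <- length(x)
    x[order(x)][n %/% 2L + 1L]     # x[j+1], j = n/2 for even n
}

# Even-length ordered factor: both results are still ordered factors.
f <- factor(c("low", "low", "high", "mid"),
            levels = c("low", "mid", "high"), ordered = TRUE)
lo_median(f)   # low
hi_median(f)   # mid

# Even-length named vector: the name of the selected element is kept.
x <- c(a = 4, b = 1, c = 3, d = 2)
lo_median(x)   # d = 2
hi_median(x)   # c = 3
```

For odd n both helpers pick the same element, x[(n+1)/2].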
>
> Interestingly, for mad() {= the median absolute deviation from the
> median} we *do* allow specifying logical 'low' and 'high',
> but that is for the "outer" median in MAD's definition, not the
> inner one.
>
> ## From <Rsrc>/src/library/stats/R/mad.R :
>
> mad <- function(x, center = median(x), constant = 1.4826,
>                 na.rm = FALSE, low = FALSE, high = FALSE)
> {
>     if(na.rm)
>         x <- x[!is.na(x)]
>     n <- length(x)
>     constant *
>         if((low || high) && n%%2 == 0) {
>             if(low && high) stop("'low' and 'high' cannot be both TRUE")
>             n2 <- n %/% 2 + as.integer(high)
>             sort(abs(x - center), partial = n2)[n2]
>         }
>         else median(abs(x - center))
> }
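
A small usage example of the 'low' and 'high' arguments shown above, using stats::mad() as shipped with R. The data vector is made up, and constant = 1 is chosen only so that the numbers are easy to follow.

```r
x <- c(1, 2, 4, 7, 11, 16)       # even length, median(x) is 5.5
dev <- abs(x - median(x))        # 4.5 3.5 1.5 1.5 5.5 10.5
sort(dev)                        # 1.5 1.5 3.5 4.5 5.5 10.5

mad(x, constant = 1)               # 4   : mean of the two middle deviations
mad(x, constant = 1, low = TRUE)   # 3.5 : lower of the two middle deviations
mad(x, constant = 1, high = TRUE)  # 4.5 : higher of the two middle deviations
```

Note that 'center' is still the ordinary median(x) in all three calls; only the outer median changes.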
>
> > Don't you think that attributes should be kept in both
> > cases?
>
> Well, not all attributes can be kept.
> Note that for *named* vectors x, x[j] can (and does) keep the name,
> but there's definitely no sensible name to give to (x[j] + x[j+1])/2.
>
> I'm willing to collaborate with someone on extending
> median.default() to make hi-median and lo-median
> available to the user.
> Both of these will always return x[j] for some j and hence keep
> all (sensible!) attributes (well, if the `[` method for the
> corresponding class has been defined correctly; I've encountered
> quite a few cases where people created vector-like classes but
> did not provide a "correct" subsetting method; typically you
> should make sure both a `[[` and a `[` method work!).

> Best regards,
> Martin
>
> Martin Maechler
> ETH Zurich and R Core team
>
> > And, going further, shouldn't mean() keep
> > attributes as well? I have looked in R's Bugzilla and I
> > didn't find an entry related to this issue.
>
> > Please, let me know if you consider that this issue should
> > be posted in R's Bugzilla.
>
> > Here is an example with code.
>
> > rndvar <- rnorm(n = 100)
>
> > Hmisc::label(rndvar) <- "A label for RNDVAR"
>
> > str(median(rndvar[-c(1,2)]))
>
> > Returns: "num 0.0368"
>
> > str(median(rndvar[-1]))
>
> > Returns: 'labelled' num 0.0322
> >  - attr(*, "label")= chr "A label for RNDVAR"
>
> > Thanks in advance!
>
> > Gustavo Zapata-Wainberg
>
>
