# [R] median() for ordered factor {was "what does this mean .."}

Martin Maechler maechler at stat.math.ethz.ch
Mon Nov 24 11:58:54 CET 2003

```
>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>     on 21 Nov 2003 15:08:09 +0100 writes:

PD> "Liaw, Andy" <andy_liaw at merck.com> writes:
>> > From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
>> >
>> > John Christie <jc at or.psychology.dal.ca> writes:
>> >
>> > > what does this mean in R-1.8.1 release notes?
>> > >
>> > > o median() no longer `works' for odd-length
>> > > factor variables.
>> >
>> > The median has always been undefined for factors, but
>> > nevertheless median() gave an answer. If the length was
>> > even, it would fail since it needed to average
>> > non-numeric values. This confused some, and the answer
>> > you got in the odd-length case was meaningless
>> > anyway (what's the median of three pears, four apples,
>> > and two bananas?). So now we check.
>>
>> Why not just give an error if median is given an
>> unordered factor?

PD> That's what we do now, and didn't before:

PD>     if (is.factor(x) || mode(x) != "numeric")
PD>            stop("need numeric data")

PD> (also for ordered factors; it is not clear what to do if
PD> the median sits between two levels in that case either.)

Actually, our mad() function has arguments 'low' and 'high'
(for partial S-plus compatibility) to ask for the lo-median or
hi-median respectively.  These differ from the median only when
n := length(x) is even; with ox := sort(x), they give ox[n/2]
and ox[n/2 + 1] respectively.

Hence, for ordered factors, the lo- and hi-median would be well
defined, and I have in the past considered propagating the 'low'
and 'high' arguments from mad() to median().

Martin

```
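The lo- and hi-median described above can be sketched for ordered factors roughly as follows. This is only an illustration under stated assumptions: `lohi_median()` is a hypothetical helper name, not a function in base R, and the `high` argument merely mimics the naming of the `low`/`high` arguments of mad(). It relies on sort() ordering a factor by its levels and dropping NAs by default.

```r
# Hypothetical helper (not part of base R): lo- or hi-median of an
# ordered factor, following ox[n/2] and ox[n/2 + 1] for even n.
lohi_median <- function(x, high = FALSE) {
  stopifnot(is.ordered(x))
  ox <- sort(x)          # sorts by level order; NAs are dropped by default
  n <- length(ox)
  # For odd n both indices coincide with the ordinary median position.
  idx <- if (high) n %/% 2L + 1L else (n + 1L) %/% 2L
  ox[idx]
}

x <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"), ordered = TRUE)

as.character(lohi_median(x))               # "low"    (ox[2] of 4)
as.character(lohi_median(x, high = TRUE))  # "medium" (ox[3] of 4)
```

No averaging of levels is needed, so the result is always one of the observed values; for odd n the two variants return the same element.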