[R] median() for ordered factor {was "what does this mean .."}
Martin Maechler
maechler at stat.math.ethz.ch
Mon Nov 24 11:58:54 CET 2003
>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>> on 21 Nov 2003 15:08:09 +0100 writes:
PD> "Liaw, Andy" <andy_liaw at merck.com> writes:
>> > From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
>> >
>> > John Christie <jc at or.psychology.dal.ca> writes:
>> >
>> > > what does this mean in R-1.8.1 release notes?
>> > >
>> > > o median() no longer `works' for odd-length factor
>> > > variables.
>> >
>> > The median has always been undefined for factors, but
>> > nevertheless median() gave an answer. If the length was
>> > even, it would fail since it needed to average
>> > non-numeric values. This confused some and the answer
>> > you got in the odd-length case was meaningless
>> > anyway (what's the median of three pears, four apples,
>> > and two bananas?). So now we check.
>>
>> Why not just give an error if median is given an
>> unordered factor?
PD> That's what we do now, and didn't before:
PD> if (is.factor(x) || mode(x) != "numeric")
PD> stop("need numeric data")
PD> (also for ordered factors; it is not clear what to do if
PD> the median sits between two levels in that case either.)
Actually, our mad() function has arguments low & high
(for partial S-plus compatibility) to ask for the
lo-median or hi-median respectively. These differ from the
median only when n := length(x) is even, in which case,
with ox := sort(x), they are ox[n/2] and ox[n/2 + 1] respectively.
Hence, for ordered factors, the lo- and hi-median would be well
defined, and I have in the past considered propagating the 'low'
and 'high' arguments from mad() to median().
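
To make the definition above concrete, here is a minimal sketch. The
helper name lo_hi_median() and its 'high' argument are hypothetical and
not part of base R; the sketch only assumes that sort() and "[" work on
ordered factors, which they do.

```r
# Hypothetical helper (NOT in base R): lo-/hi-median via order statistics.
# Works for numeric vectors and ordered factors alike, because it only
# sorts and indexes; it never averages two values.
lo_hi_median <- function(x, high = FALSE) {
    stopifnot(is.ordered(x) || is.numeric(x))
    ox <- sort(x)                  # sort() drops NAs by default
    n <- length(ox)
    if (n %% 2L == 1L)
        ox[(n + 1L) %/% 2L]        # odd n: the ordinary median
    else if (high)
        ox[n %/% 2L + 1L]          # even n: hi-median
    else
        ox[n %/% 2L]               # even n: lo-median
}

# Illustrative data (made up for this sketch)
sizes <- factor(c("large", "small", "medium", "large", "small", "large"),
                levels = c("small", "medium", "large"), ordered = TRUE)

lo <- lo_hi_median(sizes)               # 3rd of 6 sorted values
hi <- lo_hi_median(sizes, high = TRUE)  # 4th of 6 sorted values
```

With these six values the sorted vector is small, small, medium, large,
large, large, so the lo-median is "medium" and the hi-median is "large";
both are existing levels, so no averaging of levels is ever needed.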
Martin