[Rd] median and data frames
Martin Maechler
maechler at stat.math.ethz.ch
Sat Oct 8 20:15:38 CEST 2011
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Fri, 29 Apr 2011 16:25:09 +0200 writes:
>>>>> Paul Johnson <pauljohn32 at gmail.com>
>>>>> on Thu, 28 Apr 2011 00:20:27 -0500 writes:
>> On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
>> <pburns at pburns.seanet.com> wrote:
>>> Here are some data frames:
>>>
>>> df3.2 <- data.frame(1:3, 7:9)
>>> df4.2 <- data.frame(1:4, 7:10)
>>> df3.3 <- data.frame(1:3, 7:9, 10:12)
>>> df4.3 <- data.frame(1:4, 7:10, 10:13)
>>> df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
>>> df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
>>>
>>> Now here are some commands and their answers:
>>>> median(df4.4)
>>> [1] 8.5 11.5
>>>> median(df3.2[c(1,2,3),])
>>> [1] 2 8
>>>> median(df3.2[c(1,3,2),])
>>> [1] 2 NA
>>> Warning message:
>>> In mean.default(X[[2L]], ...) :
>>>   argument is not numeric or logical: returning NA
>>>
>>> The sessionInfo is below, but it looks to me like the
>>> present behavior started in 2.10.0.
>>>
>>> Sometimes it gets the right answer. I'd be grateful to
>>> hear how it does that -- I can't figure it out.
>>>
> Hello, Pat.
>> Nice poetry there! I think I have an actual answer, as
>> opposed to the usual crap I spew.
>> I would agree if you said median.data.frame ought to be
>> written to work columnwise, similar to mean.data.frame.
>> apply and sapply always give the correct answer
>>> apply(df3.3, 2, median)
>>   X1.3   X7.9 X10.12
>>      2      8     11
> [...........]
> exactly
>> mean.data.frame is now implemented as
>> mean.data.frame <- function(x, ...) sapply(x, mean, ...)
> exactly.
> My personal opinion is that mean.data.frame() should
> never have been written. People should know, or learn, to
> use apply functions for such a task.
> The unfortunate fact that mean.data.frame() exists makes
> people think that median.data.frame() should too, and then
> var.data.frame(), sd.data.frame(), mad.data.frame(),
> min.data.frame(), max.data.frame(), ... ...
> all just in order *not* to have to know sapply() ????
> No, rather not.
> My vote is for deprecating mean.data.frame().
> Martin
This has now happened -- for R 2.14.0 and later.
As raised in this thread in April, there's a similar
"extra helpful" behavior within the sd() function,
and we've also deprecated that.
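
For anyone who relied on the old behavior, here is a minimal sketch of the
sapply() idiom recommended earlier in this thread. It reuses Pat's df3.3;
the names col_means and col_sds are only illustrative.

```r
# Column-wise summaries without relying on mean(<data.frame>) or
# sd(<data.frame>); df3.3 is taken from the original post.
df3.3 <- data.frame(1:3, 7:9, 10:12)

# sapply() applies the function to each column in turn and returns
# a named numeric vector (one element per column).
col_means <- sapply(df3.3, mean)
col_sds   <- sapply(df3.3, sd)

print(col_means)
print(col_sds)
```

The same pattern works for any other per-column summary function.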
In addition -- getting back to Pat Burns' original post,
I'm also proposing to change median(<data.frame>)
such that it produces an error instead of the current "sometimes
correct" (but mostly not!) results.
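
As a sketch of what users should write instead, the following takes the
reordered data frame from Pat's post and computes the medians column by
column. The names reordered, col_medians and whole_df are only
illustrative, and what the final call returns depends on the R version
in use, so no result is shown for it.

```r
# Pat's example: the same data frame with its rows reordered.
df3.2 <- data.frame(1:3, 7:9)
reordered <- df3.2[c(1, 3, 2), ]

# Column-wise medians are independent of the row order.
col_medians <- sapply(reordered, median)
print(col_medians)

# Calling median() on the whole data frame is the case discussed above.
# Depending on the R version this may signal an error or a warning, so
# both are caught here and the condition message is kept for inspection.
whole_df <- tryCatch(
  median(reordered),
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
```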
Martin