[Rd] median and data frames

Joshua Ulrich josh.m.ulrich at gmail.com
Thu May 5 20:54:10 CEST 2011


On Fri, Apr 29, 2011 at 9:25 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Paul Johnson <pauljohn32 at gmail.com>
>>>>>>     on Thu, 28 Apr 2011 00:20:27 -0500 writes:
>
>    > On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
>    > <pburns at pburns.seanet.com> wrote:
>    >> Here are some data frames:
>    >>
>    >> df3.2 <- data.frame(1:3, 7:9)
>    >> df4.2 <- data.frame(1:4, 7:10)
>    >> df3.3 <- data.frame(1:3, 7:9, 10:12)
>    >> df4.3 <- data.frame(1:4, 7:10, 10:13)
>    >> df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
>    >> df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
>    >>
>    >> Now here are some commands and their answers:
>
>    >>> median(df4.4)
>    >> [1]  8.5 11.5
>    >>> median(df3.2[c(1,2,3),])
>    >> [1] 2 8
>    >>> median(df3.2[c(1,3,2),])
>    >> [1]  2 NA
>    >> Warning message:
>    >> In mean.default(X[[2L]], ...) :
>    >>  argument is not numeric or logical: returning NA
>    >>
>    >>
>    >>
>    >> The sessionInfo is below, but it looks
>    >> to me like the present behavior started
>    >> in 2.10.0.
>    >>
>    >> Sometimes it gets the right answer.  I'd
>    >> be grateful to hear how it does that -- I
>    >> can't figure it out.
>    >>
>
>    > Hello, Pat.
>
>    > Nice poetry there!  I think I have an actual answer, as opposed to the
>    > usual crap I spew.
>
>    > I would agree if you said median.data.frame ought to be written to
>    > work columnwise, similar to mean.data.frame.
>
>    > apply and sapply  always give the correct answer
>
>    >> apply(df3.3, 2, median)
>    > X1.3   X7.9 X10.12
>    > 2      8     11
>
>    [...........]
>
> exactly
>
>    > mean.data.frame is now implemented as
>
>    > mean.data.frame <- function(x, ...) sapply(x, mean, ...)
>
> exactly.
>
> My personal opinion is that mean.data.frame() should never have
> been written.
> People should know, or learn, to use apply functions for such a
> task.
>
> The unfortunate fact that mean.data.frame() exists makes people
> think that median.data.frame() should too,
> and then
>
>  var.data.frame()
>   sd.data.frame()
>  mad.data.frame()
>  min.data.frame()
>  max.data.frame()
>  ...
>  ...
>
> all just in order *not* to have to know sapply()
> ????
>
> No, rather not.
>
> My vote is for deprecating  mean.data.frame().
>
> Martin
>

I agree.  However, sd() isn't currently (as of R-2.13.0) generic, yet
it operates by column on matrix and data.frame objects.  So it behaves
a bit more like mean() and is similarly inconsistent with the other
listed functions.  I have no input on how this should be handled, but
thought it might be worth addressing.
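
For reference, here is a minimal sketch of the explicit column-wise
approach Martin recommends, applied to two of Pat's data frames.  The
names col_medians, col_sds and reordered are mine, purely for
illustration.  Nothing here relies on a data.frame method for median()
or sd(), so it should not depend on the row order or on the R version.

```r
# Two of the data frames from Pat's original report.
df3.2 <- data.frame(1:3, 7:9)
df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)

# Explicit column-wise summaries: sapply() calls the function on each
# column in turn and returns a named vector, one entry per column.
col_medians <- sapply(df4.4, median)
col_sds     <- sapply(df4.4, sd)

# Permuting the rows should not change a column-wise median.
reordered <- sapply(df3.2[c(1, 3, 2), ], median)

print(col_medians)
print(col_sds)
print(reordered)
```

Because every column is passed to the function as a plain vector, the
same pattern covers var(), mad(), min(), max() and so on without
needing a data.frame method for any of them.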

Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com


