[Rd] median and data frames

Martin Maechler maechler at stat.math.ethz.ch
Fri Apr 29 16:25:09 CEST 2011


>>>>> Paul Johnson <pauljohn32 at gmail.com>
>>>>>     on Thu, 28 Apr 2011 00:20:27 -0500 writes:

    > On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
    > <pburns at pburns.seanet.com> wrote:
    >> Here are some data frames:
    >> 
    >> df3.2 <- data.frame(1:3, 7:9)
    >> df4.2 <- data.frame(1:4, 7:10)
    >> df3.3 <- data.frame(1:3, 7:9, 10:12)
    >> df4.3 <- data.frame(1:4, 7:10, 10:13)
    >> df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
    >> df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
    >> 
    >> Now here are some commands and their answers:

    >>> median(df4.4)
    >> [1]  8.5 11.5
    >>> median(df3.2[c(1,2,3),])
    >> [1] 2 8
    >>> median(df3.2[c(1,3,2),])
    >> [1]  2 NA
    >> Warning message:
    >> In mean.default(X[[2L]], ...) :
    >>  argument is not numeric or logical: returning NA
    >> 
    >> 
    >> 
    >> The sessionInfo is below, but it looks
    >> to me like the present behavior started
    >> in 2.10.0.
    >> 
    >> Sometimes it gets the right answer.  I'd
    >> be grateful to hear how it does that -- I
    >> can't figure it out.
    >> 
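Pat's puzzle is that the answer depends on row order. A columnwise computation does not have that problem: sapply() hands each column to median() on its own, so permuting the rows cannot change the result. Here is a minimal check that re-creates df3.2 from the example above. The variable names in_order and permuted are illustrative only.

```r
# Re-create one of the example data frames from above
df3.2 <- data.frame(1:3, 7:9)

# Columnwise medians: sapply() calls median() once per column
in_order <- sapply(df3.2[c(1, 2, 3), ], median)
permuted <- sapply(df3.2[c(1, 3, 2), ], median)

print(in_order)                        # one median per column, named X1.3 and X7.9
print(identical(in_order, permuted))   # row order does not matter
```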

    > Hello, Pat.

    > Nice poetry there!  I think I have an actual answer, as opposed to the
    > usual crap I spew.

    > I would agree if you said median.data.frame ought to be written to
    > work columnwise, similar to mean.data.frame.

    > apply and sapply always give the correct answer:

    >> apply(df3.3, 2, median)
    >   X1.3   X7.9 X10.12
    >      2      8     11

    [...........]

exactly.
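apply() and sapply() agree on df3.3, but they do not reach the answer the same way. apply() first converts the data frame to a matrix with as.matrix(). sapply() treats the data frame as a list of columns. If a frame contains a character column, as.matrix() turns every column into character, so sapply() is usually the safer choice for data frames. Here is a small sketch comparing the two on the all-numeric df3.3.

```r
df3.3 <- data.frame(1:3, 7:9, 10:12)

# apply() works on the columns of as.matrix(df3.3)
by_apply  <- apply(df3.3, 2, median)
# sapply() iterates over the data frame as a list of columns
by_sapply <- sapply(df3.3, median)

print(by_apply)
print(by_sapply)
print(all.equal(by_apply, by_sapply))
```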

    > mean.data.frame is now implemented as

    > mean.data.frame <- function(x, ...) sapply(x, mean, ...)

exactly.
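For comparison, the columnwise median.data.frame() that Paul has in mind would be the same one-liner with median swapped in for mean. This is a hypothetical sketch, because R does not provide such a method. It is also exactly the kind of wrapper argued against in the next paragraph. It relies on median() being an S3 generic, so that median(df) dispatches to a median.data.frame() method when one is visible.

```r
# Hypothetical method, modeled on the mean.data.frame one-liner above
median.data.frame <- function(x, na.rm = FALSE, ...) {
  sapply(x, median, na.rm = na.rm)
}

df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
res <- median(df4.4)   # dispatches to the method defined above
print(res)             # four columnwise medians
```

Defined this way, median(df4.4) returns one median per column rather than the two numbers shown for median(df4.4) earlier in the thread.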

My personal opinion is that mean.data.frame() should never have
been written.
People should know, or learn, how to use the apply functions for such a
task.
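The apply-family idiom is short enough that a dedicated method adds little. Here is a minimal sketch using vapply(). Its FUN.VALUE argument also checks that every column yields exactly one number, and extra arguments such as na.rm are passed through to median(). The data frame d and its missing value are illustrative data only.

```r
# Illustrative all-double data frame with one missing value
d <- data.frame(a = c(1, NA, 3, 4), b = c(7.5, 8, 9, 10))

# One number per column; vapply() stops with an error if FUN returns anything else
meds <- vapply(d, median, FUN.VALUE = numeric(1), na.rm = TRUE)
print(meds)   # a = 3, b = 8.5
```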

The unfortunate fact that mean.data.frame() exists makes people
think that median.data.frame() should exist too,
and then

  var.data.frame()
   sd.data.frame()
  mad.data.frame()
  min.data.frame()
  max.data.frame()
  ...
  ...

all just so that people do *not* have to know sapply()
????

No, rather not.

My vote is for deprecating  mean.data.frame().
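The list of would-be *.data.frame methods above collapses into a single pattern. Here is a sketch that computes all of those summaries for df4.4 in one expression. The named list summaries and the result tab are illustrative names, not part of any API.

```r
df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)

# The summaries one might otherwise write *.data.frame methods for
summaries <- list(median = median, var = var, sd = sd, mad = mad,
                  min = min, max = max)

# Outer sapply() over functions, inner sapply() over columns:
# rows of the result are the columns of df4.4, columns are the summaries
tab <- sapply(summaries, function(f) sapply(df4.4, f))
print(tab)
```

Adding another summary means adding one list entry rather than writing another method.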

Martin


