[Rd] median and data frames

Tim Hesterberg timhesterberg at gmail.com
Sat Apr 30 17:19:31 CEST 2011


I also favor deprecating mean.data.frame.

One possible exception would be for a single-column data frame.
But even here I'd say no, lest people expect the same behavior for
median, var, ...

Pat's suggestion of using stop() would work nicely for mean.
(But omit paste(): stop() already concatenates its arguments.)
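
To make that concrete, here is a minimal sketch of Pat's function with
paste() dropped. It is an illustration only, not a proposed patch:
defining median.data.frame at top level is just for demonstration, and
df3.2 is borrowed from Pat's examples quoted below.

```r
# Sketch only: a data-frame method that refuses to guess and instead
# points the user at sapply(). stop() pastes its arguments together
# with no separator, so no explicit paste() call is needed.
median.data.frame <- function(x, ...)
{
    stop("you probably mean to use the command: sapply(",
         deparse(substitute(x)), ", median)")
}

df3.2 <- data.frame(1:3, 7:9)

# Capture the error message instead of aborting, so it can be inspected.
msg <- tryCatch(median(df3.2), error = function(e) conditionMessage(e))
print(msg)

# The call the message recommends: one median per column.
col.medians <- sapply(df3.2, median)
print(col.medians)
```

The same pattern would serve for mean: replace "median" in both the
method name and the message.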

Tim Hesterberg

>If Martin's proposal is accepted, does
>that mean that the median method for
>data frames would be something like:
>
>function (x, ...)
>{
>         stop(paste("you probably mean to use the command: sapply(",
>                 deparse(substitute(x)), ", median)", sep=""))
>}
>
>Pat
>
>
>On 29/04/2011 15:25, Martin Maechler wrote:
>>>>>>> Paul Johnson<pauljohn32 at gmail.com>
>>>>>>>      on Thu, 28 Apr 2011 00:20:27 -0500 writes:
>>
>>      >  On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
>>      >  <pburns at pburns.seanet.com>  wrote:
>>      >>  Here are some data frames:
>>      >>
>>      >>  df3.2<- data.frame(1:3, 7:9)
>>      >>  df4.2<- data.frame(1:4, 7:10)
>>      >>  df3.3<- data.frame(1:3, 7:9, 10:12)
>>      >>  df4.3<- data.frame(1:4, 7:10, 10:13)
>>      >>  df3.4<- data.frame(1:3, 7:9, 10:12, 15:17)
>>      >>  df4.4<- data.frame(1:4, 7:10, 10:13, 15:18)
>>      >>
>>      >>  Now here are some commands and their answers:
>>
>>      >>>  median(df4.4)
>>      >>  [1]  8.5 11.5
>>      >>>  median(df3.2[c(1,2,3),])
>>      >>  [1] 2 8
>>      >>>  median(df3.2[c(1,3,2),])
>>      >>  [1]  2 NA
>>      >>  Warning message:
>>      >>  In mean.default(X[[2L]], ...) :
>>      >>    argument is not numeric or logical: returning NA
>>      >>
>>      >>
>>      >>
>>      >>  The sessionInfo is below, but it looks
>>      >>  to me like the present behavior started
>>      >>  in 2.10.0.
>>      >>
>>      >>  Sometimes it gets the right answer.  I'd
>>      >>  be grateful to hear how it does that -- I
>>      >>  can't figure it out.
>>      >>
>>
>>      >  Hello, Pat.
>>
>>      >  Nice poetry there!  I think I have an actual answer, as opposed to the
>>      >  usual crap I spew.
>>
>>      >  I would agree if you said median.data.frame ought to be written to
>>      >  work columnwise, similar to mean.data.frame.
>>
>>      >  apply and sapply  always give the correct answer
>>
>>      >>  apply(df3.3, 2, median)
>>      >  X1.3   X7.9 X10.12
>>      >  2      8     11
>>
>>      [...........]
>>
>> exactly
>>
>>      >  mean.data.frame is now implemented as
>>
>>      >  mean.data.frame<- function(x, ...) sapply(x, mean, ...)
>>
>> exactly.
>>
>> My personal opinion is that  mean.data.frame() should never have
>> been written.
>> People should know, or learn, to use apply functions for such a
>> task.
>>
>> The unfortunate fact that mean.data.frame() exists makes people
>> think that median.data.frame() should too,
>> and then
>>
>>    var.data.frame()
>>     sd.data.frame()
>>    mad.data.frame()
>>    min.data.frame()
>>    max.data.frame()
>>    ...
>>    ...
>>
>> all just in order *not* to have to know  sapply()
>> ????
>>
>> No, rather not.
>>
>> My vote is for deprecating  mean.data.frame().
>>
>> Martin
>>
>
>--
>Patrick Burns
>pburns at pburns.seanet.com
>twitter: @portfolioprobe
>http://www.portfolioprobe.com/blog
>http://www.burns-stat.com
>(home of 'Some hints for the R beginner'
>and 'The R Inferno')


