[R] strange behaviour of median

Petr PIKAL petr.pikal at precheza.cz
Thu Feb 4 10:58:36 CET 2010


Hi

So do you think I should file a bug report? I think I would rather wait to 
see if there is some reaction from others. Maybe there is some reason 
behind such behaviour. These simple statistics tend to behave differently 
when operating on data frames, so median is not such a huge surprise.

see

sd(df1), var(df1), mean(df1), max(df1), min(df1), range(df1)

The results produced are usually clearly documented; however, for a novice 
it is rather mysterious why using those functions on a vector produces 
easily understandable results, while using them on a data.frame (which is 
the most common data structure) is far from consistent and intuitive.
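
For what it is worth, one way to get predictable per-column results, 
regardless of how a given summary function treats a data frame as a whole, 
is to apply the function column by column. A minimal sketch, reusing the 
4 x 4 example from the original post quoted further down (the variable 
names col_medians, col_means and col_sds are only illustrative):

```r
# Example data from the original post: a 4 x 4 data frame, columns X1..X4
mat <- matrix(1:16, 4, 4)
df1 <- data.frame(mat)

# Apply each statistic to every column explicitly. sapply() simplifies the
# per-column scalars into one named numeric vector.
col_medians <- sapply(df1, median)
col_means   <- sapply(df1, mean)
col_sds     <- sapply(df1, sd)

print(col_medians)
#   X1   X2   X3   X4 
#  2.5  6.5 10.5 14.5 
```

Written this way, the result has one named element per column for every 
statistic, so it does not depend on whether the function has a data.frame 
method.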

But I agree with you that mean and median should, ideally, give results 
with the same structure.
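
To illustrate what "the same structure" could look like in practice, here 
is a small sketch. Note that col_stat() is a hypothetical helper written 
for this message, not something in base R; it assumes every column is 
numeric and that the function returns a single number per column.

```r
# col_stat() is a hypothetical helper, not part of base R.
col_stat <- function(df, fun, ...) {
  stopifnot(is.data.frame(df))
  # vapply() requires fun to return exactly one numeric value per column,
  # so any scalar statistic comes back as a named numeric vector.
  vapply(df, fun, FUN.VALUE = numeric(1), ...)
}

# Data frame with an odd number of columns (7), as in the quoted example
df1 <- data.frame(matrix(1:28, 4, 7))

m_mean   <- col_stat(df1, mean)
m_median <- col_stat(df1, median)

# Both results have one element per column and the same names
print(names(m_mean))
print(m_median)
```

The point of using vapply() rather than sapply() here is only that it 
fails loudly if the function returns something other than one number per 
column, instead of silently changing the shape of the result.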

Regards
Petr

r-help-bounces at r-project.org wrote on 04.02.2010 10:28:16:

> Well, I get the same as Petr with  R version 2.10.0 (2009-10-26)
> on Linux.
> 
> To me, this suggests that median is broken! Any user would,
> a priori, expect that median() should operate in exactly
> the same way as mean(). To extend Petr's example:
> 
>   mat <- matrix(1:32, 4,8)
>   df1 <- data.frame(mat)
>   mean(df1)
>   #   X1   X2   X3   X4   X5   X6   X7   X8 
>   #  2.5  6.5 10.5 14.5 18.5 22.5 26.5 30.5 
>   median(df1)
>   # [1] 14.5 18.5
> 
> so (as in Petr's original example, but more clearly) median()
> returns the medians of the two "central" columns X4 and X5 of df1.
> 
> But that is with an even number of columns. Now look at what
> happens with an odd number:
> 
>   mat <- matrix(1:28, 4,7)
>   df1 <- data.frame(mat)
>   mean(df1)
>   #   X1   X2   X3   X4   X5   X6   X7 
>   #  2.5  6.5 10.5 14.5 18.5 22.5 26.5 
>   median(df1)
>   #   structure(c("13", "14", "15", "16"), class = "AsIs")
>   # 1                                                   13
>   # 2                                                   14
>   # 3                                                   15
>   # 4                                                   16
> 
> Wow!!!!!!!!!!
> 
> This does suggest a tie-in with Petr's observation about "As.Is",
> and there is no doubt at all that the above result is rubbish.
> It is certainly not what a user would expect, and in the context
> of Petr's intention to present R lessons to a class, I could
> foresee students turning their backs on R if they came up with
> such a result in their early encounters!
> 
> Ted.
> 
> On 04-Feb-10 08:59:59, Mario Valle wrote:
> > R 2.9.0 on Linux gives:
> > 
> >> median(df1)
> > [1] 34
> > 
> > Even stranger...
> >               mario
> > 
> > Petr PIKAL wrote:
> >> During some experimentation in preparing R lessons I encountered this
> >> behaviour which I can not explain fully
> >> 
> >> mat <- matrix(1:16, 4,4)
> >> df1 <- data.frame(mat)
> >> 
> >>> mean(df1)
> >>   X1   X2   X3   X4 
> >>  2.5  6.5 10.5 14.5 
> >> 
> >> Expected, documented
> >> 
> >>> median(df1)
> >> [1]  6.5 10.5
> >> 
> >> Rather weird. AFAIK there should not be an issue with data frames; at
> >> least I did not find any mention in the help page. I probably tracked
> >> it down to an As.Is operation on the object and subsequent sorting in
> >> median.default.
> >> 
> >> I know other (*apply) ways to compute the median for data frames, so I
> >> would just like to hear an opinion about this behaviour from more
> >> experienced people.
> >> 
> >> Thank you
> >> Best regards
> >> 
> >> Petr
> >> 
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> > 
> > -- 
> > Ing. Mario Valle
> > Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
> > Swiss National Supercomputing Centre (CSCS)      | Tel:  +41 (91) 610.82.60
> > v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82
> > 
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 04-Feb-10                                       Time: 09:28:13
> ------------------------------ XFMail ------------------------------
> 


