[R] odd behavior of "summary" function

David Winsemius dwinsemius at comcast.net
Tue Aug 24 19:21:27 CEST 2010


On Aug 24, 2010, at 1:06 PM, Mike Williamson wrote:

> Hello All,
>
>    Using the standard "summary" function in 'R', I ran across some odd
> behavior that I cannot understand.  Easy to reproduce:
>
> Typing:
>
>   summary(c(6,207936))
>
> Yields:
>
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       6   51990  104000  104000  156000  207900
>
>
>    None of these values are correct except for the minimum.  If I perform
> "quantile(c(6, 207936))", it gives the correct values.  I originally
> presumed that summary was merely calling "quantile" if it saw a numeric,
> but this doesn't seem to be the case.

I would have assumed as you did, and I still think so, with an
appropriate modification of "merely", after reading the code in
summary.default. It does call quantile, but it then passes the
results through signif:

else if (is.numeric(object)) {
    nas <- is.na(object)
    object <- object[!nas]
    qq <- stats::quantile(object)
    qq <- signif(c(qq[1L:3L], mean(object), qq[4L:5L]), digits)
    names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.",
        "Max.")
    if (any(nas))
        c(qq, `NA's` = sum(nas))
    else qq
}


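Notice the digits argument. In summary.default it defaults to
max(3, getOption("digits") - 3), which is 4 under the default options,
so every value is rounded to 4 significant digits and 207936 is
reported as 207900: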

 > summary(c(6,207936))
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       6   51990  104000  104000  156000  207900
 > quantile(c(6,207936))
       0%      25%      50%      75%     100%
      6.0  51988.5 103971.0 155953.5 207936.0

 > summary(c(6,207936), digits=6)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
      6.0  51988.5 103971.0 103971.0 155954.0 207936.0
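
The rounding can be reproduced by hand. This is only a sketch that mirrors the summary.default excerpt above; the names x, q and manual are mine, and the 4 is the assumed default number of significant digits:

```r
# Reproduce the rounding step from the summary.default excerpt by hand.
x <- c(6, 207936)
q <- stats::quantile(x)          # 0%, 25%, 50%, 75%, 100% at full precision

# Same arrangement as in summary.default: quartiles with the mean
# inserted after the median, then rounded to 4 significant digits.
manual <- signif(c(q[1:3], mean(x), q[4:5]), 4)
names(manual) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
print(manual)
```

Only the reported values are rounded; quantile itself still computes 51988.5, 155953.5 and 207936.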

>

>  Anyone know what's going on here?  On a related note, what is the
> statistically correct answer for calculating the 1st quartile & 3rd
> quartile when only 2 values are present?  I presume one takes the
> mid-point between the median (also calculated) and the min or max.  So in
> this case, 51988.5 for 1st & 155953.5 for 3rd (which is what quantile
> calculates).  But taking 25% & 75% of the sum of the 2 also seems
> "reasonable".  Either way, "summary" is calculating the wrong number, and
> most disturbing is that it mis-calculates the max.
>
>                                            Regards,
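
On the quartile question: as far as I know, quantile defaults to type = 7, which interpolates linearly between order statistics, so with only two values the p-th quantile is x[1] + p * (x[2] - x[1]). Other type values use other definitions, so there is more than one defensible answer. A small sketch (the names x, p and interp are mine, for illustration only) checking the interpolation against the 51988.5 and 155953.5 quoted above:

```r
# Linear interpolation between the two sorted values, compared with
# what quantile() returns for the same probabilities.
x <- sort(c(6, 207936))
p <- c(0.25, 0.75)

interp <- x[1] + p * (x[2] - x[1])         # by hand
builtin <- quantile(x, p, names = FALSE)   # default method (type = 7)

print(interp)
print(builtin)
```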


David Winsemius, MD
West Hartford, CT


