[R] odd behavior of "summary" function
David Winsemius
dwinsemius at comcast.net
Tue Aug 24 19:21:27 CEST 2010
On Aug 24, 2010, at 1:06 PM, Mike Williamson wrote:
> Hello All,
>
> Using the standard "summary" function in 'R', I ran across some odd
> behavior that I cannot understand. Easy to reproduce:
>
> Typing:
>
> summary(c(6,207936))
>
> Yields:
>
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       6   51990  104000  104000  156000  207900
>
>
> None of these values are correct except for the minimum. If I perform
> "quantile(c(6, 207936))", it gives the correct values. I originally
> presumed that summary was merely calling "quantile" if it saw a numeric,
> but this doesn't seem to be the case.

I would have assumed as you did, and I still think so (with "merely"
suitably qualified) after reading the code in summary.default:

    else if (is.numeric(object)) {
        nas <- is.na(object)
        object <- object[!nas]
        qq <- stats::quantile(object)
        qq <- signif(c(qq[1L:3L], mean(object), qq[4L:5L]), digits)
        names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.",
            "Max.")
        if (any(nas))
            c(qq, `NA's` = sum(nas))
        else qq

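Notice the digits argument. summary.default does call quantile(), but it
then passes the results through signif(), so what is returned is rounded
to `digits` significant figures (4 with the default options, as far as I
can tell from the function's default). Nothing is miscalculated, the max
included; 207936 is simply shown as 207900:
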
> summary(c(6,207936))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      6   51990  104000  104000  156000  207900
> quantile(c(6,207936))
      0%      25%      50%      75%     100%
     6.0  51988.5 103971.0 155953.5 207936.0
> summary(c(6,207936), digits=6)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
     6.0  51988.5 103971.0 103971.0 155954.0 207936.0
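
To check that the rounding alone accounts for the difference, the same
step can be repeated by hand. This is only a sketch: the value 4 is my
assumption for what `digits` evaluates to under default options, and the
variable names are made up for illustration.

```r
x <- c(6, 207936)

# Exact quantiles, as quantile() reports them
q <- quantile(x)

# Same combination summary.default builds, rounded to 4 significant
# figures (4 is an assumed default for `digits`, not read from your session)
rounded <- signif(c(q[1:3], mean(x), q[4:5]), 4)
names(rounded) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")

print(rounded)
```

If this matches what summary() printed for you, the discrepancy is purely
one of rounding, and passing a larger `digits` to summary() should show
the exact figures.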
>
> Anyone know what's going on here? On a related note, what is the
> statistically correct answer for calculating the 1st quartile & 3rd
> quartile when only 2 values are present? I presume one takes the
> mid-point between the median (also calculated) and the min or max. So in
> this case, 51988.5 for 1st & 155953.5 for 3rd (which is what quantile
> calculates). But taking 25% & 75% of the sum of the 2 also seems
> "reasonable". Either way, "summary" is calculating the wrong number, and
> most disturbing is that it mis-calculates the max.
>
> Regards,
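
On the quartile question, I doubt there is one "correct" answer; quantile()
has a `type` argument selecting among several definitions. As a rough
sketch, the default (type 7, as I understand it) interpolates linearly
between order statistics, which for two values gives the same result as
your mid-point rule. The names `h`, `lo`, `hi` and `by_hand` are only
illustrative.

```r
x <- sort(c(6, 207936))
n <- length(x)
p <- c(0.25, 0.75)

# Fractional positions in the sorted data: 1.25 and 1.75 for n = 2
h  <- (n - 1) * p + 1
lo <- floor(h)
hi <- ceiling(h)

# Linear interpolation between the neighbouring order statistics
by_hand <- x[lo] + (h - lo) * (x[hi] - x[lo])

print(by_hand)
print(unname(quantile(x, p)))            # default definition
print(unname(quantile(x, p, type = 6)))  # another definition, may differ
```

With only two observations the choice of definition matters a good deal,
so it is worth stating which one was used.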
David Winsemius, MD
West Hartford, CT