[R] summary() vs mean()

Marc R. Feldesman feldesmanm at pdx.edu
Thu Feb 1 18:47:02 CET 2001


Below is the output after fixing summary.data.frame as you suggested.  It
now matches the output of S-Plus 2000 and S-Plus 6.0 (Win Beta 2).

However, on the issue of significant digits, there still seems to be an
inconsistency here (in both the R and S-Plus dialects).  All the values for
body weight are printed with one decimal digit, while all the values for
brain weight are printed with 4.  Since all the original values in the data
file are recorded with no decimal digits at all, I find it strange that (for
example) the minimum for body weight is 3.0, while the minimum for brain
weight is 26.0000.  They're 3 and 26, respectively, in the original data
file.  Why should one be reported to one decimal digit and the other to
4?  This pattern holds throughout.

I don't think this is an R problem since a similar pattern (but with 3 
decimal digits) occurs in S-Plus.
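
One possible explanation, sketched here as an assumption rather than a reading of the actual summary code: if the six statistics for a column are handed to format() together as one vector, format() gives every element the same number of decimal places, namely as many as the most demanding element needs to reach the requested significant digits. The values below are copied from the R output in this message; body_stats and brain_stats are names made up for the illustration.

```r
# Statistics copied from the summary() output in this message
# (Min., 1st Qu., Median, Mean, 3rd Qu., Max.).
body_stats  <- c(3, 35.5, 100, 761.2, 493, 6654)
brain_stats <- c(26, 138.5, 406, 1000.4667, 667.5, 5712)

# format() on a numeric vector uses one common layout for all elements.
body_fmt  <- format(body_stats,  digits = 8)
brain_fmt <- format(brain_stats, digits = 8)

print(body_fmt)   # every element is shown with 1 decimal place
print(brain_fmt)  # every element is shown with 4 decimal places
```

Under that assumption, 761.2 needs only one decimal place while 1000.4667 needs four, so the whole column follows the mean, even though 3 and 26 are whole numbers in the data file.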


S-Plus 6.0 (Win, Beta) OUTPUT
 > summary(mammals, digits=8)
    Body.Weight      Brain.Weight
     Min.:   3.0      Min.:  26.000
  1st Qu.:  35.5   1st Qu.: 138.500
   Median: 100.0    Median: 406.000
     Mean: 761.2      Mean:1000.467
  3rd Qu.: 493.0   3rd Qu.: 667.500
     Max.:6654.0      Max.:5712.000
 >

Obviously this isn't a giant problem, but it is one that a student first
brought to my attention, and I've been scratching my head trying to puzzle
it out ever since.

R-1.2.1 (Windows, Binary after Brian Ripley's code fix)

 > summary(mammals, digits=8)
        Name    Body.Weight      Brain.Weight
  Red Fox :1   Min.   :   3.0   Min.   :  26.0000
  Pig     :1   1st Qu.:  35.5   1st Qu.: 138.5000
  Man     :1   Median : 100.0   Median : 406.0000
  Kangaroo:1   Mean   : 761.2   Mean   :1000.4667
  Jaguar  :1   3rd Qu.: 493.0   3rd Qu.: 667.5000
  Horse   :1   Max.   :6654.0   Max.   :5712.0000
  (Other) :9



At 08:04 AM 2/1/01 +0000, Prof Brian D Ripley wrote:
 >On Wed, 31 Jan 2001, Marc R. Feldesman wrote:
 >
 >> Forgive what may seem to be a trivial question/problem.
 >>
 >> Below is some simple R 1.2.1(Windows) code with output.
 >>
 >>  > summary(mammals, digits=10)
 >>         Name    Body.Weight      Brain.Weight
 >>   Red Fox :1   Min.   :   3.0   Min.   :  26.0
 >>   Pig     :1   1st Qu.:  35.5   1st Qu.: 138.5
 >>   Man     :1   Median : 100.0   Median : 406.0
 >>   Kangaroo:1   Mean   : 761.2   Mean   :1000.0
 >>   Jaguar  :1   3rd Qu.: 493.0   3rd Qu.: 667.5
 >>   Horse   :1   Max.   :6654.0   Max.   :5712.0
 >>   (Other) :9
 >>  > mean(mammals[,3])
 >> [1] 1000.467  # <---summary() reports it as 1000.0
 >>  > mean(mammals[,2])
 >> [1] 761.2 # <- summary() reports it as 761.2
 >>
 >> I'm puzzled why the Brain.Weight mean from summary() is different from
 >> mean(mammals[,3]), while the Body.Weight means are identical in the two
 >> functions.  This isn't limited to R; I've observed the same thing in S-Plus
 >> 2000 (and v.6 beta).
 >
 >The results are to a certain number of significant figures, not decimal
 >places.
 >
 >> I can get the "right" answer in S-Plus using the digits argument (setting
 >> digits=8), but this argument doesn't seem to have any effect in R 1.2.1.  I
 >> *did* use it the way it is illustrated in the help file as well (e.g.
 >>
 >> summary(mammals, digits=max(10, getOption("digits")))
 >> )
 >> with the same results as above.
 >>
 >> So, I guess I have two questions:
 >>
 >> 1)  Why does S (in both S-Plus and R 1.2.1) produce different values for
 >> the means in the second variable but not the first?
 >
 >summary.default uses signif on the results, to by default 4 digits.
 >
 >> 2)  Why does the digits argument seem not to have any effect in R 1.2.1's
 >> summary()?
 >
 >Because R forgot to pass it down to summary.default.
 >
 >> P.S.  I also pasted the example code from the summary help file into the R
 >> 1.2.1 window.  The digits argument doesn't change the results there either.
 >
 >In R and summary.data.frame, digits is only used in formatting the result.
 >
 >Replace
 >
 >    z <- lapply(as.list(object), summary, maxsum = maxsum)
 >
 >by
 >
 >    z <- lapply(as.list(object), summary, maxsum = maxsum, digits = digits)
 >
 >in R.
 >
 >Brian
 >
 >
 >--
 >Brian D. Ripley,                  ripley at stats.ox.ac.uk
 >Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 >University of Oxford,             Tel:  +44 1865 272861 (self)
 >1 South Parks Road,                     +44 1865 272860 (secr)
 >Oxford OX1 3TG, UK                Fax:  +44 1865 272595

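The two points in the reply quoted above (rounding with signif to 4 digits, and a digits argument that is never passed down) can be sketched as follows. This is a minimal illustration, not the actual R source: toy_summary is a hypothetical stand-in for summary.default, and the toy data frame holds made-up values, not the mammals data.

```r
# Means as reported by mean() earlier in this thread.
body_mean  <- 761.2
brain_mean <- 1000.467

# Rounding to 4 significant digits leaves 761.2 alone but turns
# 1000.467 into 1000, because the 4 digits are significant digits,
# not decimal places.
body_4  <- signif(body_mean, 4)
brain_4 <- signif(brain_mean, 4)

# Hypothetical stand-in for summary.default: the mean, rounded with signif().
toy_summary <- function(x, digits = 4) signif(mean(x), digits)

# Made-up values for illustration only.
toy <- data.frame(a = c(1000.2, 1000.6, 1000.6), b = c(3, 35.5, 100))

# Without forwarding digits, the default of 4 is always used.
without_digits <- lapply(as.list(toy), toy_summary)

# Extra arguments to lapply() are passed on to the function,
# which is what the one-line fix relies on.
with_digits <- lapply(as.list(toy), toy_summary, digits = 8)

print(unlist(without_digits))
print(unlist(with_digits), digits = 10)
```

With the argument forwarded, the mean of column a keeps its fractional part instead of collapsing to 1000.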

Dr. Marc R. Feldesman
email:  feldesmanm at pdx.edu
email:  feldesman at attglobal.net
fax:    503-725-3905

"Don't know where I'm going.
Don't like where I've been.
There may be no exit.
But hell, I'm going in."  Jimmy Buffett

Powered by Superchoerus - the 700 MHz Coppermine Box
