[R] Confused in simplest-possible function

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Mar 1 09:46:53 CET 2004


This really isn't to do with functions, but to do with the fine details 
of formatting inside methods for summary(). It happens if you call summary()
directly:

> a
[1]  854.2 1533.3 1011.1
> print(a, digits=4)
[1]  854.2 1533.3 1011.1
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  854.2   932.7  1011.0  1133.0  1272.0  1533.0

In the last two cases, the result is supposed to be printed to 4
significant figures, but for print() it is applied to the whole vector and
for summary() to each of the numbers individually.
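A minimal sketch of that contrast, using only base R's signif() and format() on the vector `a` from the transcript. The sketch assumes `a` holds exactly 854.2, 1533.3 and 1011.1. It deliberately does not call summary() itself, because the details of summary.default's rounding may differ in later R releases.

```r
# Assumed: the vector from the transcript above
a <- c(854.2, 1533.3, 1011.1)

# signif() rounds every element independently to 4 significant digits,
# the approach summary.default takes according to this message
per_element <- signif(a, 4)
print(per_element)   # 1533.3 becomes 1533 and 1011.1 becomes 1011

# format() applies the same common-format rule as print(): one number of
# decimals for the whole vector, enough for 854.2 to show 4 digits
whole_vector <- format(a, digits = 4)
print(whole_vector)  # all three values keep their single decimal
```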

> DF <- data.frame(a=a)
> print(DF, digits=4)
       a
1  854.2
2 1533.3
3 1011.1
> summary(DF)
       a
 Min.   : 854.2
 1st Qu.: 932.6
 Median :1011.1
 Mean   :1132.9
 3rd Qu.:1272.2
 Max.   :1533.3

Here summary() applies the digits to the whole set of results, just as print() does.

The underlying difference is that summary.default calls signif() to set
the number of significant digits, whereas summary.data.frame calls format(),
and the two (deliberately) behave differently when applied to a vector.
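To connect this to the original question, the sketch below rebuilds the six statistics by hand with quantile() (default type 7) and mean(), then treats them the way each method does. That these match what summary() computes internally is an assumption for illustration, not something taken from R's sources, and the names `stats`, `rounded` and `formatted` are hypothetical.

```r
# Assumed: the same three values as in the transcript
a <- c(854.2, 1533.3, 1011.1)

# Hand-built equivalents of the six summary statistics (illustrative only)
q <- unname(quantile(a))  # min, 1st quartile, median, 3rd quartile, max
stats <- setNames(c(q[1:3], mean(a), q[4:5]),
                  c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max."))

# summary.default style: signif() on each statistic separately
rounded <- signif(stats, 4)
print(rounded)    # Median 1011.1 -> 1011, Mean 1132.87 -> 1133

# summary.data.frame style: one format() call over all six statistics
formatted <- format(stats, digits = 4)
print(formatted)  # Median stays "1011.1", Mean shows "1132.9"
```

The same mechanism would account for the 1088.7 versus 1089.0 median in the quoted message: rounding each statistic on its own to 4 significant digits drops the decimal on any value of 1000 or more, while a single format() call keeps one decimal for all of them because the minimum, 854.2, needs it.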

It looks like it was decided about three years ago that this was less
confusing than what went before. If we were to change anything, I think it
would be how summary.default behaves.


On Sun, 29 Feb 2004, Ajay Shah wrote:

> I wrote the following code:
> 
>   ---------------------------------------------------------------------------
>   oneindex <- function(x) {
>         summary(x)
>   }
> 
>   A <- read.table("try.data",
>                   col.names=c("date", "lNifty"))
>   summary(A)
>   oneindex(A$lNifty)
>   ---------------------------------------------------------------------------
> 
> where I read in data, make a summary directly, and then call a
> function `oneindex' which merely makes a summary.
> 
> I'm puzzled because the two summaries disagree :
> 
> > oneindex <- function(x) {
> +       summary(x)
> + }
> > 
> > A <- read.table("try.data",
> +                 col.names=c("date", "lNifty"))
> > summary(A)
>          date         lNifty      
>  2000-06-12:  1   Min.   : 854.2  
>  2000-06-13:  1   1st Qu.:1032.8  
>  2000-06-15:  1   Median :1088.7  
>  2000-06-16:  1   Mean   :1123.6  
>  2000-06-19:  1   3rd Qu.:1178.7  
>  2000-06-20:  1   Max.   :1533.3  
>  (Other)   :780                   
> > oneindex(A$lNifty)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>   854.2  1033.0  1089.0  1124.0  1179.0  1533.0 
> 
> Here you see the median showing up as 1088.7 in the 1st case but
> 1089.0 in the 2nd case. How could that happen?
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



