[R] Why does "summary" show number of NAs as non-integer?

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jun 1 15:48:13 CEST 2005


On 6/1/05, Earl F. Glynn <efg at stowers-institute.org> wrote:
> "Berton Gunter" <gunter.berton at gene.com> wrote in message
> news:200505312240.j4VMepGX000203 at hertz.gene.com...
> > summary() is an S3 generic that for your vector dispatches
> > summary.default(). The output of summary default has class "table" and so
> > calls print.table (print is another S3 generic). Look at the code of
> > print.table() to see how it formats the output.
> 
> "Marc Schwartz" <MSchwartz at MedAnalytics.com> wrote in message
> news:1117582325.22595.175.camel at horizons.localdomain...
> > On Tue, 2005-05-31 at 17:14 -0500, Earl F. Glynn wrote:
> 
> > > Why isn't the number of NA's just "2" instead of the "2.000" shown above?
> 
> > "The same number of decimal places is used throughout a vector
> 
> I'm talking about how this should be designed.  The current implementation
> may be to print a vector using generic logic, but why use generic logic to
> produce a wrong solution? Shouldn't correctness be more important than using
> a generic solution?
> 
> There is special logic to suppress NA's when they don't exist (see below),
> so why isn't there special logic to print the count of NAs, which MUST be an
> integer, correctly when they do exist?
> 
> An integer should NOT be displayed with meaningless decimal places. Why
> would this ever be desirable?  The generic solution should be dropped in
> favor of a correct solution.
> 
> # Why not use special logic to show the number of NA's correctly as an integer?
> > set.seed(19)
> > summary( c(NA, runif(10,1,100), NaN) )
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>  7.771  24.850  43.040  43.940  63.540  83.830   2.000
> 
> # There is already special logic to suppress NA's
> > set.seed(19)
> > summary( runif(10,1,100) )
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  7.771  24.850  43.040  43.940  63.540  83.830
> 
> "2.000" and "2" do not have equivalent meaning.
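
As a minimal sketch of where the trailing zeros come from (the names
vals, together and separate are illustrative only, not taken from the
summary() or print.table() source): format() applied to a whole numeric
vector pads every element to a common number of decimal places, whereas
formatting element by element does not.

```r
# Two values in the style of the summary() output quoted above.
vals <- c(Min. = 7.771, "NA's" = 2)

# Formatting the vector in one call: all elements share the same
# number of decimal places, so the count picks up trailing zeros.
together <- format(vals, digits = 4)

# Formatting each element on its own: the count stays a plain "2".
separate <- sapply(vals, format, digits = 4)

print(together, quote = FALSE)
print(separate, quote = FALSE)
```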

Try:

R> library(Hmisc)
R> describe( c(NA, runif(10,1,100), NaN) )
c(NA, runif(10, 1, 100), NaN) 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     10       2      10   50.99   15.24   16.82   21.14   52.70   76.35   83.52 
    .95 
  90.79 

          13.65 17.17 18.12 30.18 46.21 59.19 65.36 80.01 81.90 98.06
Frequency     1     1     1     1     1     1     1     1     1     1
%            10    10    10    10    10    10    10    10    10    10
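
If you would rather stay in base R, one possible workaround (a sketch
only; n_missing, stats and out are names made up for this example, and
this is not how summary.default() itself works) is to count the missing
values separately and format that count apart from the quantiles:

```r
# Small fixed vector so the example does not depend on the RNG.
x <- c(NA, 7.5, 24.25, 43, 63.5, 83.75, NaN)

# is.na() is TRUE for both NA and NaN; sum() of a logical is an integer.
n_missing <- sum(is.na(x))

# Summarise only the non-missing values, then drop the class so the
# result is a plain named numeric vector.
stats <- unclass(summary(x[!is.na(x)]))

# Format the statistics and the count separately, so the count does not
# inherit the decimal places used for the statistics.
out <- c(format(stats, digits = 4), "NA's" = format(n_missing))
print(out, quote = FALSE)
```

The statistics still share a common number of decimal places, as they
do in summary(), but the missing-value count is shown as a bare integer.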



