[R] Why does "summary" show number of NAs as non-integer?

Earl F. Glynn efg at stowers-institute.org
Wed Jun 1 15:41:05 CEST 2005


"Berton Gunter" <gunter.berton at gene.com> wrote in message
news:200505312240.j4VMepGX000203 at hertz.gene.com...
> summary() is an S3 generic that for your vector dispatches to
> summary.default(). The output of summary.default() has class "table" and so
> calls print.table (print is another S3 generic). Look at the code of
> print.table() to see how it formats the output.
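
For anyone following along, the dispatch chain described above can be checked
from the console. This is only a quick sketch; the exact class vector may
differ between R versions, so the code tests for "table" with inherits()
rather than assuming the full class attribute.

```r
# Inspect what summary() returns for a numeric vector containing NA and NaN.
x <- c(NA, runif(10, 1, 100), NaN)
s <- summary(x)

class(s)              # the class vector; it may vary between R versions
inherits(s, "table")  # TRUE when the result carries class "table"

# List the print methods registered for the S3 generic print().
print_methods <- methods("print")
```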

"Marc Schwartz" <MSchwartz at MedAnalytics.com> wrote in message
news:1117582325.22595.175.camel at horizons.localdomain...
> On Tue, 2005-05-31 at 17:14 -0500, Earl F. Glynn wrote:

> > Why isn't the number of NA's just "2" instead of the "2.000" shown
> > above?

> "The same number of decimal places is used throughout a vector

I'm talking about how this should be designed.  The current implementation
may be to print a vector using generic logic, but why use generic logic to
produce a wrong solution? Shouldn't correctness be more important than using
a generic solution?

There is special logic to suppress NA's when they don't exist (see below),
so why isn't there special logic to print the count of NAs, which MUST be an
integer, correctly when they do exist?

An integer should NOT be displayed with meaningless decimal places. Why
would this ever be desirable?  The generic solution should be dropped in
favor of a correct solution.

# Why not use special logic to show the number of NA's correctly as an
# integer?
> set.seed(19)
> summary( c(NA, runif(10,1,100), NaN) )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  7.771  24.850  43.040  43.940  63.540  83.830   2.000

# There is already special logic to suppress NA's
> set.seed(19)
> summary( runif(10,1,100) )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  7.771  24.850  43.040  43.940  63.540  83.830
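
To make the suggestion concrete, here is a rough sketch of the kind of special
logic I mean. The function name format_summary and its digits argument are
hypothetical and not part of R; it formats the six statistics together and
formats the NA count separately as an integer, appending it only when NAs
exist. It is an illustration under those assumptions, not a proposed patch to
summary.default() or print.table().

```r
# Hypothetical helper (not part of R): format the six summary statistics with
# a common layout, but format the NA count on its own as a plain integer.
format_summary <- function(x, digits = 4) {
  n_na <- sum(is.na(x))                    # is.na() is TRUE for NA and NaN
  qq <- quantile(x, na.rm = TRUE, names = FALSE)
  stats <- c(qq[1:3], mean(x, na.rm = TRUE), qq[4:5])
  out <- format(signif(stats, digits))     # character vector, common format
  names(out) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  if (n_na > 0) {
    out <- c(out, "NA's" = format(n_na))   # integer count, no decimal places
  }
  out
}

set.seed(19)
res <- format_summary(c(NA, runif(10, 1, 100), NaN))
print(res, quote = FALSE)
```

Because the count is formatted separately from the real-valued statistics, it
never inherits their decimal places, and the NA's entry is still omitted when
there are no missing values.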

"2.000" and "2" do not have equivalent meaning.

efg



