[R] Why does "summary" show number of NAs as non-integer?
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Jun 1 15:48:13 CEST 2005
On 6/1/05, Earl F. Glynn <efg at stowers-institute.org> wrote:
> "Berton Gunter" <gunter.berton at gene.com> wrote in message
> news:200505312240.j4VMepGX000203 at hertz.gene.com...
> > summary() is an S3 generic that for your vector dispatches
> > summary.default(). The output of summary default has class "table" and so
> > calls print.table (print is another S3 generic). Look at the code of
> > print.table() to see how it formats the output.
>
> "Marc Schwartz" <MSchwartz at MedAnalytics.com> wrote in message
> news:1117582325.22595.175.camel at horizons.localdomain...
> > On Tue, 2005-05-31 at 17:14 -0500, Earl F. Glynn wrote:
>
> > > Why isn't the number of NA's just "2" instead of the "2.000" shown above?
>
> > "The same number of decimal places is used throughout a vector
>
> I'm talking about how this should be designed. The current implementation
> may be to print a vector using generic logic, but why use generic logic to
> produce a wrong solution? Shouldn't correctness be more important than using
> a generic solution?
>
> There is special logic to suppress NA's when they don't exist (see below),
> so why isn't there special logic to print the count of NAs, which MUST be an
> integer, correctly when they do exist?
>
> An integer should NOT be displayed with meaningless decimal places. Why
> would this ever be desirable? The generic solution should be dropped in
> favor of a correct solution.
>
> # Why not use special logic to show the number of NA's correctly as an integer?
> > set.seed(19)
> > summary( c(NA, runif(10,1,100), NaN) )
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>   7.771  24.850  43.040  43.940  63.540  83.830   2.000
>
> # There is already special logic to suppress NA's
> > set.seed(19)
> > summary( runif(10,1,100) )
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   7.771  24.850  43.040  43.940  63.540  83.830
>
> "2.000" and "2" do not have equivalent meaning.
Try:
R> library(Hmisc)
R> describe( c(NA, runif(10,1,100), NaN) )
c(NA, runif(10, 1, 100), NaN)
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     10       2      10   50.99   15.24   16.82   21.14   52.70   76.35   83.52   90.79

          13.65 17.17 18.12 30.18 46.21 59.19 65.36 80.01 81.90 98.06
Frequency     1     1     1     1     1     1     1     1     1     1
%            10    10    10    10    10    10    10    10    10    10
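
If the goal is simply to see the NA count as a whole number, one option is to format the count separately from the quantiles. The following is only a sketch: summary_with_int_na is a hypothetical helper name, not a function in base R or Hmisc, and the digits = 4 default is an assumption chosen to resemble the summary() output quoted above.

```r
# Hypothetical helper (not part of base R or Hmisc): format the numeric
# statistics and the NA count separately, so the count is never padded
# with decimal places.
summary_with_int_na <- function(x, digits = 4) {
  n_na <- sum(is.na(x))                    # is.na() is TRUE for NA and NaN
  qq   <- stats::quantile(x, na.rm = TRUE) # 0%, 25%, 50%, 75%, 100%
  vals <- c(qq[1:3], mean(x, na.rm = TRUE), qq[4:5])
  out  <- format(signif(vals, digits))     # common formatting for the stats only
  names(out) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  # Append the count as plain text, and only when there is something to report
  if (n_na > 0) out <- c(out, "NA's" = as.character(n_na))
  out
}

set.seed(19)
x   <- c(NA, runif(10, 1, 100), NaN)
res <- summary_with_int_na(x)
print(res, quote = FALSE)
```

Because the result is a character vector, it is meant for display only; the numeric values should be taken from summary() or quantile() directly.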