[Rd] Binning of integers with hist() function odd results (PR#14046)
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Nov 7 17:27:35 CET 2009
gug at fnal.gov wrote:
> Full_Name: Gerald Guglielmo
> Version: 2.8.1 (2008-12-22)
> OS: OSX Leopard
> Submission from: (NULL) (131.225.103.35)
>
>
> When I attempt to use the hist() function to bin integers the behavior seems
> very odd as the bin boundary seems inconsistent across the various bins. For
> some bins the upper boundary includes the next integer value, while in others it
> does not. If I add 0.1 to every value, then the hist() binning behavior is what
> I would normally expect.
>
>> h1<-hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
>> h1$mids
> [1] 1.5 2.5 3.5 4.5
>> h1$counts
> [1] 3 3 4 5
>> h2<-hist(c(1.1,2.1,2.1,3.1,3.1,3.1,4.1,4.1,4.1,4.1,5.1,5.1,5.1,5.1,5.1))
>> h2$mids
> [1] 1.5 2.5 3.5 4.5 5.5
>> h2$counts
> [1] 1 2 3 4 5
>
> Naively I would have expected the same distribution of counts in the two cases,
> but clearly that is not happening. This is a simple example to illustrate the
> behavior, originally I noticed this while binning a large data sample where I
> had set the breaks=c(0,24,1).
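This is as documented. See the include.lowest argument: with the default
right = TRUE the cells are right-closed intervals (a, b], and
include.lowest = TRUE additionally puts a value equal to the lowest break
into the first cell. With breaks at 1, 2, ..., 5 the first cell is
therefore [1, 2], which collects the 1 as well as the two 2s. Annoying,
but not a bug.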
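
A minimal sketch of how the default settings lead to the counts in the report, using the reporter's data. The variable names x and h are only illustrative, and plot = FALSE is used so that nothing is drawn.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Defaults: right = TRUE and include.lowest = TRUE, so the cells are
# [1, 2], (2, 3], (3, 4], (4, 5].
h <- hist(x, plot = FALSE)
h$breaks   # 1 2 3 4 5
h$counts   # 3 3 4 5

# The same binning spelled out with cut(); the labels show which
# ends of each cell are closed.
tab <- table(cut(x, breaks = h$breaks, right = TRUE, include.lowest = TRUE))
tab
```

The first cell holds three values because both 1 (the lowest break) and 2 (its right endpoint) fall into it.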
(It is arguably a design error that hist() is looking for "pretty"
breakpoints rather than pretty midpoints, or maybe something more
advanced to handle cases where the data are effectively tied to a
lattice. It's been around "forever", though.)
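
One way to get "pretty midpoints" for integer data today is to supply the breaks by hand. The following is only a sketch of that workaround, assuming the data sit on the integers; the names x, b and h are illustrative and not part of hist().

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Put the breaks at half-integers so that each cell is centred on
# exactly one integer value.
b <- seq(min(x) - 0.5, max(x) + 0.5, by = 1)
h <- hist(x, breaks = b, plot = FALSE)
h$mids     # 1 2 3 4 5
h$counts   # 1 2 3 4 5

# For data that really are integers, plain tabulation gives the
# same counts without any binning decisions.
table(x)
```

With these breaks no data value coincides with a cell boundary, so the right and include.lowest settings no longer affect the counts.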
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907