[Rd] Binning of integers with hist() function odd results (PR#14046)

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Nov 7 17:27:35 CET 2009


gug at fnal.gov wrote:
> Full_Name: Gerald Guglielmo
> Version: 2.8.1 (2008-12-22)
> OS: OSX Leopard
> Submission from: (NULL) (131.225.103.35)
> 
> 
> When I attempt to use the hist() function to bin integers the behavior seems
> very odd as the bin boundary seems inconsistent across the various bins. For
> some bins the upper boundary includes the next integer value, while in others it
> does not. If I add 0.1 to every value, then the hist() binning behavior is what
> I would normally expect. 
> 
>> h1<-hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
>> h1$mids
> [1] 1.5 2.5 3.5 4.5
>> h1$counts
> [1] 3 3 4 5
>> h2<-hist(c(1.1,2.1,2.1,3.1,3.1,3.1,4.1,4.1,4.1,4.1,5.1,5.1,5.1,5.1,5.1))
>> h2$mids
> [1] 1.5 2.5 3.5 4.5 5.5
>> h2$counts
> [1] 1 2 3 4 5
> 
> Naively I would have expected the same distribution of counts in the two cases,
> but clearly that is not happening. This is a simple example to illustrate the
> behavior, originally I noticed this while binning a large data sample where I
> had set the breaks=c(0,24,1).

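This is as documented. See the right and include.lowest arguments of
hist(): by default the bins are right-closed, (a, b], and the first
bin also includes its left endpoint, so both 1 and 2 land in [1, 2].
Annoying, but not a bug.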
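
A minimal sketch of that behaviour, using the data from the report. The
variable names h_default and h_left are only illustrative, and
plot = FALSE is used so that nothing is drawn:

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Defaults: right = TRUE, include.lowest = TRUE.
# Breaks come out as 1:5, so the bins are [1,2], (2,3], (3,4], (4,5].
h_default <- hist(x, plot = FALSE)
h_default$counts   # 3 3 4 5, as in the report

# right = FALSE makes the bins left-closed: [1,2), [2,3), [3,4), [4,5].
# include.lowest now applies to the last bin, so 4 and 5 are merged instead.
h_left <- hist(x, right = FALSE, plot = FALSE)
h_left$counts      # 1 2 3 9
```

Either way one of the end bins collects two integer values, because the
breaks themselves sit on the data values.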

(It is arguably a design error that hist() is looking for "pretty" 
breakpoints rather than pretty midpoints, or maybe something more 
advanced to handle cases where the data are effectively tied to a 
lattice. It's been around "forever", though.)
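
For integer-valued data, one possible workaround is to supply the breaks
by hand so that the integers become the bin midpoints. This is only a
sketch: integer_breaks is a hypothetical helper, not something hist()
provides, and it assumes the data are whole numbers with unit spacing.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Hypothetical helper (not part of R): unit-width bins centred on the
# integers, i.e. breaks at min - 0.5, ..., max + 0.5.
integer_breaks <- function(x) seq(min(x) - 0.5, max(x) + 0.5, by = 1)

h <- hist(x, breaks = integer_breaks(x), plot = FALSE)
h$mids     # 1 2 3 4 5
h$counts   # 1 2 3 4 5
```

With the breaks on the half-integers no data value falls on a bin
boundary, so the right and include.lowest settings no longer matter.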

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


