[R] histogram first bar wrong position

(Ted Harding) Ted.Harding at wlandres.net
Thu Dec 22 18:23:30 CET 2016


Willam has listed the lid on the essence of the problem, which is
that in R the way that breaks (and therefore counts) in a histogram
are evaluated is an area of long grass with lurking snakes!

To get a glimpse of this, have a look at
  ?hist
and in the seaction "Arguments", look at "breaks", "freq", "right".
Also see under "Details".

and, as suggested under "See also", look at
  ?nclass.Sturges

As William suggests, if you know what claa intervals you want,
create them yourself! For your example (with N=100), look at:

   hist(y,freq=TRUE, col='red', breaks=0.5+(0:6))

or

   hist(y,freq=TRUE, col='red', breaks=0.25+(0:12)/2)

Hoping this helps!
Best wishes,
Ted.


On 22-Dec-2016 16:36:34 William Dunlap via R-help wrote:
> Looking at the return value of hist will show you what is happening:
> 
>> x <- rep(1:6,10*(6:1))
>> z <- hist(x, freq=TRUE)
>> z
> $breaks
>  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
> 
> $counts
>  [1] 60 50  0 40  0 30  0 20  0 10
> ...
> 
> The the first bin is [1-1.5], including both endpoints, while the other
> bins include only the upper endpoint.  I recommend defining your
> own breakpoints, ones don't include possible data points, as in
> 
>> print(hist(x, breaks=seq(min(x)-0.5, max(x)+0.5, by=1), freq=TRUE))
> $breaks
> [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5
> 
> $counts
> [1] 60 50 40 30 20 10
> ...
> 
> S+ had a 'factor' method for hist() that did this sort of thing, but R does
> not.
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> On Thu, Dec 22, 2016 at 5:17 AM, itpro <itpro1 at yandex.ru> wrote:
> 
>> Hi, everyone.
>>
>>
>> I stumbled upon weird histogram behaviour.
>>
>> Consider this "dice emulator":
>> Step 1: Generate uniform random array x of size N.
>> Step 2: Multiply each item by six and round to next bigger integer to get
>> numbers 1 to 6.
>> Step 3: Plot histogram.
>>
>> > x<-runif(N)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='orange')
>>
>>
>> Now what I get with N=100000
>>
>> > x<-runif(100000)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='green')
>>
>> At first glance looks OK.
>>
>> Now try N=100
>>
>> > x<-runif(100)
>> > y<-ceiling(x*6)
>> > hist(y,freq=TRUE, col='red')
>>
>> Now first bar is not where it should be.
>> Hmm. Look again to 100000 histogram... First bar is not where I want it,
>> it's only less striking due to narrow bars.
>>
>> So, first bar is always in wrong position. How do I fix it to make
>> perfectly spaced bars?
>>
>>
>>
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
>       [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 22-Dec-2016  Time: 17:23:26
This message was sent by XFMail



More information about the R-help mailing list