> As a practical matter, 'continuous' data must be discretized, so if you
> have long vectors of it you will run into this problem.

Yep, and it is a bit unfortunate that hist() tries to use "pretty" breakpoints: with integer-valued data, points then land exactly on the bin boundaries, and all the left/right/endpoint business (the right= and include.lowest= arguments) comes into play. The truehist() function in MASS does somewhat better.
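
To see the boundary problem directly, here is a small sketch that asks hist() for its default breaks without drawing anything. The seed and the variable name h are arbitrary choices for illustration, not part of the original example.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
y <- ceiling(runif(100) * 6)     # simulated die rolls, values 1..6

h <- hist(y, plot = FALSE)       # default "pretty" breaks, nothing is drawn
h$breaks                         # the break points hist() chose
h$counts                         # observations per bin

# How many observations sit exactly on a break point?
sum(y %in% h$breaks)
```

Any observation counted by the last line is assigned to a bin purely by the endpoint rules, which is why the bars end up unevenly placed.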

For the case at hand, things are much improved by setting the breaks explicitly:

hist(y,freq=TRUE, col='red', breaks=0.5:6.5)

but as pointed out by others, it is a much better idea to do

plot(factor(y, levels=1:6))

or similar.
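
A quick check of both suggestions, as a sketch with an arbitrary seed: with breaks=0.5:6.5 the bin midpoints are the integers 1 to 6, and the counts agree with a table over the factor. The factor version also keeps a level that happens to have no observations.

```r
set.seed(1)                              # arbitrary seed, only for reproducibility
y <- ceiling(runif(100) * 6)             # simulated die rolls, values 1..6

h <- hist(y, breaks = 0.5:6.5, plot = FALSE)
h$mids                                   # bin centres fall on 1, 2, ..., 6

tab <- table(factor(y, levels = 1:6))    # one cell per face
all(h$counts == as.vector(tab))          # same counts either way

# Levels with no data are kept, so they still get a (zero-height) bar
table(factor(c(1, 1, 3), levels = 1:6))
```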

Incidentally, what is the most handy way to get a plot with percentages instead of counts? This works, but seems a bit ham-fisted:

barplot(prop.table(table(factor(y, levels=1:6))))
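
One possible tidy-up, offered as a sketch rather than a recommended idiom: prop.table() returns fractions, so scale by 100 if the axis should really read in percent, and keep the tabulation in a named object. The names tab and pct and the axis labels are only illustrative.

```r
set.seed(1)                             # arbitrary seed, only for reproducibility
y <- ceiling(runif(100) * 6)            # simulated die rolls, values 1..6

tab <- table(factor(y, levels = 1:6))   # counts per face, empty faces kept
pct <- 100 * prop.table(tab)            # fractions scaled to percent
barplot(pct, xlab = "Face", ylab = "Percent")
```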

-pd

>>> Hi, everyone.
>>> I stumbled upon weird histogram behaviour.
>>
>>> Consider this "dice emulator":
>>> Step 1: Generate uniform random array x of size N.
>>> Step 2: Multiply each item by six and round to next bigger integer
>>> to get numbers 1 to 6.
>>> Step 3: Plot histogram.
>>>> x<-runif(N)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='orange')
>>
>>> Now what I get with N=100000
>>
>>>> x<-runif(100000)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='green')
>>
>>> At first glance looks OK.
>>
>>> Now try N=100
>>
>>>> x<-runif(100)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='red')
>>> Now first bar is not where it should be.
>>> Hmm. Look again to 100000 histogram... First bar is not where I want
>>> it, it's only less striking due to narrow bars.
>>
>>> So, first bar is always in wrong position. How do I fix it to make
>>> perfectly spaced bars?
>>
>> Don't use histograms *at all* for such discrete integer data.
>>
>> N <- rpois(100, 5)
>> plot(table(N), lwd = 4)
>>
>> Histograms should only be used for continuous data (or discrete data
>> with "many" possible values).
>>
>> It's a pain to see them so often "misused" for data like the 'N' above.
>>
>> Martin Maechler,
>> ETH Zurich
>>
