[R] histogram scott

Duncan Murdoch murdoch at stats.uwo.ca
Fri Feb 5 19:29:15 CET 2010


On 05/02/2010 12:21 PM, maram salem wrote:
> Dear all, 
> I want to use the histogram as a density estimator, with the bin widths calculated using Scott's formula, which is
> binwidth = 3.49 * st.dev. * n^(-1/3)
> for the following data (30 data points):
> 12, 9, 3, 6, 1, 23, 21, 7, 18, 16, 15, 4, 19, 22, 20, 2, 3, 18, 8, 10, 1, 7, 5, 4, 11, 12, 3, 9, 19, 7
> So first, I tried this manually: I substituted into the above formula and got
> st.dev. = 7.02745
> and thus binwidth = 7.89313
>
> But when I used hist with breaks = "scott", that is
> h <- hist(x, breaks = "scott")
> I got the breaks in the histogram object = 0  10  20  30,
> that is, the bin width used is 10, not 7.89313.
> I don't know why.
> Shouldn't they be exactly the same?
No, R prefers to put breaks on round numbers.  It uses the Scott rule
(or another rule) to work out approximately how many bins there should
be, then picks nice round break points that come close.  If you want
the bins at particular exact locations, you need to give the breaks
explicitly.  As the documentation for the "breaks" argument says,
"In the last three cases the number is a suggestion only."
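A minimal sketch of the two steps described above, using the data from the question. It assumes the default hist() method, which (in the R sources I am aware of) turns a rule name into a class count with nclass.scott() and then passes that count to pretty(range(x), n = k, min.n = 1). Note that nclass.scott() uses the constant 3.5 rather than 3.49, and that sd(x) on the data as listed comes out near 6.96, so the "exact" width below differs slightly from the 7.89313 in the question. Starting the exact bins at min(x) is an arbitrary choice of origin, not something hist() prescribes.

```r
# Data from the question (30 observations)
x <- c(12, 9, 3, 6, 1, 23, 21, 7, 18, 16, 15, 4, 19, 22, 20,
       2, 3, 18, 8, 10, 1, 7, 5, 4, 11, 12, 3, 9, 19, 7)

# 1. Scott's rule only suggests a number of classes:
#    nclass.scott() returns ceiling(diff(range(x)) / h) with
#    h = 3.5 * sd(x) * n^(-1/3)
k <- nclass.scott(x)

# 2. hist() then asks pretty() for about k "round" intervals
#    covering the data range; these become the actual breaks.
auto_breaks <- pretty(range(x), n = k, min.n = 1)
h_auto <- hist(x, breaks = "scott", plot = FALSE)

# 3. To get Scott's exact bin width, supply the breaks yourself.
#    The width uses the 3.49 constant from the question; the first
#    break is placed at min(x) (an arbitrary origin).
width <- 3.49 * sd(x) * length(x)^(-1/3)
n_bins <- ceiling(diff(range(x)) / width)
exact_breaks <- seq(min(x), by = width, length.out = n_bins + 1)
h_exact <- hist(x, breaks = exact_breaks, plot = FALSE)

print(k)               # suggested number of classes
print(auto_breaks)     # round breaks chosen by pretty()
print(h_auto$breaks)   # the same breaks, as used by hist()
print(h_exact$breaks)  # breaks exactly `width` apart
```

With this data the suggestion is 3 classes, and pretty() rounds the range 1 to 23 out to 0, 10, 20, 30, which is exactly the result reported in the question.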

Duncan Murdoch