[R] Understanding R Hist() Results...

Thu Jun 4 13:52:49 CEST 2009

Thank you again to all the R-help folks who responded.  I appreciate all the help and insight and will investigate the options suggested.

I guess I am still doing a little head scratching over how the division occurred:

It looks like the default hist(...) behavior is doing the following:
HouseHist<-hist(as.numeric(HouseYear_array)) 
HouseHist$counts
[1] 2 1 4 4 8 8

That would equate to the following grouping of the years:
[90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96] 

However, the true division is something like the following:
table(as.numeric(HouseYear_array))
1990 1991 1992 1993 1994 1995 1996 
   1    1    1    4    4    8    8 
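
For anyone following along without the original data, here is a minimal sketch that rebuilds a vector from the table() counts above and reproduces the default grouping. The name `years` is only a stand-in for as.numeric(HouseYear_array), and plot = FALSE is used so nothing is drawn.

```r
# Stand-in for as.numeric(HouseYear_array), rebuilt from the table() counts
years <- rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8))

# Defaults: right = TRUE, include.lowest = TRUE
h <- hist(years, plot = FALSE)
h$breaks   # 1990 1991 1992 1993 1994 1995 1996 (six cells)
h$counts   # 2 1 4 4 8 8

# cut() with the same breaks and flags shows which cell each year lands in
table(years, cut(years, breaks = h$breaks, right = TRUE, include.lowest = TRUE))
```

The cross-table should show 1990 and 1991 sharing the first cell, which is where the count of 2 comes from.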

Seems like hist behavior could have been:
(89, 90] (90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96]
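
That alternative grouping can be requested explicitly through the 'breaks' argument. The sketch below again uses a hypothetical `years` vector rebuilt from the table() counts, and compares breaks starting one year earlier with the half-year breaks suggested in Ted's reply quoted below.

```r
# Stand-in for as.numeric(HouseYear_array), rebuilt from the table() counts
years <- rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8))

# Breaks starting one year earlier: [1989,1990], (1990,1991], ..., (1995,1996]
h1 <- hist(years, breaks = 1989:1996, plot = FALSE)
h1$counts   # 1 1 1 4 4 8 8

# Breaks at the half-years: no data value sits on a cell boundary
h2 <- hist(years, breaks = 0.5 + (1989:1996), plot = FALSE)
h2$counts   # 1 1 1 4 4 8 8
h2$mids     # 1990 1991 ... 1996, so each bar is centred on its year
```

Both give seven cells with the same counts as table(); the half-year version has the added benefit that each bar sits directly over its year on the x-axis.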

Of course, I haven't had any coffee yet...

This goes with the following example:
http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.htm

--- On Thu, 6/4/09, Ted.Harding at manchester.ac.uk <Ted.Harding at manchester.ac.uk> wrote:

> From: Ted.Harding at manchester.ac.uk <Ted.Harding at manchester.ac.uk>
> Subject: RE: [R] Understanding R Hist() Results...
> To: R-help at r-project.org
> Cc: "Jason Rupert" <jasonkrupert at yahoo.com>
> Date: Thursday, June 4, 2009, 5:13 AM
> On 04-Jun-09 04:00:11, Jason Rupert wrote:
> > 
> > Think I'm missing something to understand what is going on with
> > hist(...)
> > 
> > http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html
> > 
> > For my example I count 7 unique years; however, on the histogram there
> > are only 6.  It looks like the bin to the left of the tick mark on the
> > x-axis represents the number of entries for that year, i.e. Frequency.
> >
> > I guess it looks like the bin for 1990 is missing.  Is there a better
> > way or a different histogram R command to use in order to see all the
> > age bins and for them to be aligned directly over the year tick
> > mark on the x-axis?
> > 
> > Thanks again for any insights that can be provided.
> 
> It's doing what it's supposed to -- which admittedly could be confusing
> when all your data lie on the exact boundaries between bins.
> 
> From ?hist, by default "include.lowest = TRUE, right = TRUE", and:
>
>   If 'right = TRUE' (default), the histogram cells are intervals of
>   the form '(a, b]', i.e., they include their right-hand endpoint,
>   but not their left one, with the exception of the first cell when
>   'include.lowest' is 'TRUE'.
> 
> In your data:
> 
>  sort(HouseYear_array)
>  [1] "1990" "1991" "1992" "1993" "1993" "1993" "1993" "1994" "1994"
> [10] "1994" "1994" "1995" "1995" "1995" "1995" "1995" "1995" "1995"
> [19] "1995" "1996" "1996" "1996" "1996" "1996" "1996" "1996" "1996"
> 
> and, with
> 
>   H<-hist(as.numeric(HouseYear_array))
>   H$breaks
>   # [1] 1990 1991 1992 1993 1994 1995 1996
> 
> so you get 2 (1990,1991) in the [1990-1] bin, 1 in the [1991-2] bin,
> 4 in [1992-3], and so on, exactly as observed.
> 
> You can get what you're expecting to see by setting the 'breaks'
> parameter explicitly, and making sure the breakpoints do not
> coincide with data (which ensures that there is no confusion about
> what goes in which bin):
>
>   hist(as.numeric(HouseYear_array),breaks=0.5+(1989:1996))
> 
> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 04-Jun-09                                       Time: 11:13:22
> ------------------------------ XFMail ------------------------------