[R] Understanding R Hist() Results...
Jason Rupert
jasonkrupert at yahoo.com
Thu Jun 4 13:52:49 CEST 2009
Thank you again to all the R-help folks who responded. I appreciate all the help and insight and will investigate the options suggested.
I guess I am still doing a little head scratching over how the division occurred:
It looks like the default hist(...) behavior is doing the following:
HouseHist<-hist(as.numeric(HouseYear_array))
HouseHist$counts
[1] 2 1 4 4 8 8
That would equate to the following grouping of the years:
[90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96]
However, the true division is something like the following:
table(as.numeric(HouseYear_array))
1990 1991 1992 1993 1994 1995 1996
1 1 1 4 4 8 8
Seems like hist behavior could have been:
(89, 90] (90, 91] (91, 92] (92, 93] (93, 94] (94, 95] (95, 96]
Of course, I haven't had any coffee yet...
This goes with the following example:
http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.htm
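
For anyone following along without the original data, here is a minimal sketch that rebuilds a vector with the same per-year counts as the table() output above and then reproduces the default grouping by hand. The reconstruction with rep() is an assumption (the original vector is not posted here), and the names years, H, and bins are only for illustration.

```r
# Rebuild a character vector with the per-year counts shown by table() above
# (assumption: the real HouseYear_array holds years as strings).
HouseYear_array <- as.character(rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8)))
years <- as.numeric(HouseYear_array)

# Compute the histogram without drawing it, then inspect the bins.
H <- hist(years, plot = FALSE)
H$breaks   # bin edges chosen by default
H$counts   # observations per bin: 2 1 4 4 8 8, as reported above

# Reproduce the same grouping by hand: right-closed intervals (a, b],
# with the lowest value folded into the first bin.
bins <- cut(years, breaks = H$breaks, right = TRUE,
            include.lowest = TRUE, dig.lab = 4)
table(bins)
```

With six intervals between seven break points, 1990 has no bin of its own: it lands in the first bin together with 1991, which is where the count of 2 comes from.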
--- On Thu, 6/4/09, Ted.Harding at manchester.ac.uk <Ted.Harding at manchester.ac.uk> wrote:
> From: Ted.Harding at manchester.ac.uk <Ted.Harding at manchester.ac.uk>
> Subject: RE: [R] Understanding R Hist() Results...
> To: R-help at r-project.org
> Cc: "Jason Rupert" <jasonkrupert at yahoo.com>
> Date: Thursday, June 4, 2009, 5:13 AM
> On 04-Jun-09 04:00:11, Jason Rupert wrote:
> >
> > Think I'm missing something to understand what is going on with
> > hist(...)
> >
> > http://n2.nabble.com/What-is-going-on-with-Histogram-Plots-td3022645.html
> >
> > For my example I count 7 unique years; however, on the histogram there
> > are only 6. It looks like the bin to the left of the tick mark on the
> > x-axis represents the number of entries for that year, i.e. Frequency.
> >
> > I guess it looks like the bin for 1990 is missing. Is there a better
> > way or a different histogram R command to use in order to see all the
> > age bins and for them to be aligned directly over the year tick
> > mark on the x-axis?
> >
> > Thanks again for any insights that can be provided.
>
> It's doing what it's supposed to -- which admittedly could be confusing
> when all your data lie on the exact boundaries between bins.
>
> From ?hist, by default "include.lowest = TRUE, right = TRUE", and:
>
>   If 'right = TRUE' (default), the histogram cells are intervals of
>   the form '(a, b]', i.e., they include their right-hand endpoint,
>   but not their left one, with the exception of the first cell when
>   'include.lowest' is 'TRUE'.
>
> In your data:
>
> sort(HouseYear_array)
>  [1] "1990" "1991" "1992" "1993" "1993" "1993" "1993" "1994" "1994"
> [10] "1994" "1994" "1995" "1995" "1995" "1995" "1995" "1995" "1995"
> [19] "1995" "1996" "1996" "1996" "1996" "1996" "1996" "1996" "1996"
>
> and, with
>
> H<-hist(as.numeric(HouseYear_array))
> H$breaks
> # [1] 1990 1991 1992 1993 1994 1995 1996
>
> so you get 2 (1990,1991) in the [1990-1] bin, 1 in the [1991-2] bin,
> 4 in [1992-3], and so on, exactly as observed.
>
> You can get what you're expecting to see by setting the 'breaks'
> parameter explicitly, and making sure the breakpoints do not
> coincide with data (which ensures that there is no confusion about
> what goes in which bin):
>
>
> hist(as.numeric(HouseYear_array),breaks=0.5+(1989:1996))
>
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 04-Jun-09                                       Time: 11:13:22
> ------------------------------ XFMail ------------------------------
>
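
To check Ted's suggestion against the table() output, here is a short sketch. It assumes the same reconstructed data as before (rebuilt from the per-year counts, not the original vector), and the names H2 and year_counts are only illustrative. The barplot() call at the end is an alternative that was not suggested in the reply; it may suit discrete values such as years, since each bar is labelled with its own year.

```r
# Rebuild the data from the per-year counts (assumption, as the original
# vector is not posted in this message).
HouseYear_array <- as.character(rep(1990:1996, times = c(1, 1, 1, 4, 4, 8, 8)))
years <- as.numeric(HouseYear_array)

# Half-year breaks: no data value sits on a bin edge, and each year
# falls in the middle of its own bin.
H2 <- hist(years, breaks = 0.5 + (1989:1996), plot = FALSE)
H2$mids     # bin midpoints, one per year
H2$counts   # one count per year

# Compare with the per-year tabulation side by side.
year_counts <- table(years)
cbind(hist = H2$counts, table = as.vector(year_counts))

# Alternative for discrete data: plot the table directly.
barplot(year_counts, xlab = "Year", ylab = "Frequency")
```

Either way there are seven bars, one per year, instead of the six produced by the default breaks.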