[R] Histogram omitting/collapsing groups
Sarah Goslee
sarah.goslee at gmail.com
Sat Dec 31 17:20:48 CET 2011
Hi,
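I think you're misunderstanding what hist does with integer data. Its bins
are the intervals between consecutive breaks, and when your values fall
exactly on the breaks, the outermost bin is closed at both ends
(include.lowest=TRUE is the default), so two values end up in one bin.
Reread the help (?hist) and take a look at this small example. The solution
I'd use is the last one.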
> x <- rep(1:10, times=1:10)
> table(x)
x
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4  5  6  7  8  9 10
>
>
> hist(x, plot=FALSE, right=TRUE)$counts
[1] 3 3 4 5 6 7 8 9 10
> hist(x, plot=FALSE, right=TRUE)$breaks
[1] 1 2 3 4 5 6 7 8 9 10
> hist(x, plot=FALSE, right=TRUE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>
>
> hist(x, plot=FALSE, right=FALSE)$counts
[1] 1 2 3 4 5 6 7 8 19
> hist(x, plot=FALSE, right=FALSE)$breaks
[1] 1 2 3 4 5 6 7 8 9 10
> hist(x, plot=FALSE, right=FALSE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>
>
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
[1] 1 2 3 4 5 6 7 8 9 10
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
[1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
[1] 1 2 3 4 5 6 7 8 9 10
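
Applied to your hour data, the same idea might look like the sketch below.
The `hour` vector is a made-up stand-in for crashes$hour (hour h appears
h + 1 times), and I pass breaks=0:23 explicitly only to reproduce the breaks
that hist chose for your data; both are assumptions for illustration.

```r
# Hypothetical stand-in for crashes$hour: integer hours 0..23,
# where hour h occurs h + 1 times (so table(hour) is 1, 2, ..., 24).
hour <- rep(0:23, times = 1:24)

# Breaks on the integers: 24 break points give only 23 bins.
# With right = FALSE the bins are [0,1), [1,2), ..., and the last
# one is [22,23] because include.lowest = TRUE by default,
# so hours 22 and 23 land in the same bin.
h_merged <- hist(hour, plot = FALSE, breaks = 0:23, right = FALSE)
length(h_merged$counts)   # 23 bins
tail(h_merged$counts, 1)  # count for 22 plus count for 23

# Breaks on the half-integers: every hour sits in the middle of
# its own bin, so there are 24 bins and nothing is merged.
h_fixed <- hist(hour, plot = FALSE, breaks = seq(-0.5, 23.5, by = 1))
length(h_fixed$counts)    # 24 bins
h_fixed$mids              # 0, 1, ..., 23

# The per-hour counts now agree with table().
all(h_fixed$counts == as.vector(table(hour)))
```

If all you want is one bar per hour, barplot(table(hour)) is another
option, since it treats the hours as categories rather than as points on a
continuous axis.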
Sarah
On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre <aren at arencambre.com> wrote:
> I have two large datasets (156K and 2.06M records). Each row has the
> hour that an event happened, represented by an integer from 0 to 23.
>
> R's histogram is combining some data.
>
> Here's the command I ran to get the histogram:
>> histinfo <- hist(crashes$hour, right=FALSE)
>
> Here's histinfo:
>> histinfo
> $breaks
> [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>
> $counts
> [1] 4755 4618 5959 3292 2378 2715 4592 6144 6860 5598 5601
> 6596 7152 7490 8166
> [16] 9758 11301 11745 9943 7494 6272 6220 11669
>
> $intensities
> [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
> 0.02937602 0.03930449
> [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
> 0.05223967 0.06242403
> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
> 0.07464911
>
> $density
> [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
> 0.02937602 0.03930449
> [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
> 0.05223967 0.06242403
> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
> 0.07464911
>
> $mids
> [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5
> 13.5 14.5 15.5 16.5 17.5
> [19] 18.5 19.5 20.5 21.5 22.5
>
> $xname
> [1] "crashes$hour"
>
> $equidist
> [1] TRUE
>
> attr(,"class")
> [1] "histogram"
>
> Note how the last value in counts is 11669. It's relevant to the
> output of table(crashes$hour):
>     0     1     2     3     4     5     6     7     8     9    10    11
>  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596
>    12    13    14    15    16    17    18    19    20    21    22    23
>  7152  7490  8166  9758 11301 11745  9943  7494  6272  6220  6000  5669
>
> Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is
> that correct for the histogram to combine hours 22 and 23? Since I
> specified right = FALSE, I figured there's no way 23 would be combined
> with 22?
>
> Adding breaks=24 to the hist makes no difference; it's still stuck at
> 23 breaks. I also tried breaks=25 and 23 and several other values, in
> case I am misinterpreting breaks's meaning, but none of them make a
> difference.
>
> I imagine this is a n00b question, so my apologies if this is obvious.
>
> Aren
>
--
Sarah Goslee
http://www.functionaldiversity.org