[R] Histogram omitting/collapsing groups

Sarah Goslee sarah.goslee at gmail.com
Sat Dec 31 17:20:48 CET 2011


Hi,

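I think you're not quite understanding what hist is doing. Your data are
integers and the breaks fall on those same integers, so every value sits on a
bin edge. With include.lowest=TRUE (the default), the value on the outermost
edge is then counted in the end bin together with its neighbor: the two lowest
values when right=TRUE, the two highest when right=FALSE. Reread the help, and
take a look at this small example. The solution I'd use is the last item.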

> x <- rep(1:10, times=1:10)
> table(x)
x
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4  5  6  7  8  9 10
>
>
> hist(x, plot=FALSE, right=TRUE)$counts
[1]  3  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, right=TRUE)$breaks
 [1]  1  2  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, right=TRUE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>
>
> hist(x, plot=FALSE, right=FALSE)$counts
[1]  1  2  3  4  5  6  7  8 19
> hist(x, plot=FALSE, right=FALSE)$breaks
 [1]  1  2  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, right=FALSE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>
>
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
 [1]  1  2  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
 [1]  1  2  3  4  5  6  7  8  9 10
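
Applied to hours coded 0 to 23, the last approach might look like the sketch
below. This is only an illustration: hour is a made-up stand-in for
crashes$hour (hour h repeated h + 1 times, so every count is known in
advance), and h and h2 are just names chosen here. The breaks sit on the
half-integers, so no data value lands on a bin edge.

```r
# Hypothetical stand-in for crashes$hour: integer hours 0..23,
# with hour h repeated h + 1 times so the expected counts are 1..24.
hour <- rep(0:23, times = 1:24)

# One bin per hour: breaks at -0.5, 0.5, ..., 23.5.
h <- hist(hour, breaks = seq(-0.5, 23.5, by = 1), plot = FALSE)

length(h$counts)                          # 24 bins, one per hour
all(h$counts == as.vector(table(hour)))   # TRUE: counts agree with table()
h$mids                                    # 0, 1, ..., 23

# For comparison: integer breaks with right=FALSE give only 23 bins,
# and the last one holds hours 22 and 23 together.
h2 <- hist(hour, breaks = 0:23, right = FALSE, plot = FALSE)
tail(h2$counts, 1)                        # 23 + 24 = 47
```

Note also that ?hist describes a single number given as breaks as a
suggestion only, since the breakpoints are set to pretty values; an explicit
vector of breakpoints, as above, is used as given.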


Sarah

On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre <aren at arencambre.com> wrote:
> I have two large datasets (156K and 2.06M records). Each row has the
> hour that an event happened, represented by an integer from 0 to 23.
>
> R's histogram is combining some data.
>
> Here's the command I ran to get the histogram:
>> histinfo <- hist(crashes$hour, right=FALSE)
>
> Here's histinfo:
>> histinfo
> $breaks
>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>
> $counts
>  [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
> [16]  9758 11301 11745  9943  7494  6272  6220 11669
>
> $intensities
>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>
> $density
>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>
> $mids
>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5
> [19] 18.5 19.5 20.5 21.5 22.5
>
> $xname
> [1] "crashes$hour"
>
> $equidist
> [1] TRUE
>
> attr(,"class")
> [1] "histogram"
>
> Note how the last value in counts is 11669. Compare it with the
> output of table(crashes$hour):
>     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14
>  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
>    15    16    17    18    19    20    21    22    23
>  9758 11301 11745  9943  7494  6272  6220  6000  5669
>
> Notice how the counts for 22 and 23 from table(crashes$hour) sum to
> 11669? Is it correct for the histogram to combine hours 22 and 23?
> Since I specified right = FALSE, I figured there's no way 23 would be
> combined with 22.
>
> Adding breaks=24 to the hist call makes no difference; it's still stuck
> at 23 bins. I also tried breaks=25 and 23 and several other values, in
> case I am misinterpreting what breaks means, but none of them makes a
> difference.
>
> I imagine this is a n00b question, so my apologies if this is obvious.
>
> Aren
>

-- 
Sarah Goslee
http://www.functionaldiversity.org


