[R] Histogram omitting/collapsing groups

Aren Cambre aren at arencambre.com
Sun Jan 1 23:36:45 CET 2012


This is helpful, although I can't seem to adapt it to my own data.

If I run your sample as is, I do get the nice graphs.

However, this doesn't work:
(Assume you already have a data frame "dallas" with 2057980 rows. It
has column "offense_hour", and each row has a value between 0 and 23,
inclusive.)
> p <- ggplot(as.data.frame(table(dallas$offense_hour)), aes(x = dallas$offense_hour, y = Freq)) + geom_bar()
> print(p)
Error in data.frame(x = c(9, 8, 10, 9, 10, 15, 11, 13, 0, 16, 13, 20,  :
  arguments imply differing number of rows: 2057980, 24

It seems like dallas$offense_hour corresponds to x in your example. I'm
confused why yours works even though your x has 10,000 values, yet
mine fails with a complaint about mismatched row counts. In both cases,
the length of x or dallas$offense_hour far exceeds 24.

Aren
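
A note on the error above: a likely cause is that aes() is pointed at
dallas$offense_hour (2057980 values) while the data frame handed to ggplot()
is the 24-row table. Josh's sample works because table(x) labels its
dimension "x", so the resulting data frame really has a column named x;
table(dallas$offense_hour) gets the default name "Var1" instead. The sketch
below is a minimal illustration under assumptions, not the poster's actual
fix: the 1,000-row "dallas" is a stand-in for the real data, and the column
name "hour" is chosen only for this example.

```r
# Stand-in for the real data frame (hypothetical: 1,000 rows, not 2,057,980)
set.seed(1)
dallas <- data.frame(offense_hour = sample(0:23, 1000, replace = TRUE))

# Tabulate with explicit levels so every hour 0..23 gets a row, even if empty
counts <- as.data.frame(table(factor(dallas$offense_hour, levels = 0:23)))

# The tabulated column is called "Var1" by default; rename it for readability
# ("hour" is a name made up for this sketch)
names(counts)[1] <- "hour"

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # aes() refers to columns of `counts`, never to dallas$offense_hour
  p <- ggplot(counts, aes(x = hour, y = Freq)) +
    geom_bar(stat = "identity")  # bar heights are taken from Freq
  print(p)
}
```

Because aes() now names only columns of the 24-row data frame, the x and y
aesthetics have the same length and the "differing number of rows" error
should not arise.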

On Sun, Jan 1, 2012 at 10:34 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>
> Hi Aren,
>
> I was busy thinking about how to make what you wanted, and I missed
> that you were working with hours from a day.  That being the case, you
> may think about a circular graph.  The attached plots show two
> different ways of working with the same data.
>
> Cheers,
>
> Josh
>
> set.seed(10)
> x <- sample(0:23, 10000, TRUE, prob = sin(0:23)+1)
>
> require(ggplot2) # graphing package
>
> ## regular barplot
> p <- ggplot(as.data.frame(table(x)), aes(x = x, y = Freq)) +
>  geom_bar(stat = "identity") # bar heights come from Freq, not from counting rows
>
> ## using circular coordinates
> p2 <- p + coord_polar()
>
> ## print them
> print(p)
> print(p2)
>
>
> ## just if you're interested, the code to
> ## put the two plots side by side
> require(grid)
>
> dev.new(height = 6, width = 12)
> grid.newpage()
> pushViewport(vpList(
>  viewport(x = 0, width = .5,  just = "left", name = "barplot"),
>  viewport(x = .5, width = .5, just = "left", name="windrose")))
> seekViewport("barplot")
> grid.draw(ggplotGrob(p))
> seekViewport("windrose")
> grid.draw(ggplotGrob(p2))
>
>
> On Sun, Jan 1, 2012 at 7:59 AM, Aren Cambre <aren at arencambre.com> wrote:
> > On Sun, Jan 1, 2012 at 5:29 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> >> Exactly. If what you want is a barplot, make a barplot; histograms are for continuous data.   Just remember that you may need to set the levels explicitly in case of empty groups: barplot(table(factor(x,levels=0:23))). (This is irrelevant with 100K data samples, but not with 100 of them).
> >>
> >> That being said, the fact that hist() tends to create breakpoints which coincide with data points due to discretization is arguably a bit of a design error, but it is age-old and hard to change now. One way out is to use truehist() from MASS, another is to explicitly set the breaks to intermediate values, as in hist(x, breaks=seq(-.5, 23.5, 1))
> >
> > Thanks, everybody. I'll definitely switch to barplot.
> >
> > As for continuous, it's all relative. Even the most continuous dataset
> > at a scale that looks pretty to humans may have gaps between the
> > values when you "zoom in" a lot.
> >
> > Aren
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/


