[R] Including percentage values inside columns of a histogram
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Tue Aug 17 22:58:10 CEST 2021
Ah yes. Duhhh... Thanks Rui.
So h$density * diff(h$breaks) * 100 will give the percentages. No need
for arithmetic beyond that.
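
A minimal sketch of that identity, under stated assumptions: the data
vector x and the break points brk below are made up for illustration and
are not from the thread. The bins are deliberately of unequal width, to
show that multiplying by diff(h$breaks) is what makes the result agree
with the count-based percentages.

```r
# Illustrative data only: x and brk are assumptions, not from the thread.
set.seed(1)
x <- runif(200, 0, 100)
brk <- c(0, 10, 20, 40, 70, 100)   # deliberately unequal bin widths

h <- hist(x, breaks = brk, plot = FALSE)

# Percentage of observations per bin, computed two ways
pct_density <- h$density * diff(h$breaks) * 100
pct_counts  <- h$counts / sum(h$counts) * 100

# The two agree up to floating point, and the percentages total 100
stopifnot(isTRUE(all.equal(pct_density, pct_counts)))
stopifnot(isTRUE(all.equal(sum(pct_density), 100)))

# h$density * 100 alone is a percentage only when every bin has width 1
print(round(pct_density, 1))
```

Either vector, formatted with something like paste0(round(pct_density, 1), "%"),
can then be passed as the labels argument when plotting h.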
Bert
On Tue, Aug 17, 2021 at 12:03 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> Hello,
>
>
>
> At 19:28 on 17/08/21, Bert Gunter wrote:
> > Inline below.
> >
> >
> >
> > On Tue, Aug 17, 2021 at 4:09 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >>
> >> Hello,
> >>
> >> I had forgotten about plot.histogram; it does make everything simpler.
> >> To have percentages on the bars, the code below uses package scales.
> >>
> >> Note that it seems to me that you do not want densities. To get
> >> percentages, the proportions of counts are given by either of
> >
> > Under the default of equal width bins -- which is what Sturges gives
>
> Right.
>
> > if I read the docs correctly -- since the densities sum to 1,
>
> The "densities" do not sum to 1. From ?hist, section Value:
>
> density
> values f^(x[i]), as estimated density values. If all(diff(breaks) == 1),
> they are the relative frequencies counts/n and in general satisfy
> sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
>
>
> If all(diff(breaks) == 1) is FALSE, the density list member must be
> multiplied by diff(.$breaks) to get the proportions:
>
>
> h <- hist(datasetregs$Amount, plot = FALSE)
> sum(h$density)
> #[1] 1e-04
> diff(h$breaks)
> #[1] 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000
> sum(h$density*diff(h$breaks))
> #[1] 1
>
>
> Hope this helps,
>
> Rui Barradas
>
> > they are already the proportion of counts in each histogram bin, no?
> >
> > -- Bert
> >
> >
> >>
> >> h$counts/sum(h$counts)
> >> h$density*diff(h$breaks)
> >>
> >>
> >>
> >> # One histogram for all dates
> >> h <- hist(datasetregs$Amount, plot = FALSE)
> >> plot(h, labels = scales::percent(h$counts/sum(h$counts)),
> >>      ylim = c(0, 1.1*max(h$counts)))
> >>
> >>
> >>
> >> # Histograms by date
> >> sp <- split(datasetregs, datasetregs$Date)
> >> old_par <- par(mfrow = c(1, 3))
> >> h_list <- lapply(seq_along(sp), function(i){
> >>   hist_title <- paste("Histogram of", names(sp)[i])
> >>   h <- hist(sp[[i]]$Amount, plot = FALSE)
> >>   plot(h, main = hist_title, xlab = "Amount",
> >>        labels = scales::percent(h$counts/sum(h$counts)),
> >>        ylim = c(0, 1.1*max(h$counts)))
> >> })
> >> par(old_par)
> >>
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> At 01:49 on 17/08/21, Bert Gunter wrote:
> >>> I may well misunderstand, but proffered solutions seem more complicated
> >>> than necessary.
> >>> Note that the return of hist() can be saved as a list of class "histogram"
> >>> and then plotted with plot.histogram(), which already has a "labels"
> >>> argument that seems to be what you want. A simple example is:
> >>>
> >>> dat <- runif(50, 0, 10)
> >>> myhist <- hist(dat, freq = TRUE, breaks ="Sturges")
> >>>
> >>> plot(myhist, col = "darkgray",
> >>> labels = as.character(round(myhist$density*100,1) ),
> >>> ylim = c(0, 1.1*max(myhist$counts)))
> >>> ## note that this is plot.histogram because myhist has class "histogram"
> >>>
> >>> Note that I expanded the y axis a bit to be sure to include the labels. You
> >>> can, of course, plot your separate years as Rui has indicated or via e.g.
> >>> ?layout.
> >>>
> >>> Apologies if I have misunderstood. Just ignore this in that case.
> >>> Otherwise, I leave it to you to fill in details.
> >>>
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep coming along and
> >>> sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>
> >>>
> >>> On Mon, Aug 16, 2021 at 4:14 PM Paul Bernal <paulbernal07 using gmail.com> wrote:
> >>>
> >>>> Dear Jim,
> >>>>
> >>>> Thank you so much for your kind reply. Yes, this is what I am looking for;
> >>>> however, I can't see clearly how the bars correspond to the bins on the
> >>>> x-axis. Maybe there is a way to align the amounts so that they match the
> >>>> columns. Sorry if I sound picky, but I just want to learn if there is a
> >>>> way to accomplish this.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Paul
> >>>>
> >>>> On Mon, Aug 16, 2021 at 17:57, Jim Lemon (<drjimlemon using gmail.com>)
> >>>> wrote:
> >>>>
> >>>>> Hi Paul,
> >>>>> I just worked out your first request:
> >>>>>
> >>>>> datasetregs<-structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> >>>>> 2L,
> >>>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"), class =
> >>>>> "factor"),
> >>>>> Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
> >>>>> 15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
> >>>>> 15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>> 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>> 15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>> 16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
> >>>>> 15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000, 15000,
> >>>>> 15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
> >>>>> 15000, 15000, 15000, 15000)), row.names = c(NA, -74L), class =
> >>>>> "data.frame")
> >>>>> histval<-with(datasetregs, hist(Amount, breaks="Sturges",
> >>>>>   col="darkgray"))
> >>>>> library(plotrix)
> >>>>> histpcts<-paste0(round(100*histval$counts/sum(histval$counts),1),"%")
> >>>>> barlabels(histval$mids,histval$counts,histpcts)
> >>>>>
> >>>>> I think that's what you asked for.
> >>>>>
> >>>>> Jim
> >>>>>
> >>>>> On Tue, Aug 17, 2021 at 8:44 AM Paul Bernal <paulbernal07 using gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> This is way better. Now, how could I put the frequency labels in the
> >>>>>> columns as percentages, instead of presenting them as counts?
> >>>>>>
> >>>>>> Thank you so much.
> >>>>>>
> >>>>>> Paul
> >>>>>>
> >>>>>> On Mon, Aug 16, 2021 at 17:33, Rui Barradas (<ruipbarradas using sapo.pt>)
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> You forgot to cc the list.
> >>>>>>>
> >>>>>>> Here are two ways; both apply hist() and text() to Amount split by
> >>>>>>> Date. The return value of hist() is saved because it is a list whose
> >>>>>>> members include the midpoints of the histogram's bars and the counts.
> >>>>>>> Those are used to know where to put the text labels.
> >>>>>>> A vector lbls is created to get rid of counts of zero.
> >>>>>>>
> >>>>>>> The main difference between the two ways is the histogram's titles.
> >>>>>>>
> >>>>>>>
> >>>>>>> old_par <- par(mfrow = c(1, 3))
> >>>>>>> h_list <- with(datasetregs, tapply(Amount, Date, function(x){
> >>>>>>>   h <- hist(x)
> >>>>>>>   lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
> >>>>>>>   text(h$mids, h$counts/2, labels = lbls)
> >>>>>>> }))
> >>>>>>> par(old_par)
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> old_par <- par(mfrow = c(1, 3))
> >>>>>>> sp <- split(datasetregs, datasetregs$Date)
> >>>>>>> h_list <- lapply(seq_along(sp), function(i){
> >>>>>>>   hist_title <- paste("Histogram of", names(sp)[i])
> >>>>>>>   h <- hist(sp[[i]]$Amount, main = hist_title)
> >>>>>>>   lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
> >>>>>>>   text(h$mids, h$counts/2, labels = lbls)
> >>>>>>> })
> >>>>>>> par(old_par)
> >>>>>>>
> >>>>>>>
> >>>>>>> Hope this helps,
> >>>>>>>
> >>>>>>> Rui Barradas
> >>>>>>>
> >>>>>>> At 23:16 on 16/08/21, Paul Bernal wrote:
> >>>>>>>> Dear Rui,
> >>>>>>>>
> >>>>>>>> The hist() function comes from the graphics package, from what I could
> >>>>>>>> see. The thing is that I want to divide the Amount column into several
> >>>>>>>> bins and then generate three different histograms, one for each AF
> >>>>>>>> period (AF refers to fiscal years). As you can see, the data contains
> >>>>>>>> three fiscal years (2017, 2020 and 2021). I want to see the percentage
> >>>>>>>> of cases that fall into different amount categories, from 15,000 and
> >>>>>>>> below, 16,000 to 17,000, from 18,000 to 19,000, and so on.
> >>>>>>>>
> >>>>>>>> Thanks for your kind help.
> >>>>>>>>
> >>>>>>>> Paul
> >>>>>>>>
> >>>>>>>> On Mon, Aug 16, 2021 at 17:07, Rui Barradas
> >>>>>>>> (<ruipbarradas using sapo.pt>) wrote:
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> The function Hist comes from what package?
> >>>>>>>>
> >>>>>>>> Are you sure you don't want a bar plot?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> agg <- aggregate(Amount ~ Date, datasetregs, sum)
> >>>>>>>> bp <- barplot(Amount ~ Date, agg)
> >>>>>>>> with(agg, text(bp, Amount/2, labels = Amount))
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hope this helps,
> >>>>>>>>
> >>>>>>>> Rui Barradas
> >>>>>>>>
> >>>>>>>> At 22:54 on 16/08/21, Paul Bernal wrote:
> >>>>>>>> > Hello everyone,
> >>>>>>>> >
> >>>>>>>> > I am currently working with R version 4.1.0 and I am trying to include
> >>>>>>>> > (inside the columns of the histogram) the percentage distribution. I
> >>>>>>>> > want to generate three histograms, one for each fiscal year (in the
> >>>>>>>> > Date column, there are three fiscal years: AF 2017, AF 2020 and
> >>>>>>>> > AF 2021). However, I can't seem to accomplish this.
> >>>>>>>> >
> >>>>>>>> > Here is my data:
> >>>>>>>> >
> >>>>>>>> > structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> >>>> 2L,
> >>>>>>>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L,
> >>>>>>>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L,
> >>>>>>>> > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L,
> >>>>>>>> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L,
> >>>>>>>> > 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"),
> >>>>> class =
> >>>>>>>> > "factor"),
> >>>>>>>> > Amount = c(40100, 101100, 35000, 40100, 15000, 45100,
> >>>>> 40200,
> >>>>>>>> > 15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100,
> >>>>>>> 15000,
> >>>>>>>> > 15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>> > 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>> > 15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>> > 16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>> > 15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000,
> >>>>>>> 15000,
> >>>>>>>> > 15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>> > 15000, 15000, 15000, 15000)), row.names = c(NA, -74L),
> >>>>> class
> >>>>>>> =
> >>>>>>>> > "data.frame")
> >>>>>>>> >
> >>>>>>>> > I would like to modify the following script:
> >>>>>>>> >
> >>>>>>>> > with(datasetregs, Hist(Amount, groups=Date, scale="frequency",
> >>>>>>>> >   breaks="Sturges", col="darkgray"))
> >>>>>>>> >
> >>>>>>>> > # The only things missing here are the percentages corresponding to each
> >>>>>>>> > # bin (I would like to see the percentages inside each column, or on top
> >>>>>>>> > # outside if possible)
> >>>>>>>> >
> >>>>>>>> > Any help will be greatly appreciated.
> >>>>>>>> >
> >>>>>>>> > Best regards,
> >>>>>>>> >
> >>>>>>>> > Paul.
> >>>>>>>> >
> >>>>>>>> > [[alternative HTML version deleted]]
> >>>>>>>> >
> >>>>>>>> > ______________________________________________
> >>>>>>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>>>> > and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>> >
> >>>>>>>>
> >>>>>>>
> >>>>>>