[R] ggplot: add percentage for each element in legend and remove tick mark
Avi Gross
@v|gro@@ @end|ng |rom ver|zon@net
Sat Aug 14 03:29:15 CEST 2021
Kai,
It is easier to want to help someone if they generally know what they are doing and are stuck on something. Less so when they do not know enough to explain to us what they want, show what they did, and so on.
I modified the data you showed and hopefully it can be recreated this way:
library(tidyverse)
df <- tribble(
~ethnicity, ~individuals,
"Caucasian", 36062,
"Ashkenazi Jewish", 4309,
"Multiple", 3193,
"Hispanic", 2113,
"Asian. not specified", 1538,
"Chinese", 1031,
"African", 643,
"Unknown", 510,
"Filipino", 222,
"Japanese", 129,
"Native American", 116,
"Indian", 111,
"Pacific Islander", 23)
In case it was not clear how to use dput(): assuming you already had your data in a named variable, like my df, you could do this:
> dput(df)
structure(list(
ethnicity = c(
"Caucasian",
"Ashkenazi Jewish",
"Multiple",
"Hispanic",
"Asian. not specified",
"Chinese",
"African",
"Unknown",
"Filipino",
"Japanese",
"Native American",
"Indian",
"Pacific Islander"
),
individuals = c(36062, 4309, 3193, 2113,
1538, 1031, 643, 510, 222, 129, 116, 111, 23)
), row.names = c(NA,
-13L), class = c("tbl_df", "tbl", "data.frame"))
The above structure can be used to recreate the data somewhat portably including a cut and paste like this:
Restoring <- the.above.put.here
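As a rough sketch of that round trip, here is a three-row stand-in for the data built with base R only. The name restored is just for illustration, and the structure(...) text is what I would expect dput() to print for this small data frame in a recent version of R:

```r
# Three-row stand-in for the full data, built with base R only
df <- data.frame(
  ethnicity   = c("Caucasian", "Ashkenazi Jewish", "Multiple"),
  individuals = c(36062, 4309, 3193)
)

# dput() prints R code that rebuilds the object; copy that text into an email
dput(df)

# For a large data set, send only the first few rows
dput(head(df, 2))

# The recipient pastes the printed text after an assignment arrow
restored <- structure(list(
  ethnicity   = c("Caucasian", "Ashkenazi Jewish", "Multiple"),
  individuals = c(36062, 4309, 3193)
), class = "data.frame", row.names = c(NA, -3L))

# Check that the copy matches the original
all.equal(df, restored)
```

Note that the subsetting goes inside the call, as in dput(head(df, 2)), so that only the trimmed data is printed.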
The question you ask may better be answered by CHANGING what is in df before calling ggplot.
Be that as it may, with lots of work on your badly formatted code as shown in plain text, I have this:
> eth
# A tibble: 13 x 5
ethnicity individuals fraction ymax ymin
<chr> <dbl> <dbl> <dbl> <dbl>
1 Caucasian 36062 0.721 0.721 0
2 Ashkenazi Jewish 4309 0.0862 0.807 0.721
3 Multiple 3193 0.0639 0.871 0.807
4 Hispanic 2113 0.0423 0.914 0.871
5 Asian. not specified 1538 0.0308 0.944 0.914
6 Chinese 1031 0.0206 0.965 0.944
7 African 643 0.0129 0.978 0.965
8 Unknown 510 0.0102 0.988 0.978
9 Filipino 222 0.00444 0.992 0.988
10 Japanese 129 0.00258 0.995 0.992
11 Native American 116 0.00232 0.997 0.995
12 Indian 111 0.00222 1.00 0.997
13 Pacific Islander 23 0.00046 1 1.00
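For reference, a sketch of how a table like eth can be derived, using the three computations from the original post quoted further down. The base-R data.frame is only a stand-in for the tribble above, so it prints as a plain data frame rather than a tibble:

```r
# Same data as above, as a plain data frame (stand-in for the tibble)
eth <- data.frame(
  ethnicity = c("Caucasian", "Ashkenazi Jewish", "Multiple", "Hispanic",
                "Asian. not specified", "Chinese", "African", "Unknown",
                "Filipino", "Japanese", "Native American", "Indian",
                "Pacific Islander"),
  individuals = c(36062, 4309, 3193, 2113, 1538, 1031, 643, 510,
                  222, 129, 116, 111, 23)
)

# Share of the total for each group
eth$fraction <- eth$individuals / sum(eth$individuals)

# Cumulative share: the top edge of each rectangle
eth$ymax <- cumsum(eth$fraction)

# Bottom edge: the previous row's top edge, starting from 0
eth$ymin <- c(0, head(eth$ymax, n = -1))

eth
```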
I used your ggplot code, reformatted so people can read and run it as:
ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
geom_rect() +
coord_polar(theta="y") +
xlim(c(2, 4))
It shows a donut plot that I am not sure I can easily share here. You want to change the legend by adding more to it. Sure, there are tons of ways to do that, BUT I am not sure what you actually want.
ONE WAY to do what you want is to make a new column like this:
> eth$label <- paste(eth$ethnicity, " ", eth$fraction*100, "%", sep="")
> eth
# A tibble: 13 x 6
ethnicity individuals fraction ymax ymin label
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 Caucasian 36062 0.721 0.721 0 Caucasian 72.124%
2 Ashkenazi Jewish 4309 0.0862 0.807 0.721 Ashkenazi Jewish 8.618%
3 Multiple 3193 0.0639 0.871 0.807 Multiple 6.386%
4 Hispanic 2113 0.0423 0.914 0.871 Hispanic 4.226%
5 Asian. not specified 1538 0.0308 0.944 0.914 Asian. not specified 3.076%
6 Chinese 1031 0.0206 0.965 0.944 Chinese 2.062%
7 African 643 0.0129 0.978 0.965 African 1.286%
8 Unknown 510 0.0102 0.988 0.978 Unknown 1.02%
9 Filipino 222 0.00444 0.992 0.988 Filipino 0.444%
10 Japanese 129 0.00258 0.995 0.992 Japanese 0.258%
11 Native American 116 0.00232 0.997 0.995 Native American 0.232%
12 Indian 111 0.00222 1.00 0.997 Indian 0.222%
13 Pacific Islander 23 0.00046 1 1.00 Pacific Islander 0.046%
Once the labels look exactly the way you want, you need to ask ggplot to substitute them for the defaults and make sure they line up with the right legend entries. That can be tricky, because the legend for a character column is sorted alphabetically rather than in row order, and it may require setting up factors properly. You may also want to round the percentages to the same number of decimal places. You can also use scale_fill_discrete to change other things, such as replacing the legend title "ethnicity" with another phrase.
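One possible way to handle the rounding and the lining up, offered as a sketch rather than the only approach: format every percentage with sprintf() to two decimals (my choice, adjust to taste), then name each label by the ethnicity it belongs to so that ggplot can match labels to legend keys by name instead of by position. The name legend_labels is made up for this example, and the plot is only built if ggplot2 is installed.

```r
# Rebuild the data and the donut coordinates with base R
eth <- data.frame(
  ethnicity = c("Caucasian", "Ashkenazi Jewish", "Multiple", "Hispanic",
                "Asian. not specified", "Chinese", "African", "Unknown",
                "Filipino", "Japanese", "Native American", "Indian",
                "Pacific Islander"),
  individuals = c(36062, 4309, 3193, 2113, 1538, 1031, 643, 510,
                  222, 129, 116, 111, 23)
)
eth$fraction <- eth$individuals / sum(eth$individuals)
eth$ymax <- cumsum(eth$fraction)
eth$ymin <- c(0, head(eth$ymax, n = -1))

# Every percentage gets the same two decimals; "%%" prints a literal "%"
eth$label <- sprintf("%s %.2f%%", eth$ethnicity, 100 * eth$fraction)

# A named vector: names are the data values, elements are the legend texts
legend_labels <- setNames(eth$label, eth$ethnicity)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(eth, aes(ymax = ymax, ymin = ymin, xmax = 4, xmin = 3,
                       fill = ethnicity)) +
    geom_rect() +
    coord_polar(theta = "y") +
    xlim(c(2, 4)) +
    # name = changes the legend title, labels = the text of each entry
    scale_fill_discrete(name = "Ethnicity (share)", labels = legend_labels)
  # print(p) draws the plot
}
```

The legend entries still appear in alphabetical order here. Turning ethnicity into a factor with the levels in the order you prefer would change that.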
Here is the additional part of ggplot that makes the change:
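ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
  geom_rect() +
  coord_polar(theta="y") +
  xlim(c(2, 4)) +
  # name each label by its ethnicity so it lands on the matching legend key;
  # an unnamed vector would be applied in the legend's alphabetical order
  scale_fill_discrete(labels = setNames(eth$label, eth$ethnicity))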
Removing the tick mark text can be done by setting the right elements of a theme as in the following:
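ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
  geom_rect() +
  coord_polar(theta="y") +
  xlim(c(2, 4)) +
  # name each label by its ethnicity so it lands on the matching legend key
  scale_fill_discrete(labels = setNames(eth$label, eth$ethnicity)) +
  # blank out the tick marks and the numbers printed along the axes
  theme(axis.ticks = element_blank(),
        axis.text = element_blank())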
Only one of the two theme settings above may actually be needed: axis.text controls the numbers and axis.ticks the tick marks, so you can experiment.
I can send you personally an attachment showing the output as this is a text only setup.
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Kai Yang via R-help
Sent: Friday, August 13, 2021 5:48 PM
To: John Kane <jrkrideau using gmail.com>
Cc: R-help Mailing List <r-help using r-project.org>
Subject: Re: [R] ggplot: add percentage for each element in legend and remove tick mark
Hello John,
I put my testing data below. I'm not sure how to use the dput() function. Would you please give me an example?
Thanks,
Kai
| ethnicity | individuals |
| Caucasian | 36062 |
| Ashkenazi Jewish | 4309 |
| Multiple | 3193 |
| Hispanic | 2113 |
| Asian. not specified | 1538 |
| Chinese | 1031 |
| African | 643 |
| Unknown | 510 |
| Filipino | 222 |
| Japanese | 129 |
| Native American | 116 |
| Indian | 111 |
| Pacific Islander | 23 |
On Friday, August 13, 2021, 06:21:29 AM PDT, John Kane <jrkrideau using gmail.com> wrote:
Would you supply some sample data please? A handy way to supply sample data is to use the dput() function. See ?dput. If you have a very large data set then something like dput(head(myfile, 100)) will likely supply enough data for us to work with.
On Thu, 12 Aug 2021 at 11:45, Kai Yang via R-help <r-help using r-project.org> wrote:
>
> Hello List,
> I use the following code to generate a donut plot.
> # Compute percentages
> eth$fraction = eth$individuals / sum(eth$individuals)
> # Compute the cumulative percentages (top of each rectangle)
> eth$ymax = cumsum(eth$fraction)
> # Compute the bottom of each rectangle
> eth$ymin = c(0, head(eth$ymax, n=-1))
> # Make the plot using percentage
> ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
>   geom_rect() +
>   coord_polar(theta="y") +
>   xlim(c(2, 4))
>
> I want to improve the plot in two ways:
> 1. the legend: I need to add the percentage (eth$fraction * 100, followed by %) to each element.
> 2. remove all the numbers (tick marks?) around the plot.
> Please help. Thank you,
> Kai
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
John Kane
Kingston ON Canada