[R] Constructing stacked bar plot

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Mon Jun 28 07:47:10 CEST 2021


Hello,

Something like this?


library(dplyr)

# count number of medals awarded to each region
medal_counts_ctry <- medals %>%
   na.omit() %>%
   count(region, Medal, name = "Count")

#head(medal_counts_ctry)

# order regions by total medal count
levs_medal <- medal_counts_ctry %>%
   group_by(region) %>%
   summarize(Total = sum(Count)) %>%
   arrange(desc(Total)) %>%
   pull(region)

medal_counts_ctry$region <- factor(medal_counts_ctry$region,
                                    levels = levs_medal)
# keep the 50 region/medal rows with the largest counts
top_count <- 50
medal_data <- medal_counts_ctry %>%
   slice_max(order_by = Count, n = top_count)
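Note that `slice_max()` keeps the rows with the largest `Count`, i.e. region/medal combinations, rather than whole countries. If the goal is the top N countries by total medal count, one option is to filter on the ordered vector of regions instead. A minimal sketch, assuming dplyr >= 1.0; the `medals` tibble below is hypothetical toy data standing in for the real one, and `top_n_countries` is an illustrative name:

```r
library(dplyr)

# Hypothetical toy data standing in for `medals` (columns region, Medal)
medals <- tibble(
  region = c("USA", "USA", "USA", "Russia", "Russia", "Germany", "Kenya"),
  Medal  = c("Gold", "Gold", "Silver", "Bronze", "Gold", "Silver", "Gold")
)

top_n_countries <- 2  # hypothetical; would be 50 for the real data

# one row per region/medal combination, with its count
medal_counts_ctry <- medals %>%
  na.omit() %>%
  count(region, Medal, name = "Count")

# regions ordered by total medal count, largest first
levs_medal <- medal_counts_ctry %>%
  group_by(region) %>%
  summarize(Total = sum(Count)) %>%
  arrange(desc(Total)) %>%
  pull(region)

top_regions <- head(levs_medal, top_n_countries)

# keep every medal row of the top regions, not just the top rows
medal_data <- medal_counts_ctry %>%
  filter(region %in% top_regions) %>%
  mutate(region = factor(region, levels = top_regions))

print(medal_data)
```

Because the factor levels are set to `top_regions` only, the plot will not carry empty levels for the countries that were dropped.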

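For the plot itself, `coord_flip()` draws the first factor level at the bottom, so the country with the most medals ends up at the bottom of the chart. Reversing the levels is one way to put it at the top. A minimal sketch with hypothetical toy counts; the named colour vector is an assumption that maps each colour to its medal explicitly (with `Medal` levels Bronze, Gold, Silver, as in the `str()` output, this matches the unnamed order used in the original post):

```r
library(ggplot2)

# Hypothetical toy counts, already reduced to the top regions
medal_data <- data.frame(
  region = factor(c("USA", "USA", "Russia", "Russia"),
                  levels = c("USA", "Russia")),
  Medal  = factor(c("Gold", "Silver", "Bronze", "Gold"),
                  levels = c("Bronze", "Gold", "Silver")),
  Count  = c(2, 1, 1, 1)
)

# coord_flip() puts the first level at the bottom, so reverse the levels
# to show the country with the most medals at the top
medal_data$region <- factor(medal_data$region,
                            levels = rev(levels(medal_data$region)))

p <- ggplot(medal_data, aes(x = region, y = Count, fill = Medal)) +
  geom_col() +
  coord_flip() +
  # named values tie each colour to a medal regardless of level order
  scale_fill_manual(values = c(Bronze = "darkorange3",
                               Gold   = "darkgoldenrod1",
                               Silver = "cornsilk3")) +
  ggtitle("Historical medal counts from Country Teams") +
  theme(plot.title = element_text(hjust = 0.5))

print(p)
```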


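As for why `filter(medal_counts_ctry$.rows > 100)` in the original post does not work: `.rows` is not a column. After `group_by()`, dplyr keeps each group's row indices in the grouping metadata (the `attr(*, "groups")` part of the `str()` output), so `$.rows` returns NULL (with a warning) and there is nothing to filter on. A condition on whole groups can instead be written as a grouped `filter()`. A minimal sketch with a hypothetical toy tibble in the same shape as `medal_counts_ctry`; the threshold of 100 is taken from the post:

```r
library(dplyr)

# Hypothetical toy counts in the same shape as medal_counts_ctry
medal_counts <- tibble(
  region = c("USA", "USA", "USA", "Kenya"),
  Medal  = c("Bronze", "Gold", "Silver", "Gold"),
  Count  = c(120L, 150L, 90L, 3L)
)

grouped <- group_by(medal_counts, region)

# the row indices per group live in the metadata, not in a column
print(group_data(grouped))

# keep every row of the regions whose total medal count exceeds 100
big <- grouped %>%
  filter(sum(Count) > 100) %>%
  ungroup()

print(big)
```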
Hope this helps,

Rui Barradas

At 17:10 on 27/06/21, Jeff Reichman wrote:
> R-help Forum
> 
> I am attempting to create a stacked bar chart, but I have too many categories.
> The following code works, and I end up plotting all 134 countries, but I really
> only need (say) the top 50 or so.
> 
> I am trying to figure out how to further filter the data down to the countries
> with the largest total medal counts before plotting. The bolded red code is
> where I think I would do this. I've tried several different methods, but to no
> avail. Any suggestions?
> 
> 
> # Load data file matching NOCs with regions (countries)
> noc <- read_csv("~/NGA_Files/JuneMakeoverMonday/noc_regions.csv",
>                  col_types = cols(
>                    NOC = col_character(),
>                    region = col_character()
>                  ))
> 
> # Add regions to data and remove missing points
> data_regions <- data %>%
>    left_join(noc,by="NOC") %>%
>    filter(!is.na(region))
> 
> # Subset to variables of interest
> medals <- data_regions %>%
>    select(region, Medal)
> 
> # count number of medals awarded to each Team
> medal_counts_ctry <- medals %>% filter(!is.na(Medal))%>%
>    group_by(region, Medal) %>%
>    summarize(Count=length(Medal))
> 
> #head(medal_counts_ctry)
> 
> # order Team by total medal count
> levs_medal <- medal_counts_ctry %>%
>    group_by(region) %>%
>    summarize(Total=sum(Count)) %>%
>    arrange(desc(Total))
> 
> medal_counts_ctry$region <- factor(medal_counts_ctry$region,
>                                     levels=levs_medal$region)
> 
> medal_data <- medal_counts_ctry %>% filter(medal_counts_ctry$.rows > 100)
> 
> # plot
> ggplot(medal_data, aes(x=region, y=Count, fill=Medal)) +
>    geom_col() +
>    coord_flip() +
>    scale_fill_manual(values=c("darkorange3","darkgoldenrod1","cornsilk3")) +
>    ggtitle("Historical medal counts from Country Teams") +
>    theme(plot.title = element_text(hjust = 0.5))
> 
> 
>> str(medal_counts_ctry)
> grouped_df [323 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
>   $ region: Factor w/ 134 levels "USA","Russia",..: 101 70 70 70 29 29 29 73 73 73 ...
>   $ Medal : Factor w/ 3 levels "Bronze","Gold",..: 1 1 2 3 1 2 3 1 2 3 ...
>   $ Count : int [1:323] 2 8 5 4 91 91 92 9 2 5 ...
>   - attr(*, "groups")= tibble [134 x 2] (S3: tbl_df/tbl/data.frame)
>    ..$ region: Factor w/ 134 levels "USA","Russia",..: 1 2 3 4 5 6 7 8 9 10 ...
>    ..$ .rows : list<int> [1:134]
>    .. ..$ : int [1:3] 307 308 309
>    .. ..$ : int [1:3] 235 236 237
>    .. ..$ : int [1:3] 102 103 104
>    .. ..$ : int [1:3] 296 297 298
>    .. ..$ : int [1:3] 95 96 97
>    .. ..$ : int [1:3] 138 139 140
>    .. ..$ : int [1:3] 263 264 265
>    .. ..$ : int [1:3] 46 47 48
>    .. ..$ : int [1:3] 11 12 13
>    .. ..$ : int [1:3] 117 118 119
>    .. ..$ : int [1:3] 194 195 196
>    .. ..$ : int [1:3] 208 209 210
>    .. ..$ : int [1:3] 52 53 54
>    .. ..$ : int [1:3] 147 148 149
>    .. ..$ : int [1:3] 92 93 94
>    .. ..$ : int [1:3] 266 267 268
>    .. ..$ : int [1:3] 232 233 234
>    .. ..$ : int [1:3] 69 70 71
>    .. ..$ : int [1:3] 253 254 255 ..........
> 
> Jeff Reichman
> 
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


