visOmopResults provides plotting tools that simplify visualising data in <summarised_result> format while also being compatible with other formats.
<summarised_result>
For this vignette, we will use the penguins dataset from the palmerpenguins package. This dataset will be summarised using the PatientProfiles::summariseResult() function, which aggregates the data into the <summarised_result> format:
library(visOmopResults) # plotting functions used below (scatterPlot(), boxPlot(), themeVisOmop())
library(PatientProfiles)
library(palmerpenguins)
library(dplyr)

# Summarise the penguins recorded on one island and label the result with the island name
summariseIsland <- function(island) {
  penguins |>
    filter(.data$island == .env$island) |>
    summariseResult(
      group = "species",
      includeOverallGroup = TRUE,
      strata = list("year", "sex", c("year", "sex")),
      variables = c(
        "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g",
        "sex"
      ),
      estimates = c(
        "median", "q25", "q75", "min", "max", "count_missing", "count",
        "percentage", "density"
      )
    ) |>
    suppressMessages() |>
    mutate(cdm_name = island)
}

# Combine the three island summaries into a single <summarised_result>
penguinsSummary <- bind(
  summariseIsland("Torgersen"),
  summariseIsland("Biscoe"),
  summariseIsland("Dream")
)

penguinsSummary |> glimpse()
#> Rows: 429,296
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", "…
#> $ group_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "number records", "bill_length_mm", "bill_length_mm",…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, "density_001", "density_0…
#> $ estimate_name <chr> "count", "median", "q25", "q75", "min", "max", "count…
#> $ estimate_type <chr> "integer", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value <chr> "52", "38.9", "36.65", "41.1", "33.5", "46", "1", "29…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
<summarised_result> objects
1) Tidy Format
When working with <summarised_result> objects, the data is internally converted into the tidy format before plotting. This is an important distinction because columns such as strata_name and strata_level from the original <summarised_result> cannot be used directly with the plotting functions. Instead, tidy columns should be referenced.
For more information about the tidy format, refer to the omopgenerics package vignette on <summarised_result>.
To identify the available tidy columns, use the tidyColumns() function:
tidyColumns(penguinsSummary)
#> [1] "cdm_name" "species" "year" "sex"
#> [5] "variable_name" "variable_level" "count" "median"
#> [9] "q25" "q75" "min" "max"
#> [13] "count_missing" "density_x" "density_y" "percentage"
2) Subsetting Variables
Before calling the plotting functions, always subset the <summarised_result> object to the variable of interest. Avoid combining results from unrelated variables, as this may lead to NA values in the tidy format, which can affect your plots.
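To see where those NA values come from, the following minimal sketch mimics the long-to-tidy pivot with tidyr::pivot_wider(). It is only an illustration on a hand-made toy table with made-up values, and it assumes the tidyr package is installed; it is not the package's internal implementation.

```r
library(tidyr)

# Toy long-format results (made-up values): a numeric variable reports a
# median, while a categorical variable reports a percentage per level.
longResults <- tibble(
  variable_name = c("bill_depth_mm", "sex", "sex"),
  variable_level = c(NA, "female", "male"),
  estimate_name = c("median", "percentage", "percentage"),
  estimate_value = c(17.3, 48, 52)
)

# Pivoting the estimates into columns leaves NA wherever a variable
# does not have that estimate.
wideResults <- longResults |>
  pivot_wider(names_from = "estimate_name", values_from = "estimate_value")

wideResults
```

Filtering to a single variable before pivoting keeps only the estimate columns that are actually populated for it.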
We can create simple scatter plots using the scatterPlot() function. Let’s see some examples:
penguinsSummary |>
  filter(variable_name == "bill_depth_mm") |>
  filterStrata(year != "overall", sex == "overall") |>
  scatterPlot(
    x = "year",
    y = "median",
    line = TRUE,
    point = TRUE,
    ribbon = FALSE,
    facet = "cdm_name",
    colour = "species"
  )
Additionally, we can use the function themeVisOmop() to change the default ggplot2 style to the package’s default style. We can also apply standard ggplot2 functionality to the returned plot:
penguinsSummary |>
  filter(variable_name %in% c("bill_length_mm", "bill_depth_mm")) |>
  filterStrata(year == "overall", sex == "overall") |>
  filterGroup(species != "overall") |>
  scatterPlot(
    x = "density_x",
    y = "density_y",
    line = TRUE,
    point = FALSE,
    ribbon = FALSE,
    facet = cdm_name ~ variable_name,
    colour = "species"
  ) +
  themeVisOmop() +
  ggplot2::facet_grid(cdm_name ~ variable_name, scales = "free_x")
penguinsSummary |>
  filter(variable_name == "flipper_length_mm") |>
  filterStrata(year != "overall", sex %in% c("female", "male")) |>
  scatterPlot(
    x = c("year", "sex"),
    y = "median",
    ymin = "q25",
    ymax = "q75",
    line = FALSE,
    point = TRUE,
    ribbon = FALSE,
    facet = cdm_name ~ species,
    colour = "sex",
    group = c("year", "sex")
  ) +
  themeVisOmop() +
  ggplot2::coord_flip() +
  ggplot2::labs(y = "Flipper length (mm)") +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1))
penguinsSummary |>
  filter(
    variable_name %in% c("flipper_length_mm", "bill_length_mm", "bill_depth_mm")
  ) |>
  filterStrata(sex == "overall") |>
  scatterPlot(
    x = "year",
    y = "median",
    ymin = "min",
    ymax = "max",
    line = FALSE,
    point = TRUE,
    ribbon = TRUE,
    facet = cdm_name ~ species,
    colour = "variable_name",
    group = c("variable_name")
  ) +
  themeVisOmop(fontsizeRef = 12) +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1))
Let’s create a bar plot:
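A minimal sketch of a bar plot of record counts follows. It assumes that penguinsSummary and the packages loaded above are available, and that barPlot() takes the same x, y, facet and colour arguments as the other plotting functions; the choice of filters and the name barPlotSketch are illustrative only.

```r
# Sketch only: number of records by year and sex,
# with one panel per island (cdm_name) and species.
barPlotSketch <- penguinsSummary |>
  filter(variable_name == "number records") |>
  filterGroup(species != "overall") |>
  filterStrata(sex != "overall", year != "overall") |>
  barPlot(
    x = "year",
    y = "count",
    facet = cdm_name ~ species,
    colour = "sex"
  ) +
  themeVisOmop()

barPlotSketch
```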
Let’s create some box plots of their body mass:
penguinsSummary |>
  filter(variable_name == "body_mass_g") |>
  boxPlot(x = "year", facet = species ~ cdm_name, colour = "sex") +
  themeVisOmop()
penguinsSummary |>
  filter(variable_name == "body_mass_g") |>
  filterGroup(species != "overall") |>
  filterStrata(sex %in% c("female", "male"), year != "overall") |>
  boxPlot(x = "cdm_name", facet = c("sex", "species"), colour = "year") +
  themeVisOmop(fontsizeRef = 11)
Note that if x is not specified, the x axis has no levels, but the box plots are still produced.
<data.frame>
Plotting functions can also be used with the usual <data.frame>. In this case we will use the tidy format of penguinsSummary.
penguinsTidy <- penguinsSummary |>
  filter(!estimate_name %in% c("density_x", "density_y")) |> # remove density for simplicity
  tidy()
penguinsTidy |> glimpse()
#> Rows: 720
#> Columns: 14
#> $ cdm_name <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", "To…
#> $ species <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ year <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ sex <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ variable_name <chr> "number records", "bill_length_mm", "bill_depth_mm", "f…
#> $ variable_level <chr> NA, NA, NA, NA, NA, "female", "male", NA, NA, NA, NA, N…
#> $ count <int> 52, NA, NA, NA, NA, 24, 23, 5, 20, 16, 16, NA, NA, NA, …
#> $ median <int> NA, 38, 18, 191, 3700, NA, NA, NA, NA, NA, NA, 38, 38, …
#> $ q25 <int> NA, 36, 17, 187, 3338, NA, NA, NA, NA, NA, NA, 37, 35, …
#> $ q75 <int> NA, 41, 19, 195, 4000, NA, NA, NA, NA, NA, NA, 39, 41, …
#> $ min <int> NA, 33, 15, 176, 2900, NA, NA, NA, NA, NA, NA, 34, 33, …
#> $ max <int> NA, 46, 21, 210, 4700, NA, NA, NA, NA, NA, NA, 46, 45, …
#> $ count_missing <int> NA, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, 0, 0, 1, 0, …
#> $ percentage <dbl> NA, NA, NA, NA, NA, 46.153846, 44.230769, 9.615385, NA,…
Using this tidy format, we can replicate plots. For instance, we recreate the previous example:
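As a sketch of that replication, the filters previously applied with filterGroup() and filterStrata() become ordinary dplyr::filter() conditions on the tidy columns. This assumes penguinsTidy from above, and that boxPlot() finds the q25, median, q75, min and max columns in a <data.frame> by its default argument values; the name boxPlotTidy is illustrative.

```r
# Sketch only: same body mass box plot as before, built from the tidy <data.frame>.
boxPlotTidy <- penguinsTidy |>
  filter(
    variable_name == "body_mass_g",
    species != "overall",
    sex %in% c("female", "male"),
    year != "overall"
  ) |>
  boxPlot(x = "cdm_name", facet = c("sex", "species"), colour = "year") +
  themeVisOmop(fontsizeRef = 11)

boxPlotTidy
```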
The tidy format is also very useful for applying any other custom ggplot2 function that we may be interested in:
library(ggplot2)
penguinsSummary |>
  filter(variable_name == "number records") |>
  tidy() |>
  ggplot(aes(x = year, y = sex, fill = count, label = count)) +
  themeVisOmop() +
  geom_tile() +
  scale_fill_viridis_c(trans = "log") +
  geom_text() +
  facet_grid(cdm_name ~ species) +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1))
ggplot2
The plotting functions are wrappers around the ggplot2 package, so their outputs can later be customised with ggplot2 and similar tools. For example, we can use ggplot2::labs() to change the labels and ggplot2::theme() to move the location of the legend.
penguinsSummary |>
  filter(
    group_level != "overall",
    strata_name == "year &&& sex",
    !grepl("NA", strata_level),
    variable_name == "body_mass_g"
  ) |>
  boxPlot(x = "species", facet = cdm_name ~ sex, colour = "year") +
  themeVisOmop(fontsizeRef = 12) +
  ylim(c(0, 6500)) +
  labs(x = "My custom x label") +
  theme(legend.position = "top") # move the legend above the panels
You can also use ggplot2::ggsave() to save one of these plots to a ‘.png’ file.
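A minimal sketch of saving a plot is shown here. The stand-in plot uses made-up values, and the file name, size and resolution are hypothetical choices; any ggplot object returned by the plotting functions could be passed to the plot argument in the same way.

```r
library(ggplot2)

# Stand-in plot built from made-up values.
examplePlot <- ggplot(
  data.frame(year = c(2007, 2008, 2009), median = c(18.5, 18.1, 17.9)),
  aes(x = year, y = median)
) +
  geom_point()

# Hypothetical output path; the graphics device is inferred from the ".png" extension.
outputFile <- file.path(tempdir(), "example_plot.png")
ggsave(filename = outputFile, plot = examplePlot, width = 6, height = 4, dpi = 300)
```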
plotly
Although the package currently does not provide any plotly functionality, ggplot objects can easily be converted to <plotly> objects using the function plotly::ggplotly(). This can make some plots more interactive.
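As a rough sketch of that conversion, assuming the plotly package is installed and using a stand-in ggplot built from made-up values:

```r
library(ggplot2)
library(plotly)

# Stand-in plot built from made-up values; a plot returned by scatterPlot()
# or boxPlot() could be passed to ggplotly() in the same way.
examplePlot <- ggplot(
  data.frame(year = c(2007, 2008, 2009), median = c(18.5, 18.1, 17.9)),
  aes(x = year, y = median)
) +
  geom_point()

# Convert the static ggplot into an interactive plotly widget.
interactivePlot <- ggplotly(examplePlot)
interactivePlot
```

Not every ggplot2 layer or theme element translates perfectly, so it is worth checking the converted plot visually.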