--- title: "Basic Workflow" output: rmarkdown::html_vignette: keep_md: true toc: true vignette: > %\VignetteIndexEntry{basic_workflow} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction ## Objectives - Collect drug and adverse drug reaction **IDs** - Perform standard data management in VigiBase ECL - Conduct disproportionality analysis (both univariate and multivariate) If you want to access script templates for these steps, see - `vignette("template_dictionary")` and - `vignette("template_main")` ## Reminder of Database Structure Each table has a unique identifying key and other keys to perform joins. | Table | Key | Other keys | |--------|---------------|------------------| | `demo` | `UMCReportId` | | | `drug` | `Drug_Id` | `UMCReportId` | | `adr` | `Adr_Id` | `UMCReportId` | > Goal of the tutorial: perform a disproportionality analysis between colitis and nivolumab among checkpoint inhibitor cases. ## Step 0: Load Packages ```{r library, warning=FALSE, message=FALSE} library(vigicaen) library(rlang) library(dplyr) ``` # Build tables from source files {#building_tables} This process should be done **once per database version**. You don't have to do it to follow this tutorial, since we will use the package built-in example tables. However, when working on real analyses, you will need to process this step first. See the vignette here `vignette("getting_started")`. # Collecting IDs {#collecting_ids} The whole package relies on defining a dictionary of drugs and adrs of interest. Those collection of terms should be stored in **named lists**. ```{r named_lists} # drug selection d_sel <- list( nivolumab = "nivolumab", ipilimumab = "ipilimumab", nivo_or_ipi = c("nivolumab", "ipilimumab") ) # adverse drug reaction selection a_sel <- list( colitis = "Colitis", pneumonitis = "Pneumonitis" ) ``` As we see, the `d_sel` list contains three named vectors: nivolumab, ipilimumab, and nivo_or_ipi. Each of these vectors can contain one or more names of drugs. > The names of the vectors don't have to be the same as the drug names. > They will be used to created columns later in the process. We will pass these named list selections to *ID collector* functions: The **`get_*`** family. - `get_drecno()` for drugs, `get_atc_code()` for ATC classes - `get_llt_soc()` or `get_llt_smq()` for adverse drug reactions These functions will allow to collect IDs (e.g. codes) matching our drugs and adrs in a specific dictionary. - For drugs, we will use the WHODrug dictionary, and collect Drug Record Numbers (DrecNos) most of the time, or Medicinal Product Ids (MedicinalProd_Ids) in some specific scenarii. - For adrs, we will use the Medical Dictionary for Regulatory Activities (MedDRA), and collect term codes (low-level term codes). Here, it is important to note that we can work with other terms (like Preferred Terms, High Level terms, etc.). The ID collector will just collect low-level term codes of all higher level terms, resulting in a pretty long list of codes, because VigiBase data is structured on low-level term codes. # Data management ## Drugs {#drug_workflow} ### Principle 1. Load the `demo`, `drug`, and `mp` tables. 2. Select one or more medications of interest. 3. Identify the drug codes (e.g., the `DrecNo(s)`) associated with these drug using the `mp` table. 4. Search for cases exposed to these medications using the codes in the `drug` table. 5. Update the `demo` table: code 1 if the case reports the medication of interest, 0 otherwise. 6. Check your data management Step 1 is performed with `dt_parquet()` if you have [your own tables](#building_tables), or using the built-in example tables for this tutorial. Step 2 and 3 can be referred as "dictionary" steps. Step 3 is performed with `get_drecno()` or `get_atc_code()`. Steps 4 and 5 are performed with `add_drug()`. step 6 is performed with `check_dm()`. ### Step 1: Load the tables ```{r table_loading, warning=FALSE, message=FALSE} demo <- demo_ drug <- drug_ mp <- mp_ ``` > Note: if you are working with your own tables, you will need to load them here, with `dt_parquet()`. ### Step 2: Choose drugs of Interest This is probably the most interesting part of the process, from a scientific point of view. But for now, it's pretty trivial. Once you've decided which drugs you would like to study (e.g. nivolumab in this tutorial), you need to create a **named list** of character vectors. ```{r drug_sel} d_sel <- # drug selection list2( nivolumab = "nivolumab" ) d_sel ``` > Remember to use lower case names in d_sel (no capital letters: > Good: "nivolumab", Wrong: "Nivolumab" or "NIVOLUMAB") The ideal way of picking drugs is by using their **WHO name**. WHO name is an international non-proprietary name (INN) for the drug. A few drugs have more than one INN (e.g. paracetamol and acetaminophen), but they still have a unique WHO name (most of the time, one of the INN). The ID collector `get_drecno()` will let you know if you missed the WHO name. Alternatively, you can work with Anatomical and Therapeutical Classification (ATC) codes to investigate a set of drug. This is explained in the [ATC section](#atc_classes). ### Step 3: Identify drug codes The function `get_drecno()` allows you to query the `mp` table with a selection of drug names. It takes several arguments, two of which must be filled in: the selection of drugs and the table containing the correspondence between the name and the code (here, `mp`). You should **always** look carefully at the printed message. 1. Use `get_drecno()` with argument `verbose = TRUE` (default). ```{r verbose_with_get_drecno} get_drecno( d_sel = d_sel, mp = mp_, verbose = TRUE ) ``` We see that `get_drecno()` finds two entries containing the drug "nivolumab" in the `mp` table: - The entry for nivolumab alone - The entry for the ipilimumab;nivolumab combination This second entry was identified because the `allow_combination` argument is set to `TRUE` by default. This allows for a broader identification of all specialties containing nivolumab. In this situation, this behavior is desirable because we want to be sure to identify all cases reporting the drug. 2. Since these are actually the codes we were looking for, we can (optionally) set the `verbose` argument to FALSE and keep the result in an R object called `d_drecno`. ```{r d_drecno} d_drecno <- get_drecno( d_sel = d_sel, mp = mp_, verbose = FALSE ) ``` ### Steps 4 and 5: add_drug() function To identify cases reporting the drug of interest and add the corresponding column to `demo`, we use the `add_drug()` function. The `add_drug()` function takes 3 mandatory arguments: - The dataset on which to add the drug variable(s) (here, `demo`) - A named list containing the codes of the drug(s) - The `drug` table linking drug intake to each case. ```{r add_drug} demo <- add_drug( .data = demo, d_code = d_drecno, drug_data = drug) demo ``` Or, in tidyverse syntax ```{r, results='hide'} demo <- demo |> add_drug( d_code = d_drecno, drug_data = drug ) ``` ### Step 6: Check your data management This may seem trivial, but it is an **essential** step in the construction of a dataset. There are many ways to check that the code has worked. Here, the `check_dm()` function will count the number of rows in the dataset where the desired column is equal to 1. ```{r check_dm} check_dm(demo, "nivolumab") ``` It shows how many rows in `demo` have the value 1 in the `nivolumab` column (e.g. how many cases where identified as reporting on nivolumab reactions). Here, we see that 225 cases report nivolumab. ### Step 2 and 3 variant: ATC classes {#atc_classes} The correspondence between ATC (Anatomical and Therapeutical Classification) classes and drug codes is found in the `thg` table. In this table, drug codes are stored as `MedicinalProd_Id`. It is therefore necessary to make a second correspondence with `mp` to find `DrecNo`. This can be done with the `get_atc_code()` function. As with drugs, we first need to identify the ATC class of interest (here, "L03"). ```{r atc_drecno} atc_sel <- list2(l03 = "L03") atc_drecno <- get_atc_code( atc_sel = atc_sel, mp = mp, thg_data = thg_ ) ``` The `get_atc_code()` function requires the `mp` and `thg` tables, as well as the selection of ATC classes. ```{r str_atc_drecno} str(atc_drecno) ``` By default, this function retrieves DrecNos associated with an ATC class. It is possible to retrieve MedicinalProd_Ids instead by setting the `vigilyze` argument to `FALSE`. The interest of using MedicinalProd_Ids instead of DrecNos is to restrict the drug panel only to packages corresponding to a specific ATC class (e.g., you might not want to find all packages of corticosteroids if you work with the ATC class "S01BA", which corresponds to ophtalmic steroids). Once DrecNos are identified, we can add them to the `demo` table, with the `add_drug()` function. ```{r add_atc} demo |> add_drug( d_code = atc_drecno, drug_data = drug ) ``` ### Step 4 and 5 variant: Suspect, concomitant, interacting We can choose to work with drugs according to their "reputation basis". This information is stored in the `Basis` column of the `drug` table. - 1 suspect - 2 concomitant - 3 interacting By using the `add_drug()` function, we can specify which type of status we are interested in, in the `repbasis` argument. By default, the value `"sci"` indicates that we consider the drug whether it is suspect, concomitant, or interacting. We can change the selection. ```{r add_drug_repbasis} demo |> add_drug( d_code = d_drecno, drug_data = drug, repbasis = "sci" ) |> check_dm("nivolumab") # suspected only demo |> add_drug( d_code = d_drecno, drug_data = drug, repbasis = "s" ) |> check_dm("nivolumab") ``` ### Create multiple drug columns To work with multiple drugs, you need to update the initial `d_sel` list. ```{r many_drugs} d_sel <- list2( nivolumab = "nivolumab", pembrolizumab = "pembrolizumab" ) d_drecno <- d_sel |> get_drecno(mp = mp) demo <- demo |> add_drug( d_drecno, drug_data = drug ) demo |> check_dm(c("nivolumab", "pembrolizumab")) ``` ### Drug groups If you want to work at the level of a group of drugs, but the ATC classes do not match your needs perfectly, you can group them in the `d_sel` list. ```{r d_groups} d_sel <- list2( analgesics = c("paracetamol", "tramadol"), ici = c("nivolumab", "pembrolizumab") ) d_drecno <- d_sel |> get_drecno(mp = mp, allow_combination = FALSE) demo <- demo |> add_drug( d_drecno, drug_data = drug ) demo |> check_dm(names(d_sel)) ``` ## Adverse drug reactions ### Principles 1. Load the `demo`, `adr`, and `meddra` tables. 2. Choose the adverse event(s) of interest. 3. Identify the event codes (these are low-level terms according to the MedDRA classification). They can be found in the `meddra` table or in the `smq` tables. 4. Search for cases that have presented this event, using the codes 5. Update the `demo` table: code 1 if the case reports the event of interest, 0 otherwise. 6. Check your data management Similarly to the [drug workflow](#drug_workflow), steps 2 and 3 can be referred to as "dictionary" steps. Step 3 uses `get_llt_soc()` or `get_llt_smq()`. ### Step 1: Load the tables ```{r load_adr_table} adr <- adr_ meddra <- meddra_ ``` demo was loaded during the [drug workflow](#drug_workflow). ### Step 2: Choose events of interest ```{r a_sel_pt} a_sel_pt <- list2( a_colitis = c( "Colitis", "Autoimmune colitis", "Colitis microscopic", "Diarrhoea", "Diarrhoea haemorrhagic", "Duodenitis", "Enteritis", "Enterocolitis", "Enterocolitis haemorrhagic", "Ulcerative gastritis" ) ) ``` We start with a list of adverse events of interest, grouped altogether under the name "a_colitis". > MedDRA terms always start with a capital letter, be sure to provide > the exact case, e.g. Good : "**C**olitis", Wrong : "colitis" or "COLITIS". Be sure all selected terms belong to the same hierarchical level (preferred term, high level term...) in MedDRA. Here, we use Preferred Terms. ### Step 3: Identify event codes The `get_llt_soc()` function allows you to query the `meddra`. ```{r get_llt_soc} a_llt <- get_llt_soc( term_sel = a_sel_pt, term_level = "pt", meddra = meddra_ ) a_llt ``` An alternative is to use the `get_llt_smq()` function, which allows you to query the `smq` tables. Notice that you collect low level term codes, even if you work with higher level terms, like preferred terms, or high level terms. This is intentional: this list collects all low level term codes composing the higher level term. See [Collecting ID section](#collecting_ids). ### Steps 4 and 5: add_adr() function The `add_adr()` function allows you to identify cases reporting the adverse event of interest and add the corresponding column to `demo`. ```{r add_adr} demo <- add_adr( .data = demo, a_code = a_llt, adr_data = adr) ``` ### Step 6: Check your data management `check_dm()` also works for adr. ```{r check_dm_adr} demo |> check_dm("a_colitis") ``` ## Other variables We may need to create other variables to perform our analysis, for example age and sex in a multivariable analysis. ### Age The `demo` table contains the `AgeGroup` column, which groups ages into categories. You may want to recode it to match you research question ```{r age} demo <- demo |> mutate( age = cut(as.integer(AgeGroup), breaks = c(0,4,5,6,7,8), include.lowest = TRUE, right = TRUE, labels = c("<18", "18-45","45-64", "65-74", "75+")) ) ``` ### Sex The `demo` table contains the `Gender` column, from which you can also create a new sex column (with values 1 for men, 2 for women, and NA otherwise) ```{r sex} demo <- demo |> mutate( sex = ifelse(Gender == "1", 1, ifelse(Gender == "2", 2, NA_real_) ) ) ``` ### Using case_when() The `case_when()` function from the `dplyr` package allows you to manage multiple options in a single function, with a slightly different syntax. ```{r sex_casewhen} demo <- demo |> mutate( sex = case_when(Gender == "1" ~ 1, Gender == "2" ~ 2, TRUE ~ NA_real_) ) ``` More documentation on `case_when()` can be found in the `dplyr` package documentation. You should just remember here that options are evaluated sequentially, from top to bottom. ### Seriousness, death The `out` table contains the `Seriousness` column, which indicates whether the case was serious or not, and whether the patient experienced a fatal issue during his/her follow-up. ```{r serious_death} # ---- Serious ---- #### out <- out_ demo <- demo |> mutate( serious = ifelse( UMCReportId %in% out$UMCReportId, UMCReportId %in% (out |> filter(Serious == "Y") |> pull(UMCReportId) ), NA) ) # ---- Death + outcome availability ---- #### demo <- demo |> mutate(death = ifelse(UMCReportId %in% out$UMCReportId, UMCReportId %in% (out |> filter(Seriousness == "1") |> pull(UMCReportId) ), NA) ) ``` - The `serious` and `death` columns are coded with TRUE/FALSE values in this example. There is no particular reason to prefer it over 1/0 codes. It is just a matter of preference. - The `Seriousness` can have several levels, level 1 being death. (see subsidiary files) # Disproportionality Our `demo` dataset now has a drug column for nivolumab, and an adr column for colitis. We can perform a disproportionality analysis between these two variables. ## Univariate analysis ### Disproportionality metrics Reporting Odds-Ratio (ROR) and Information Component essentially measure the same thing: the disproportionality. `compute_dispro()` computes both of these. - `or` and `or_ci` are the reporting Odds-Ratio and its confidence interval (default: 95%CI). - `ic` and `ic_tail` are the Information Component, and its lower end of credibility interval (default: IC025). ```{r compute_dispro} demo |> compute_dispro( y = "a_colitis", x = "nivolumab" ) ``` ## Advanced modelling, multivariate analysis From this point, it is also possible to run any statistical model including drug and adr parameters, but also potential other variables such as age and sex. For example, one could wish to perform a multivariate logistic regression on the reporting of colitis and nivolumab, adjusted on age groups and sex. The `glm()` function from the `stats` package can be used for this purpose. ```{r mod} mod <- glm(a_colitis ~ nivolumab, data = demo, family = "binomial") summary(mod) ``` In a logistic regression models, estimates lead to (reporting) OR by the exponential. ```{r} summary(mod)$coefficients exp(summary(mod)$coefficients[2, 1]) ``` Adding covariates is straightforward ```{r mod_covar} mod2 <- glm(a_colitis ~ nivolumab + sex + age, data = demo, family = "binomial") summary(mod2) ``` ### Extract Odds-Ratio with compute_or_mod() There are several packages that can extract the OR from a model. The `compute_or_mod()` function is just one of many ways to do it. ```{r compute_or_mod} mod_or <- compute_or_mod( summary(mod2)$coefficients, estimate = Estimate, std_er = Std..Error ) mod_or ``` # Conclusion You're now all set to create drugs and adrs columns into a `demo` dataset. This is the first step to many modelling possibilities! Where do you want to go next? - Dive into descriptive features, such as time to onset, dechallenge, rechallenge, screening of drugs and adrs. `vignette("descriptive")` - Learn on interactions in pharmacovigilance database `vignette("interactions")`