--- title: "Introduction to CohortSymmetry" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{a01_Introduction} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, eval = Sys.getenv("$RUNNER_OS") != "macOS" ) options(rmarkdown.html_vignette.check_title = FALSE) ``` CohortSymmetry provides tools to perform Sequence Symmetry Analysis (SSA). Before using the package, it is highly recommended that this method is tested beforehand against well-known positive and negative controls. The details of SSA and the relevant controls could be found using Pratt et al (2015). The functions you will interact with are: 1. `generateSequenceCohortSet()`: this function will create a cohort with individuals present in both (the index and the marker) cohorts. 2. `summariseSequenceRatios()`: this function will calculate sequence ratios. 3. `tableSequenceRatios()` and `plotSequenceRatios()`: these functions will help us to visualise the sequence ratio results. 4. `summariseTemporalSymmetry()`: this function will produce aggregated results based on the time difference between two cohort start dates. 5. `plotTemporalSymmetry()`: this function will help us to visualise the results from summariseTemporalSymmetry(). Below, you will find an example analysis that offers a brief and comprehensive overview of the package's functionalities. More context and further examples for each of these functions are provided in later vignettes. First, let’s load the relevant libraries. ```{r message= FALSE, warning=FALSE} library(CDMConnector) library(dplyr) library(DBI) library(omock) library(CohortSymmetry) library(duckdb) ``` The CohortSymmetry package works with data mapped to the OMOP CDM. Hence, the initial step involves connecting to a database. As an example, we will be using Omock package to generate a mock database with two mock cohorts: the **index_cohort** and the **marker_cohort**. ```{r message= FALSE, warning=FALSE} cdm <- emptyCdmReference(cdmName = "mock") |> mockPerson(nPerson = 100) |> mockObservationPeriod() |> mockCohort( name = "index_cohort", numberCohorts = 1, cohortName = c("index_cohort"), seed = 1, ) |> mockCohort( name = "marker_cohort", numberCohorts = 1, cohortName = c("marker_cohort"), seed = 2 ) con <- dbConnect(duckdb::duckdb()) cdm <- copyCdmTo(con = con, cdm = cdm, schema = "main", overwrite = T) ``` Once we have established a connection to the database, we can use the `generateSequenceCohortSet()` function to find the intersection of the two cohorts. This function will provide us with the individuals who appear in both cohorts, which will be named **intersect** - another cohort in the cdm reference. ```{r message= FALSE, warning=FALSE} cdm <- generateSequenceCohortSet( cdm = cdm, indexTable = "index_cohort", markerTable = "marker_cohort", name = "intersect", combinationWindow = c(0, Inf) ) ``` See below that the generated cohort follows the format of an OMOP CDM cohort with the addition of two extra columns: *index_date* and *marker_date*. These columns correspond to the *cohort_start_date* in the **index_cohort** and the **marker_cohort**, respectively. ```{r message= FALSE, warning=FALSE} cdm$intersect |> dplyr::glimpse() ``` Once we have the intersect cohort, you are able to explore the temporal symmetry by using `summariseTemporalSymmetry`, `tableTemporalSymmetry`, and `plotTemporalSymmetry()`: ```{r message= FALSE, warning=FALSE} temporal_symmetry <- summariseTemporalSymmetry( cohort = cdm$intersect, timescale = "year") ``` The result can be viewed using table and plot functions. ```{r message= FALSE, warning=FALSE} tableTemporalSymmetry(result = temporal_symmetry) ``` ```{r message= FALSE, warning=FALSE} plotTemporalSymmetry(result = temporal_symmetry) ``` Next, we will use the `summariseSequenceRatios()` function to get the crude sequence ratios, adjusted sequence ratios, and the corresponding confidence intervals. ```{r message= FALSE, warning=FALSE} sequence_ratio <- summariseSequenceRatios(cohort = cdm$intersect) ``` Finally, we can visualise the results using `tableSequenceRatios()`: ```{r message= FALSE, warning=FALSE} tableSequenceRatios(result = sequence_ratio) ``` Or create a plot with the adjusted sequence ratios: ```{r message= FALSE, warning=FALSE} plotSequenceRatios(result = sequence_ratio) ``` ## As a diagram Diagrammatically, the work flow using CohortSymmetry resembles the following flow chat: ```{r, echo=FALSE, message=FALSE, out.width="100%", warning=FALSE} library(here) knitr::include_graphics(here("vignettes/workflow.png")) ```