--- title: "Quick Start" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quick Start} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Quick Start for gimap For more background on gimap and the calculations done here, [read here](https://github.com/FredHutch/gimap/blob/main/README.md) ```{r echo = FALSE, results = 'hide'} library(gimap) ``` ```{r echo = FALSE, results = 'hide'} library(dplyr) ``` ## Set Up First we can create a folder we will save files to. ``` output_dir <- "output_timepoints" dir.create(output_dir, showWarnings = FALSE) ``` ```{r eval = FALSE} example_data <- get_example_data("count") ``` ## Setting up data We're going to set up three datasets that we will provide to the `set_up()` function to create a `gimap` dataset object. - `counts` - the counts generated from pgPEN - `pg_ids` - the IDs that correspond to the rows of the counts and specify the construct - `sample_metadata` - metadata that describes the columns of the counts including their timepoints ```{r eval = FALSE} counts <- example_data %>% select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>% as.matrix() ``` `pg_id` are just the unique IDs listed in the same order/sorted the same way as the count data. ```{r eval = FALSE} pg_ids <- example_data %>% dplyr::select("id") ``` Sample metadata is the information that describes the samples and is sorted the same order as the columns in the count data. ```{r eval = FALSE} sample_metadata <- data.frame( col_names = c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC"), day = as.numeric(c("0", "5", "22", "22", "22")), rep = as.factor(c("RepA", "RepA", "RepA", "RepB", "RepC")) ) ``` We'll need to provide `example_counts`, `pg_ids` and `sample_metadata` to `setup_data()`. ```{r eval = FALSE} gimap_dataset <- setup_data( counts = counts, pg_ids = pg_ids, sample_metadata = sample_metadata ) ``` It's ideal to run quality checks first. The `run_qc()` function will create a report we can look at to assess this. ```{r eval = FALSE} run_qc(gimap_dataset, output_file = "example_qc_report.Rmd", overwrite = TRUE, quiet = TRUE) ``` You can take a look at an [example QC report here](http://htmlpreview.github.io/?https://raw.githubusercontent.com/FredHutch/gimap/main/inst/example_qc_report.html). ```{r eval = FALSE} gimap_dataset <- gimap_dataset %>% gimap_filter() %>% gimap_annotate(cell_line = "HELA") %>% gimap_normalize( timepoints = "day" ) %>% calc_gi() ``` ## Example output Genetic interaction is calculated by: - `rep` - indicates which sample from the original the data is from. Note the pretreatment is used for calculation and its statistics are not reported. - `pgRNA_target` - what gene(s) were targeted by this the original pgRNAs for these data - `mean_expected_cs` - the average expected genetic interaction score - `mean_gi_score` - the average observer genetic interaction score - `target_type` - describes whether the CRISPR design is targeting two genes ("gene_gene"), or a gene and a non targeting control ("gene_ctrl") or a targeting control and a gene ("ctrl_gene"). - `p_val` - p values from the testing whether a double knockout construct is significantly different in its genetic interaction score from single targets. - `fdr` - False discovery rate corrected p values ```{r eval = FALSE} gimap_dataset$gi_scores %>% dplyr::arrange(fdr) %>% head() %>% knitr::kable(format = "html") ``` ## Plot the results You can remove any samples from these plots by altering the `reps_to_drop` argument. ```{r eval = FALSE} plot_exp_v_obs_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early") ``` ```{r eval = FALSE} plot_rank_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early") ``` ```{r eval = FALSE} plot_volcano(gimap_dataset, reps_to_drop = "Day05_RepA_early", facet_rep = FALSE) ``` Here's how you can save plots like the above. ``` ggplot2::ggsave("volcano_plot.png") ``` ### Plot specific target pair We can pick out a specific pair to plot. ```{r eval = FALSE} # "NDEL1_NDE1" is top result so let's plot that plot_targets_bar(gimap_dataset, target1 = "NDEL1", target2 = "NDE1") ``` ## Saving data to a file We can save all these data as an RDS or the genetic interaction scores themselves to a tsv file. ``` saveRDS(gimap_dataset, "gimap_dataset_final.RDS") ``` ``` readr::write_tsv(gimap_dataset$gi_scores, "gi_scores.tsv") ``` ## Session Info This is just for provenance purposes. ```{r eval = FALSE} sessionInfo() ```