---
title: "Quick Start"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Quick Start}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Quick Start for gimap
For more background on gimap and the calculations done here, [read here](https://github.com/FredHutch/gimap/blob/main/README.md)
```{r echo = FALSE, results = 'hide'}
library(gimap)
```
```{r echo = FALSE, results = 'hide'}
library(dplyr)
```
## Set Up
First we can create a folder we will save files to.
```
output_dir <- "output_timepoints"
dir.create(output_dir, showWarnings = FALSE)
```
```{r eval = FALSE}
example_data <- get_example_data("count")
```
## Setting up data
We're going to set up three datasets that we will provide to the `set_up()` function to create a `gimap` dataset object.
- `counts` - the counts generated from pgPEN
- `pg_ids` - the IDs that correspond to the rows of the counts and specify the construct
- `sample_metadata` - metadata that describes the columns of the counts including their timepoints
```{r eval = FALSE}
counts <- example_data %>%
select(c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC")) %>%
as.matrix()
```
`pg_id` are just the unique IDs listed in the same order/sorted the same way as the count data.
```{r eval = FALSE}
pg_ids <- example_data %>%
dplyr::select("id")
```
Sample metadata is the information that describes the samples and is sorted the same order as the columns in the count data.
```{r eval = FALSE}
sample_metadata <- data.frame(
col_names = c("Day00_RepA", "Day05_RepA", "Day22_RepA", "Day22_RepB", "Day22_RepC"),
day = as.numeric(c("0", "5", "22", "22", "22")),
rep = as.factor(c("RepA", "RepA", "RepA", "RepB", "RepC"))
)
```
We'll need to provide `example_counts`, `pg_ids` and `sample_metadata` to `setup_data()`.
```{r eval = FALSE}
gimap_dataset <- setup_data(
counts = counts,
pg_ids = pg_ids,
sample_metadata = sample_metadata
)
```
It's ideal to run quality checks first. The `run_qc()` function will create a report we can look at to assess this.
```{r eval = FALSE}
run_qc(gimap_dataset,
output_file = "example_qc_report.Rmd",
overwrite = TRUE,
quiet = TRUE)
```
You can take a look at an [example QC report here](http://htmlpreview.github.io/?https://raw.githubusercontent.com/FredHutch/gimap/main/inst/example_qc_report.html).
```{r eval = FALSE}
gimap_dataset <- gimap_dataset %>%
gimap_filter() %>%
gimap_annotate(cell_line = "HELA") %>%
gimap_normalize(
timepoints = "day"
) %>%
calc_gi()
```
## Example output
Genetic interaction is calculated by:
- `rep` - indicates which sample from the original the data is from. Note the pretreatment is used for calculation and its statistics are not reported.
- `pgRNA_target` - what gene(s) were targeted by this the original pgRNAs for these data
- `mean_expected_cs` - the average expected genetic interaction score
- `mean_gi_score` - the average observer genetic interaction score
- `target_type` - describes whether the CRISPR design is targeting two genes ("gene_gene"), or a gene and a non targeting control ("gene_ctrl") or a targeting control and a gene ("ctrl_gene").
- `p_val` - p values from the testing whether a double knockout construct is significantly different in its genetic interaction score from single targets.
- `fdr` - False discovery rate corrected p values
```{r eval = FALSE}
gimap_dataset$gi_scores %>%
dplyr::arrange(fdr) %>%
head() %>%
knitr::kable(format = "html")
```
## Plot the results
You can remove any samples from these plots by altering the `reps_to_drop` argument.
```{r eval = FALSE}
plot_exp_v_obs_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early")
```
```{r eval = FALSE}
plot_rank_scatter(gimap_dataset, reps_to_drop = "Day05_RepA_early")
```
```{r eval = FALSE}
plot_volcano(gimap_dataset, reps_to_drop = "Day05_RepA_early", facet_rep = FALSE)
```
Here's how you can save plots like the above.
```
ggplot2::ggsave("volcano_plot.png")
```
### Plot specific target pair
We can pick out a specific pair to plot.
```{r eval = FALSE}
# "NDEL1_NDE1" is top result so let's plot that
plot_targets_bar(gimap_dataset, target1 = "NDEL1", target2 = "NDE1")
```
## Saving data to a file
We can save all these data as an RDS or the genetic interaction scores themselves to a tsv file.
```
saveRDS(gimap_dataset, "gimap_dataset_final.RDS")
```
```
readr::write_tsv(gimap_dataset$gi_scores, "gi_scores.tsv")
```
## Session Info
This is just for provenance purposes.
```{r eval = FALSE}
sessionInfo()
```