--- title: "Obtaining PIN and Gene Sets Data" output: rmarkdown::html_vignette date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{Obtaining PIN and Gene Sets Data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` # Get PIN File For retrieving the PIN file for an organism of your choice, you may use the function `get_pin_file()`. As of this version, the only source for PIN data is "BioGRID". By default, the function downloads the PIN data from BioGRID and processes it, saves it in a temporary file and returns the path: ```{r} ## the default organism is "Homo_sapiens" path_to_pin_file <- get_pin_file() ``` You can retrieve the PIN data for the organism of your choice, by setting the `org` argument: ```{r} ## retrieving PIN data for "Gallus_gallus" path_to_pin_file <- get_pin_file(org = "Gallus_gallus") ``` You may also supply a `path/to/PIN/file` to save the PIN file for later use (in this case, the path you supply will be returned): ```{r} ## saving the "Homo_sapiens" PIN as "/path/to/PIN/file" path_to_pin_file <- get_pin_file(path2pin = "/path/to/PIN/file") ``` You may also retrieve a specific version of BioGRID via setting the `release` argument: ```{r} ## retrieving PIN data for "Mus_musculus" from BioGRID release 3.5.179 path_to_pin_file <- get_pin_file( org = "Mus_musculus", release = "3.5.179" ) ``` # Get Gene Sets List To retrieve organism-specific gene sets list, you may use the function `get_gene_sets_list()`. The available sources for gene sets are "KEGG", "Reactome" and "MSigDB". The function retrieves the gene sets data from the source and processes it into a list of two objects used by pathfindR for active-subnetwork-oriented enrichment analysis: 1. **gene_sets** A list containing the genes involved in each gene set 2. **descriptions** A named vector containing the descriptions for each gene set By default, `get_gene_sets_list()` obtains "KEGG" gene sets for "hsa". ## KEGG Pathway Gene Sets To obtain the gene sets list of the KEGG pathways for an organism of your choice, use the KEGG organism code for the selected organism. For a full list of all available organisms, see [here](https://www.genome.jp/kegg/catalog/org_list.html). ```{r} ## obtaining KEGG pathway gene sets for Rattus norvegicus (rno) gsets_list <- get_gene_sets_list(org_code = "rno") ``` ## Reactome Pathway Gene Sets For obtaining Reactome pathway gene sets, set the `source` argument to "Reactome". This downloads the most current Reactome pathways in gmt format and processes it into the list object that pathfindR uses: ```{r} gsets_list <- get_gene_sets_list(source = "Reactome") ``` For Reactome, there is only one collection of pathway gene sets. ## MSigDB Gene Sets Using `msigdbr`, `pathfindR` can retrieve all MSigDB gene sets. For this, set the `source` argument to "MSigDB" and the `collection` argument to the desired MSigDB collection (one of H, C1, C2, C3, C4, C5, C6, C7): ```{r} gsets_list <- get_gene_sets_list( source = "MSigDB", collection = "C2" ) ``` The default organism for MSigDB is "Homo sapiens", you may obtain the gene sets data for another organism by setting the `species` argument: ```{r} ## obtaining C5 gene sets data for "Drosophila melanogaster" gsets_list <- get_gene_sets_list( source = "MSigDB", species = "Drosophila melanogaster", collection = "C5" ) ``` ```{r, eval=TRUE} ## see msigdbr::msigdbr_species() for all available organisms msigdbr::msigdbr_species() ``` You may also obtain the gene sets for a subcollection by setting the `subcollection` argument: ```{r} ## obtaining C3 - MIR: microRNA targets gsets_list <- get_gene_sets_list( source = "MSigDB", collection = "C3", subcollection = "MIR" ) ```