--- title: "Introduction of 'geneNR'" author: "Rajamani Nirmalaruban" date: "2025-02-20" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction of 'geneNR'} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} html_document: df_print: paged --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## R Markdown <span style="font-size: 18px;font-weight: bold;">Introduction:</span> <span style="font-size: 16px;">The geneNR package is designed to streamline the post-GWAS (Genome-Wide Association Studies) and QTL (Quantitative Trait Loci) analysis by automating the identification of candidate genes within a user-defined search window based on the identified SNPs (Single Nucleotide Polymorphisms)or QTLs. This package provides robust support for candidate gene analysis specifically for wheat and rice, making it an invaluable tool for researchers working in the field of genomics.</span> <span style="font-size: 18px;font-weight: bold;">Key Features:</span> <span style="font-size: 16px;">***Automated Search and Retrieval:*** geneNR simplifies the labor-intensive process of manually searching and retrieving candidate genes. By leveraging the package, researchers can save time and focus on interpreting results rather than conducting repetitive search.</span> <span style="font-size: 16px;">***User-Defined Search Window:*** The package allows users to define their own search window parameters, providing flexibility and customization to suit various research needs.</span> <span style="font-size: 16px;">***Support for Wheat and Rice:*** With a focus on two of the most important staple crops, wheat and rice, geneNR ensures targeted and relevant candidate gene analysis, aiding in crop improvement and genetic research efforts.</span> <span style="font-size: 18px;font-weight: bold;">How It Works:</span> <span style="font-size: 16px;">***Importing GWAS Results:*** Users begin by importing their GWAS or QTL results into the geneNR package. The package includes detailed instructions and a sample_data file to guide users through this process.</span> <span style="font-size: 16px;">***Identification of Candidate Genes:*** Once the GWAS or QTL results are imported, geneNR identifies candidate genes within the specified search window based on the SNPs provided. This automated process not only eliminates the need for manual searches but also significantly reduces the time required, ensuring both accuracy and efficiency.</span> <span style="font-size: 16px;">***Exporting Results:*** After the candidate genes are identified, the package exports the results into a ready-to-use output. This format is convenient for further analysis, reporting, and sharing with collaborators.</span> <span style="font-size: 18px;font-weight: bold;">Installation:</span> <span style="font-size: 16px;">To install geneNR package, use the following commands in R:</span> <span style="font-size: 16px;">The geneNR package is currently available in CRAN. Users can install the package directly from CRAN using the following command in R:</span> ```{r geneNR, eval=FALSE} # Install the geneNR package (when published on CRAN) install.packages("geneNR") or # Installation from GitHub (if not available on CRAN) # Uncomment and run the following commands manually: # if (!requireNamespace("devtools", quietly = TRUE)) { # install.packages("devtools") # } # devtools::install_github("Nirmalaruban/geneNR") ``` <span style="font-size: 18px;font-weight: bold;">Note:</span> <span style="font-size: 16px;">As the <b>geneNR</b> package has now been officially published on CRAN, users are kindly requested to duly cite the package in their studies. Proper citation ensures acknowledgment of the efforts behind its development and facilitates further usage by the research community. If users require further assistance or have any inquiries, they are encouraged to contact R. Nirmalaruban (Maintainer) at nirmalaruban97@gmail.com.</span> <span style="font-size: 18px;font-weight: bold;">Data:</span> <span style="font-size: 16px;">The geneNR package includes sample data for both wheat and rice to demonstrate its functionality. Users can provide their GWAS result files in .csv format to mine candidate genes.</span> <span style="font-size: 18px;font-weight: bold;">geneQTL():</span> <span style="font-size: 16px;">Identifies candidate genes based on QTL (Quantitative Trait Loci) analysis results. Users need to provide input data specifying QTLs, their chromosomal positions, and regions of interest. </span> ```{r geneQTL, eval=FALSE} result <- geneQTL("sample_data_wheat_qtl", crop="wheat") result <- geneQTL("sample_data_rice_qtl", crop="rice") ``` <span style="font-size: 16px;">Here is a sample data skeleton for QTL results to be given as input :</span> <span style="font-size: 16px;"> | traits | Chr | start | stop | |------------------------------------| | PH | 7A |93007534 |95007534 | | | | | | </span> <span style="font-size: 18px;font-weight: bold;">Parameters:</span> <span style="font-size: 16px;">The function accepts the following parameters:</span> <span style="font-size: 16px;"> data_file: Input QTL data in .csv format (which needs to contains columns as detailed above in sample_data_wheat_qtl). Crop: Either "wheat" or "rice". Output A .csv file containing candidate genes retrieved for the specified QTL regions.</span> <span style="font-size: 18px;">Output Format Example:</span> <span style="font-size: 16px;"> | traits | QTL | gene_size | gene_id | gene_type | |-------------------------------------------------------| | | | | |protein coding| </span> <span style="font-size: 18px;font-weight: bold;">geneSNP()</span> <span style="font-size: 16px;">Identifies candidate genes based on identified SNPs from GWAS results.</span> ```{r geneSNP, eval=FALSE} result <- geneSNP("sample_data_wheat", 10000, 10000, crop = "wheat") result <- geneSNP("sample_data_rice", 10000, 10000, crop = "rice") ``` <span style="font-size: 16px;">Here is a sample data skeleton for GWAS results:</span> <span style="font-size: 16px;"> | traits | SNP | Chr | Pos | |----------------------------------------| | PH | AX-94490431 | 7A |93007534 | | | | | | </span> <span style="font-size: 18px;font-weight: bold;">Parameters:</span> <span style="font-size: 16px;">The function accepts the following parameters:</span> <span style="font-size: 16px;"> data_file: The input data in .csv format. upstream: The search window upstream of the current position of the SNP. downstream: The search window downstream of the current position of the SNP. Crop: Either "wheat" or "rice". Output A .csv file containing candidate genes retrieved for the specified SNP regions.</span> <span style="font-size: 18px;font-weight: bold;">Note:</span> <span style="font-size: 16px;">Both upstream and downstream can be specified by the user in base pairs. By default, the search window is set to 10^6 bp (1 Mbp).</span> <span style="font-size: 18px;font-weight: bold;">Output:</span> <span style="font-size: 16px;">Upon running the function, the results will be saved in a filtered_gene_id.csv file in the current working directory. This file will contain the following columns: traits, SNP, search_window, gene_size, gene_id, and gene_type. The candidate genes for the respective regions are retrieved from Ensembl Plants (https://plants.ensembl.org/index.html).</span> <span style="font-size: 18px;">Output Format Example:</span> <span style="font-size: 16px;"> | traits | SNP | search_window | gene_size | gene_id | gene_type | |-----------------------------------------------------------------------| | | | | | |protein coding| </span> <span style="font-size: 18px;font-weight: bold;">geneSNPcustom()</span> <span style="font-size: 16px;">Similar to geneSNP, this function provides enhanced customization options for identifying candidate genes (different search window can be provided by users of each SNP).</span> ```{r geneSNPcustom, eval=FALSE} result <- geneSNPcustom("sample_data_wheat_custom", crop = "wheat") ``` <span style="font-size: 18px;font-weight: bold;">import_hmp()</span> <span style="font-size: 16px;"> Imports Hapmap genotypic data files into a format usable by the package. Input: Hapmap file (.hmp.txt). Output: A processed data frame for SNP analysis.</span> <span style="font-size: 18px;font-weight: bold;">import_vcf()</span> <span style="font-size: 16px;"> Imports Variant Call Format (VCF) files. Input: .vcf file. Output: A processed data frame for SNP analysis.</span> <span style="font-size: 18px;font-weight: bold;">plot_SNP()</span> ### Visualizing SNP Distributions with `plot_SNP` <span style="font-size: 16px;">Generates a chromosome map showing SNP distributions. The map includes customization aesthetics.</span> ```{r plot_SNP, eval=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.height=5, fig.width=12} library(geneNR) chromosome_details <- read.csv(system.file("extdata", "chromosome_details.csv", package = "geneNR")) data <- read.csv(system.file("extdata", "identified_SNP.csv", package = "geneNR")) chromosome_plot <- plot_SNP(chromosome_details = chromosome_details, data = data, chromosome_color = "steelblue", title = "Chromosome map with SNPs", label_color = "black", image_width = 15, image_height = 10) chromosome_plot ``` <span style="font-size: 18px;font-weight: bold;">summariseSNP()</span> <span style="font-size: 16px;">Calculates and summarizes the distribution of SNPs across chromosomes based on the provided data.</span> ```{r summariseSNP, eval=TRUE, echo=FALSE} demo_SNP <- system.file("extdata", "demo_SNP.hmp.txt", package = "geneNR") data <- import_hmp(demo_SNP) snp_distribution <- summariseSNP(data) snp_distribution ``` <span style="font-size: 18px;font-weight: bold;">summariseSNP_vcf()</span> <span style="font-size: 16px;">Similar to summariseSNP, this function processes data from VCF files.</span> <span style="font-size: 18px;font-weight: bold;">plot_summariseSNP()</span> <span style="font-size: 16px;">Plots a bar chart summarizing SNP distributions across chromosomes.</span> ```{r plot_summariseSNP, eval=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.height=5, fig.width=8} demo_SNP <- system.file("extdata", "demo_SNP.hmp.txt", package = "geneNR") data <- import_hmp(demo_SNP) snp_distribution <- summariseSNP(data) plot <- plot_summariseSNP(snp_distribution, bar_color = "skyblue", label_size = 3, label_color = "red") plot ``` <span style="font-size: 16px;"> Parameters bar_color: Color of the bars. label_size: Size of the text labels. label_color: Color of the text labels.</span> <span style="font-size: 18px;font-weight: bold;">Sample Data</span> <span style="font-size: 16px;">Sample Data for Wheat and Rice Several sample datasets for wheat and rice are included in the package. These data demonstrate the functionality of the geneNR package.</span> <span style="font-size: 18px;font-weight: bold;">Note:</span> <span style="font-size: 16px;">For smooth functioning of the package, please ensure that the data files are placed in the working directory.</span> <span style="font-size: 16px;">By using the <b>geneNR</b> package, researchers can significantly enhance the efficiency and accuracy of their post-GWAS analyses, ultimately contributing to the advancement of genomic research and crop improvement.</span> <span style="font-size: 18px;font-weight: bold;">Glossary:</span> <span style="font-size: 16px;"> - **GWAS**: Genome-Wide Association Studies - **SNP**: Single Nucleotide Polymorphism - **API**: Application Programming Interface - **Mbp**: Megabase Pair (1,000,000 base pairs) - **CSV**: Comma-Separated Values - **QTL**: Quantitative Trait Loci </span>