---
title: "TaxaNorm Introduction"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
vignette: >
  %\VignetteIndexEntry{TaxaNorm Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

This document introduces `TaxaNorm`, an R package for normalizing microbiome taxa data. It walks through installing the package and then loading, normalizing, and visualizing microbiome data with it. `TaxaNorm` implements a zero-inflated negative binomial (ZINB) method to normalize microbiome data.

## What is the ZINB method?

## Outline

There are three main steps in using this package:

- **Load and QC Input Data**: The package includes an example data set from the `phyloseq` package that shows the format needed for analysis.
- **Run the ZINB Normalization Function**: The `TaxaNorm_Normalization` function is run on the input data above. This function implements the ZINB method for normalization.
- **Visualization and Quality Control**: Finally, visualization and quality control measures are built into the package.

## Installation

### Required Packages

`TaxaNorm` requires the packages `phyloseq` and `microbiome`, which are available from Bioconductor.

### Installation from Bioconductor

### Installation from GitHub

For the newest, but potentially unstable, version of the package, direct installation from GitHub is also supported.

```{r, eval = FALSE}
remotes::install_github("wangziyue57/TaxNorm")
```

### Loading Package into R Environment

```{r, eval = FALSE}
library(TaxaNorm)
library(phyloseq)    # provides sample_data() and prune_taxa() used below
library(microbiome)  # provides abundances() used below
# library(ggplot2)
# library(vegan)
# library(MASS)
```

## Example Usage

Basic usage:

```{r, eval = FALSE}
data("TaxaNorm_Example_Input", package = "TaxaNorm")

# run normalization
TaxaNorm_Example_Output <- TaxaNorm_Normalization(
  data = TaxaNorm_Example_Input,
  depth = NULL,
  group = sample_data(TaxaNorm_Example_Input)$body_site,
  meta.data = NULL,
  filter.cell.num = 10,
  filter.taxa.count = 0,
  random = FALSE,
  ncores = 1
)

# run diagnosis test
Diagnose_Data <- TaxaNorm_Run_Diagnose(
  Normalized_Results = TaxaNorm_Example_Output,
  prev = TRUE,
  equiv = TRUE,
  group = sample_data(TaxaNorm_Example_Input)$body_site
)
```

### Load Input Data

The built-in example data, a `phyloseq` object, can be loaded with the command below.

```{r, eval = FALSE}
data("TaxaNorm_Example_Input", package = "TaxaNorm")
```

### Pre-process Input Data

We provide several QC figures describing the characteristics of the input data, which offer preliminary criteria for pre-filtering rare, low-information taxa before any analysis. Pre-filtering improves the power and computational efficiency of the algorithm. Users who have already cleaned or pre-processed their data can skip this step.

```{r, eval = FALSE}
qc_data <- TaxaNorm_QC_Input(TaxaNorm_Example_Input)
```

Here we provide a popular option: keep taxa that have a count of `filter.taxa.count` or more in at least `filter.sample.num` samples, where `filter.sample.num` can be any chosen value, for example the sample size of the smallest group of samples. By default, we use `filter.taxa.count=1` and `filter.sample.num=10`. This criterion is also incorporated in the main function `TaxaNorm_Normalization()`.

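
To make the filtering rule concrete, here is a minimal sketch on a plain count matrix using base R only. The matrix `counts` and its taxon names are hypothetical toy data, and reading "at least" and "or more" as `>=` is an assumption about the intended rule rather than a statement of how the package implements it.

```r
# Hypothetical toy count matrix: 4 taxa (rows) x 12 samples (columns).
counts <- rbind(
  taxonA = c(5, 3, 0, 2, 7, 1, 4, 6, 2, 3, 1, 8),  # non-zero in 11 samples
  taxonB = c(0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0),  # non-zero in 2 samples
  taxonC = rep(1, 12),                             # non-zero in all 12 samples
  taxonD = rep(0, 12)                              # never observed
)

filter.taxa.count <- 1   # minimum count for a sample to "pass" for a taxon
filter.sample.num <- 10  # minimum number of passing samples to keep the taxon

# For each taxon, count the samples whose count reaches filter.taxa.count
# (assumed reading of "or more": >=).
n_samples_passing <- rowSums(counts >= filter.taxa.count)

# Keep taxa for which that number reaches filter.sample.num.
keep <- n_samples_passing >= filter.sample.num
counts_filtered <- counts[keep, , drop = FALSE]

rownames(counts_filtered)
# [1] "taxonA" "taxonC"
```

Only `taxonA` and `taxonC` survive: `taxonB` passes in 2 samples and `taxonD` in none. The chunk that follows applies the filter to the example `phyloseq` object.
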
```{r, eval = FALSE}
filter.taxa.count <- 1
filter.sample.num <- 10

# Keep taxa with a count of at least filter.taxa.count
# in at least filter.sample.num samples
taxaIn <- rowSums(abundances(TaxaNorm_Example_Input) >= filter.taxa.count) >= filter.sample.num
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input)
```

Users can apply their own filtering criteria as well. Alternatively, a basic pre-filter keeps only taxa (rows) with more than 10 reads in total:

```{r, eval = FALSE}
taxaIn <- rowSums(abundances(TaxaNorm_Example_Input)) > 10
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input)
```

### QC Input Data

```{r, eval = FALSE}
qc_data <- TaxaNorm_QC_Input(TaxaNorm_Example_Input)
```

### Run Normalization

The normalization function returns a `TaxaNorm_Results` object. This object contains the input data, raw data, normalized data (normdata), ecdf, model parameters, and convergence information.

```{r, eval = FALSE}
# Pick group from phyloseq object
group <- sample_data(TaxaNorm_Example_Input)$body_site

# Run normalization function
Normalized_Data <- TaxaNorm_Normalization(
  data = TaxaNorm_Example_Input,
  depth = NULL,
  group = group,
  filter.taxa.count = 0,
  random = TRUE,
  ncores = 1
)
```

### QC TaxaNorm Model

```{r, eval = FALSE}
data("TaxaNorm_Example_Output", package = "TaxaNorm")
TaxaNorm_Model_QC(TaxaNormResults = TaxaNorm_Example_Output)
```

### TaxaNorm NMDS

```{r, eval = FALSE}
TaxaNorm_NMDS(TaxaNormResults = TaxaNorm_Example_Output, group_column = "body_site")
```
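
For readers unfamiliar with NMDS, the sketch below shows the general idea on hypothetical toy counts: compute pairwise Bray-Curtis dissimilarities between samples, then find a two-dimensional configuration with `MASS::isoMDS`. This is a generic illustration under those assumptions. It is not the implementation of `TaxaNorm_NMDS()`, whose internals are not described here, and the objects `counts`, `group`, and `bray_curtis()` are made up for the example.

```r
library(MASS)  # isoMDS(); MASS ships with R as a recommended package

set.seed(1)

# Hypothetical toy data: 6 samples (rows) x 20 taxa (columns), in two groups
# with different mean abundances.
counts <- rbind(
  matrix(rpois(3 * 20, lambda = 5),  nrow = 3),
  matrix(rpois(3 * 20, lambda = 20), nrow = 3)
)
rownames(counts) <- paste0("sample", 1:6)
group <- rep(c("A", "B"), each = 3)

# Bray-Curtis dissimilarity between every pair of samples:
# sum of absolute differences divided by the sum of both samples' counts.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

d <- bray_curtis(counts)

# Non-metric MDS in two dimensions; trace = FALSE silences iteration output.
nmds <- isoMDS(d, k = 2, trace = FALSE)

# One row per sample: ordination coordinates plus group label.
scores <- data.frame(
  NMDS1 = nmds$points[, 1],
  NMDS2 = nmds$points[, 2],
  group = group
)
scores
```

Plotting `NMDS1` against `NMDS2`, colored by `group`, gives the usual ordination picture. `nmds$stress` reports how well the two-dimensional layout preserves the rank order of the dissimilarities (lower is better).
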