The digitalDLSorteR R package provides a set of tools to deconvolute cell type proportions of bulk RNA-seq data by building context-specific deconvolution models from single-cell RNA-seq (scRNA-seq) data. These models can accurately estimate the cell type proportions of bulk RNA-seq samples from specific biological environments. For more details about the algorithm and the functionalities implemented in this package, see Torroja and Sánchez-Cabo, 2019, Mañanes et al., 2024, and https://diegommcc.github.io/digitalDLSorteR/.
digitalDLSorteR is available on CRAN and can be installed as follows:
install.packages("digitalDLSorteR")
The version under development is available on GitHub:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("diegommcc/digitalDLSorteR")
The package depends on the tensorflow R package, so a working Python interpreter with the TensorFlow Python library installed is needed. The installTFpython function provides an easy way to install a conda environment called digitaldlsorter-env with all the necessary dependencies. We recommend installing the TensorFlow Python library in this way, although a custom installation is possible. See the Keras/TensorFlow installation and configuration article of the package website for more details.
library("digitalDLSorteR")
installTFpython(install.conda = TRUE)
The algorithm consists of training Deep Neural Network (DNN) models on simulated bulk RNA-seq samples whose cell composition is known. These pseudo-bulk RNA-seq samples are generated by aggregating pre-characterized scRNA-seq data from specific biological environments. The resulting models can accurately deconvolute new bulk RNA-seq samples from the same environment because they account for environment-dependent transcriptional changes in specific cells, such as immune cells in complex diseases (e.g., specific subtypes of cancer or atherosclerosis). This addresses a limitation of other methods: for immune cells, for instance, published methods often rely on purified transcriptional profiles from peripheral blood mononuclear cells, even though these cells vary considerably with environmental conditions. This feature, together with the fact that scRNA-seq datasets improve over time (the more cells, the more variability the models learn), should lead to more accurate and comprehensive models.
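To make the idea of pseudo-bulk samples with known composition concrete, here is a minimal base-R sketch. It does not use digitalDLSorteR and is not the package's implementation: the toy count matrix, the cell type labels, the helper names (simulate_pseudobulk, random_props) and the way proportions are sampled are all assumptions chosen for illustration.

```r
# Illustrative sketch only: toy data and hypothetical helpers, not the package's code.
set.seed(42)

n_genes <- 50
n_cells <- 300
cell_types <- c("Tcell", "Bcell", "Tumour")  # hypothetical labels

# Toy scRNA-seq count matrix (genes x cells) with one known label per cell
labels <- sample(cell_types, n_cells, replace = TRUE)
sc_counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Build one pseudo-bulk sample by summing the counts of n_draw sampled cells
simulate_pseudobulk <- function(counts, labels, props, n_draw = 100) {
  stopifnot(abs(sum(props) - 1) < 1e-8, all(names(props) %in% labels))
  # Number of cells taken from each type (multinomial draw around the targets)
  n_per_type <- as.vector(rmultinom(1, size = n_draw, prob = props))
  names(n_per_type) <- names(props)
  picked <- unlist(lapply(names(props), function(ct) {
    pool <- which(labels == ct)
    pool[sample.int(length(pool), n_per_type[[ct]], replace = TRUE)]
  }))
  list(
    expression  = rowSums(counts[, picked, drop = FALSE]),
    proportions = n_per_type / n_draw  # realized (known) composition
  )
}

# Random target proportions (normalized uniform draws; an assumed scheme)
random_props <- function(types) {
  p <- runif(length(types))
  setNames(p / sum(p), types)
}

n_samples <- 20
sims <- lapply(seq_len(n_samples), function(i) {
  simulate_pseudobulk(sc_counts, labels, random_props(cell_types))
})

# Model inputs (samples x genes) and targets (samples x cell types)
X <- t(sapply(sims, `[[`, "expression"))
Y <- t(sapply(sims, `[[`, "proportions"))
```

Each row of X is an aggregated expression profile and the matching row of Y holds the proportions that produced it, which is the kind of input/target pair a DNN can be trained on. Real scRNA-seq data would additionally need quality filtering and normalization, which this sketch leaves out.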
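The package can be used in two main ways: with pre-trained context-specific deconvolution models, or by building new models from pre-characterized scRNA-seq data.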
To use pre-trained context-specific deconvolution models, digitalDLSorteR relies on the digitalDLSorteRmodels data R package, which should therefore be installed from GitHub alongside digitalDLSorteR:
remotes::install_github("diegommcc/digitalDLSorteRmodels")
Once digitalDLSorteRmodels is loaded, the pre-trained models are available. See the article Using pre-trained context-specific deconvolution models for an example.
If you use digitalDLSorteR in your research, please cite Torroja and Sánchez-Cabo, 2019 (first description of the algorithm) and Mañanes et al., 2024 (a version for spatial transcriptomics data whose development also served to improve digitalDLSorteR).
Chung, W., Eum, H. H., Lee, H. O., Lee, K. M., Lee, H. B., Kim, K. T., et al. (2017). Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8(1), 15081. doi:10.1038/ncomms15081
Lee, H. O., Hong, Y., Etlioglu, H. E., et al. (2020). Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat. Genet. 52, 594-603. doi:10.1038/s41588-020-0636-z
Torroja, C. and Sánchez-Cabo, F. (2019). digitalDLSorter: A Deep Learning algorithm to quantify immune cell populations based on scRNA-seq data. Frontiers in Genetics 10, 978. doi:10.3389/fgene.2019.00978
Mañanes, D., Rivero-García, I., Relaño, C., Jimenez-Carretero, D., Torres, M., Sancho, D., Torroja, C. and Sánchez-Cabo, F. (2024). SpatialDDLS: An R package to deconvolute spatial transcriptomics data using neural networks. Bioinformatics 40(2). doi:10.1093/bioinformatics/btae072