[Bioc-devel] Bioconductor 3.5 is released
Valerie.Obenchain at RoswellPark.org
Wed Apr 26 00:39:13 CEST 2017
April 25, 2017
We are pleased to announce Bioconductor 3.5, consisting of 1384
software packages, 316 experiment data packages, and 911 annotation
packages.
There are 88 new software packages, and many updates and improvements
to existing packages; Bioconductor 3.5 is compatible with R 3.4,
and is supported on Linux, 32- and 64-bit Windows, and Mac OS X. This
release will include an updated Bioconductor Amazon Machine Image
and Docker containers.
for details and downloads.
* [Getting Started with Bioconductor
* [New Software Packages](#new-software-packages)
* [NEWS from new and existing
* [Deprecated and Defunct Packages](#deprecated-and-defunct-packages)
Getting Started with Bioconductor 3.5
To update to or install Bioconductor 3.5:
1. Install R 3.4. Bioconductor 3.5 has been designed expressly for
this version of R.
2. Follow the instructions at
New Software Packages
There are 88 new software packages in this release of Bioconductor.
This package provides classes and other infrastructure to implement
filters for manipulating Bioconductor annotation resources. The
filters will be used by ensembldb, Organism.dplyr, and other
packages.
- [ATACseqQC](https://bioconductor.org/packages/ATACseqQC) ATAC-seq,
an assay for Transposase-Accessible Chromatin using sequencing, is
a rapid and sensitive method for chromatin accessibility analysis.
It was developed as an alternative method to MNase-seq, FAIRE-seq
and DNase-seq. Compared with the other methods, ATAC-seq requires
less biological sample and less processing time. In the
process of analyzing several ATAC-seq datasets produced in our labs,
we learned some of the unique aspects of the quality assessment for
ATAC-seq data. To help users quickly assess whether their
ATAC-seq experiment is successful, we developed the ATACseqQC package,
partially following the guideline published in Nature Methods 2013
(Greenleaf et al.), including diagnostic plots of fragment size
distribution, proportion of mitochondrial reads, nucleosome
positioning pattern, and CTCF or other transcription factor
- [banocc](https://bioconductor.org/packages/banocc) BAnOCC is a
package designed for compositional data, where each sample sums to
one. It infers the approximate covariance of the unconstrained data
using a Bayesian model coded with `rstan`. It provides as output
the `stanfit` object as well as posterior median and credible
interval estimates for each correlation element.
- [basecallQC](https://bioconductor.org/packages/basecallQC) The
basecallQC package provides tools to work with Illumina bcl2Fastq
(versions >= 2.1.7) software. Prior to basecalling and
demultiplexing using the bcl2Fastq software, basecallQC functions
allow the user to update Illumina sample sheets from versions <=
1.8.9 to >= 2.1.7 standards, clean sample sheets of common problems
such as invalid sample names and IDs, and create read and index
basemasks and the bcl2Fastq command. Following the generation of
basecalled and demultiplexed data, the basecallQC package allows
the user to generate HTML tables, plots and a self-contained report
of summary metrics from Illumina XML output files.
This package creates a persistent on-disk cache of files that the
user can add, update, and retrieve. It is useful for managing
resources (such as custom TxDb objects) that are costly or
difficult to create, web resources, and data files used across
- [BioCor](https://bioconductor.org/packages/BioCor) Calculates
functional similarities based on the pathways described on KEGG and
REACTOME or in gene sets. These similarities can be calculated for
pathways or gene sets, genes, or clusters and combined with other
similarities. They can be used to improve networks, gene selection,
- [BioMedR](https://bioconductor.org/packages/BioMedR) The BioMedR
package generates various
molecular representations for chemicals, proteins, DNAs/RNAs and
- [biotmle](https://bioconductor.org/packages/biotmle) This package
facilitates the discovery of biomarkers from biological sequencing
data (e.g., microarrays, RNA-seq) based on the associations of
potential biomarkers with exposure and outcome variables by
implementing an estimation procedure that combines a generalization
of the moderated t-statistic with asymptotically linear statistical
parameters estimated via targeted minimum loss-based estimation
- [BLMA](https://bioconductor.org/packages/BLMA) Suite of tools for
bi-level meta-analysis. The package can be used in a wide range of
applications, including general hypothesis testing, differential
expression analysis, functional analysis, and pathway analysis.
- [BPRMeth](https://bioconductor.org/packages/BPRMeth) The BPRMeth
package uses the Binomial Probit Regression likelihood to model
methylation profiles and extract higher order features. These
features precisely quantify the shape of a methylation
profile. Using these higher order features across promoter-proximal
regions, we construct a powerful predictor of gene expression.
These features are also used to cluster promoter-proximal regions
using the EM algorithm.
Predicts branchpoint probability for sites in intronic branchpoint
windows. Queries can be supplied as intronic regions or, to
evaluate the effects of mutations, as SNPs.
- [BUMHMM](https://bioconductor.org/packages/BUMHMM) This is a
probabilistic modelling pipeline for computing per-nucleotide
posterior probabilities of modification from the data collected in
structure probing experiments. The model supports multiple
experimental replicates and empirically corrects coverage- and
sequence-dependent biases. The model utilises the measure of a
"drop-off rate" for each nucleotide, which is compared between
replicates through a log-ratio (LDR). The LDRs between control
replicates define a null distribution of variability in drop-off
rate observed by chance, and LDRs between treatment and control
replicates are compared to this distribution. Resulting empirical
p-values (probability of being "drawn" from the null distribution)
are used as observations in a Hidden Markov Model with a
Beta-Uniform Mixture model used as an emission model. The resulting
posterior probabilities indicate the probability of a nucleotide
having been modified in a structure probing experiment.
- [CATALYST](https://bioconductor.org/packages/CATALYST) Mass
cytometry (CyTOF) uses heavy metal isotopes rather than fluorescent
tags as reporters to label antibodies, thereby substantially
decreasing spectral overlap and allowing for examination of over 50
parameters at the single-cell level. While spectral overlap is
significantly less pronounced in CyTOF than in flow cytometry,
spillover due to detection sensitivity, isotopic impurities, and
oxide formation can impede data interpretability. We designed
CATALYST (Cytometry dATa anALYSis Tools) to provide a pipeline for
preprocessing of cytometry data, including i) normalization using
bead standards, ii) single-cell deconvolution, and iii) bead-based
- [cellbaseR](https://bioconductor.org/packages/cellbaseR) This R
package makes use of the exhaustive RESTful Web service API that
has been implemented for the CellBase database. It enables
researchers to query and obtain a wealth of biological information
from a single database, saving a lot of time. Another benefit is
that researchers can easily make queries about different biological
topics and link all this information together as all information is
- [cellscape](https://bioconductor.org/packages/cellscape) CellScape
facilitates interactive browsing of single cell clonal evolution
datasets. The tool requires two main inputs: (i) the genomic
content of each single cell in the form of either copy number
segments or targeted mutation values, and (ii) a single cell
phylogeny. Phylogenetic formats can vary from dendrogram-like
phylogenies with leaf nodes to evolutionary model-derived
phylogenies with observed or latent internal nodes. The CellScape
phylogeny is flexibly input as a table of source-target edges to
support arbitrary representations, where each node may or may not
have associated genomic data. The output of CellScape is an
interactive interface displaying a single cell phylogeny and a
cell-by-locus genomic heatmap representing the mutation status in
each cell for each locus.
chimeraviz manages data from fusion gene finders and provides
useful visualization tools.
Package with a quality control pipeline for ChIP-exo/nexus data.
Identification of clusters of co-expressed genes based on their
expression across multiple (replicated) biological samples.
- [coseq](https://bioconductor.org/packages/coseq) Co-expression
analysis for expression profiles arising from high-throughput
sequencing data. Feature (e.g., gene) profiles are clustered using
adapted transformations and mixture models or a K-means algorithm,
and model selection criteria (to choose an appropriate number of
clusters) are provided.
- [cydar](https://bioconductor.org/packages/cydar) Identifies
differentially abundant populations between samples and groups in
mass cytometry data. Provides methods for counting cells into
hyperspheres, controlling the spatial false discovery rate, and
visualizing changes in abundance in the high-dimensional marker
- [DaMiRseq](https://bioconductor.org/packages/DaMiRseq) The DaMiRseq
package offers a tidy pipeline of data mining procedures to
identify transcriptional biomarkers and exploit them for
classification purposes. The package accepts any kind of data
presented as a table of raw counts and allows including covariates
that occur with the experimental setting. A series of functions
enables the user to clean up the data by filtering genomic features
and samples, to adjust data by identifying and removing the
unwanted sources of variation (i.e. batches and confounding factors)
and to select the best predictors for modeling. Finally, a
"Stacking" ensemble learning technique is applied to build a
robust classification model. Every step includes a checkpoint that
the user may exploit to assess the effects of data management by
looking at diagnostic plots, such as clustering and heatmaps, RLE
boxplots, MDS plots or correlation plots.
Wrapping an array-like object (typically an on-disk object) in a
DelayedArray object allows one to perform common array operations
on it without loading the object in memory. In order to reduce
memory usage and optimize performance, operations on the object are
either delayed or executed using a block processing mechanism. Note
that this also works on in-memory array-like objects like DataFrame
objects (typically with Rle columns), Matrix objects, and ordinary
arrays and data frames.
Discordant is a method to determine differential correlation of
molecular feature pairs from -omics data using mixture models.
The algorithm is explained further in Siska et al.
- [DMRScan](https://bioconductor.org/packages/DMRScan) This package
detects significant differentially methylated regions (for both
qualitative and quantitative traits), using a scan statistic with
underlying Poisson heuristics. The scan statistic will depend on a
sequence of window sizes (# of CpGs within each window) and on a
threshold for each window size. This threshold can be calculated by
three different means: (i) analytically, using the solution of
Siegmund et al. (2012) (preferred), (ii) by importance sampling as
suggested by Zhang (2008), and (iii) by full MCMC modeling of the
data, choosing between a number of different options for modeling
the dependency between each CpG.
- [epiNEM](https://bioconductor.org/packages/epiNEM) epiNEM is an
extension of the original Nested Effects Models (NEM). EpiNEM is
able to take into account double knockouts and infer more complex
network signalling pathways.
EventPointer is an R package to identify alternative splicing
events that involve either simple (case-control experiment) or
complex experimental designs such as time course experiments and
studies including paired-samples. The algorithm can be used to
analyze data from either junction arrays (Affymetrix Arrays) or
sequencing data (RNA-Seq). The software returns a data.frame with
the detected alternative splicing events: gene name, type of event
(cassette, alternative 3', etc.), genomic position, statistical
significance and increment of the percent spliced in (Delta PSI)
for all the events. The algorithm can generate a series of files to
visualize the detected alternative splicing events in IGV. This
eases the interpretation of results and the design of primers for
standard PCR validation.
- [flowTime](https://bioconductor.org/packages/flowTime) This package
was developed for analysis of both dynamic and steady state
experiments examining the function of gene regulatory networks in
yeast (strain W303) expressing fluorescent reporter proteins using
BD Accuri C6 and SORP cytometers. However, the functions are for
the most part general and may be adapted for analysis of other
organisms using other flow cytometers. Functions in this package
facilitate the annotation of flow cytometry data with experimental
metadata, as is requisite for dissemination and general
ease-of-use. Functions for creating, saving and loading gate sets
are also included. In the past, we have typically generated summary
statistics for each flowset for each timepoint and then annotated
and analyzed these summary statistics. This method loses a great
deal of the power that comes from the large amounts of individual
cell data generated in flow cytometry, by essentially collapsing
this data into a bulk measurement after subsetting. In addition to
these summary functions, this package also contains functions to
facilitate annotation and analysis of steady-state or time-lapse
data utilizing all of the data collected from the thousands of
individual cells in each sample.
- [funtooNorm](https://bioconductor.org/packages/funtooNorm) Provides
a function to normalize Illumina Infinium Human Methylation 450
BeadChip (Illumina 450K) data, correcting for tissue and/or cell type.
GA4GHclient provides an easy way to access public data servers
through the Global Alliance for Genomics and Health (GA4GH) genomics
API. It provides low-level access to GA4GH API and translates
response data into Bioconductor-based class objects.
- [gcapc](https://bioconductor.org/packages/gcapc) Peak calling for
ChIP-seq data with consideration of potential GC bias in sequencing
reads. GC bias is first estimated with generalized linear mixture
models using a weighted GC strategy, then applied to peak
This package aims to make classifiers that have been published in
the literature easily applicable, using an ExpressionSet as
Programmatically access the NIH / NCI Genomic Data Commons RESTful
Provide infrastructure to store and access genomewide
position-specific scores within R and Bioconductor.
- [GISPA](https://bioconductor.org/packages/GISPA) GISPA is a method
intended for researchers who are interested in defining gene
sets with a similar, a priori specified molecular profile. The GISPA
method has been previously published in Nucleic Acids Research
(Kowalski et al., 2016; PMID: 26826710).
- [goSTAG](https://bioconductor.org/packages/goSTAG) Gene lists
derived from the results of genomic analyses are rich in biological
information. For instance, differentially expressed genes (DEGs)
from a microarray or RNA-Seq analysis are related functionally in
terms of their response to a treatment or condition. Gene lists can
vary in size, up to several thousand genes, depending on the
robustness of the perturbations or how widely different the
conditions are biologically. Systematically associating biological
relatedness among hundreds or thousands of genes by manually
curating the annotation and function of each gene is impractical.
Over-representation analysis (ORA) of genes was
developed to identify biological themes. Given a Gene Ontology (GO)
and an annotation of genes that indicate the categories each one
fits into, significance of the over-representation of the genes
within the ontological categories is determined by a Fisher's exact
test or modeling according to a hypergeometric distribution.
Comparing a small number of enriched biological categories for a
few samples is manageable using Venn diagrams or other means for
assessing overlaps. However, with hundreds of enriched categories
and many samples, the comparisons are laborious. Furthermore, if
there are enriched categories that are shared between samples,
trying to represent a common theme across them is highly
subjective. goSTAG uses GO subtrees to tag and annotate genes
within a set. goSTAG visualizes the similarities between the
over-representation of DEGs by clustering the p-values from the
enrichment statistical tests and labels clusters with the GO term
that has the most paths to the root within the subtree generated
from all the GO terms in the cluster.
- [GRridge](https://bioconductor.org/packages/GRridge) This package
allows the use of multiple sources of co-data (e.g. external
p-values, gene lists, annotation) to improve prediction of binary,
continuous and survival response using (logistic, linear or Cox)
group-regularized ridge regression. It also facilitates post-hoc
variable selection and prediction diagnostics by cross-validation
using ROC curves and AUC.
- [heatmaps](https://bioconductor.org/packages/heatmaps) This package
provides functions for plotting heatmaps of genome-wide data across
genomic intervals, such as ChIP-seq signals at peaks or across
promoters. Many functions are also provided for investigating
- [hicrep](https://bioconductor.org/packages/hicrep) Hi-C is a
powerful technology for studying genome-wide chromatin
interactions. However, current methods for assessing Hi-C data
reproducibility can produce misleading results because they ignore
spatial features in Hi-C data, such as domain structure and
distance-dependence. We present a novel reproducibility measure
that systematically takes these features into consideration. This
measure can assess pairwise differences between Hi-C matrices under
a wide range of settings, and can be used to determine optimal
sequencing depth. Compared to existing approaches, it consistently
shows higher accuracy in distinguishing subtle differences in
reproducibility and depicting interrelationships of cell lineages.
This R package `hicrep` implements our
- [ideal](https://bioconductor.org/packages/ideal) This package
provides functions for an Interactive Differential Expression
AnaLysis of RNA-sequencing datasets, to quickly and effectively
extract information downstream of the differential expression step.
A Shiny application encapsulates the whole package.
- [IMAS](https://bioconductor.org/packages/IMAS) Integrative analysis
of Multi-omics data for Alternative splicing.
ImpulseDE2 is a differential expression algorithm for longitudinal
count data sets which arise in sequencing experiments such as
RNA-seq, ChIP-seq, ATAC-seq and DNaseI-seq. ImpulseDE2 is based on
a negative binomial noise model with dispersion trend smoothing by
DESeq2 and uses the impulse model to constrain the mean expression
trajectory of each gene. The impulse model was empirically found to
fit global expression changes in cells after environmental and
developmental stimuli and is therefore appropriate in most cell
biological scenarios. The constraint on the mean expression
trajectory prevents overfitting to small expression fluctuations.
Secondly, ImpulseDE2 has higher statistical testing power than
generalized linear model-based differential expression algorithms
which fit time as a categorical variable if more than six time
points are sampled because of the fixed number of parameters.
- [IntEREst](https://bioconductor.org/packages/IntEREst) This package
performs Intron-Exon Retention analysis on RNA-seq data (.bam
Implementation of the Interval-Wise Testing (IWT) for omics data.
This inferential procedure tests for differences in "Omics" data
between two groups of genomic regions (or between a group of
genomic regions and a reference center of symmetry), and does not
require fixing location and scale at the outset.
karyoploteR creates karyotype plots of arbitrary genomes and offers
a complete set of functions to plot arbitrary data on them. It
mimics many R base graphics functions, coupling them with a
coordinate change function automatically mapping the chromosome and
data coordinates into the plot coordinates. In addition to the
provided data plotting functions, it is easy to add new ones.
- [Logolas](https://bioconductor.org/packages/Logolas) Produces logo
plots of a variety of symbols and names comprising English
letters, numerals and punctuation. Can be used for sequence
motif generation, mutation pattern generation, protein amino acid
generation and symbol strength representation in any generic
- [mapscape](https://bioconductor.org/packages/mapscape) MapScape
integrates clonal prevalence, clonal hierarchy, anatomic and
mutational information to provide interactive visualization of
spatial clonal evolution. There are four inputs to MapScape: (i)
the clonal phylogeny, (ii) clonal prevalences, (iii) an image
reference, which may be a medical image or drawing and (iv) pixel
locations for each sample on the referenced image. Optionally,
MapScape can accept a data table of mutations for each clone and
their variant allele frequencies in each sample. The output of
MapScape consists of a cropped anatomical image surrounded by two
representations of each tumour sample. The first, a cellular
aggregate, visually displays the prevalence of each clone. The
second shows a skeleton of the clonal phylogeny while highlighting
only those clones present in the sample. Together, these
representations enable the analyst to visualize the distribution of
clones throughout anatomic space.
A problem when recording 3D fluorescent microscopy images is how to
properly present these results in 2D. Maximum intensity projections
are a popular method to determine the focal plane of each pixel in
the image. The problem with this approach, however, is that
out-of-focus elements will still be visible, making edges and fine
structures difficult to detect. This package aims to resolve this
problem by using the contrast around a given pixel to determine the
focal plane, allowing for a much cleaner structure detection than
would be otherwise possible. For convenience, this package also
contains functions to perform various other types of projections,
including a maximum intensity projection.
- [MCbiclust](https://bioconductor.org/packages/MCbiclust) Custom-made
algorithm and associated methods for finding, visualising and
analysing biclusters in large gene expression data sets. The algorithm
is based on finding, for a supplied gene set of size n, the maximum
strength correlation matrix containing m samples from the data set.
- [metavizr](https://bioconductor.org/packages/metavizr) This package
provides Websocket communication to the metaviz web app
(http://metaviz.cbcb.umd.edu) for interactive visualization of
metagenomics data. Objects in R/bioc interactive sessions can be
displayed in plots and data can be explored using a facetzoom
visualization. Fundamental Bioconductor data structures are
supported (e.g., MRexperiment objects), while providing an easy
mechanism to support other data structures. Visualizations (using
d3.js) can be easily added to the web app as well.
Permutation analysis, based on Monte Carlo sampling, for testing
the hypothesis that the number of conserved differentially
methylated elements, between several generations, is associated with
an effect inherited from a treatment and that stochastic effect can
- [MIGSA](https://bioconductor.org/packages/MIGSA) Massive and
Integrative Gene Set Analysis. The MIGSA package allows users to perform
a massive and integrative gene set analysis over several expression
and gene sets simultaneously. It provides a common gene expression
analytic framework that ensures a comprehensive and coherent
analysis. Only minimal user parameter setting is required to
perform both singular and gene set enrichment analyses in an
integrative manner by means of the best available methods, i.e.
dEnricher and mGSZ, respectively. The greatest strengths of this big
omics data tool are the availability of several functions to
explore, analyze and visualize its results in order to facilitate
the data mining task over huge information sources. The MIGSA package
also provides several functions that make it easy to load the most
up-to-date gene sets from several repositories.
- [mimager](https://bioconductor.org/packages/mimager) Easily
visualize and inspect microarrays for spatial artifacts.
'motifcounter' provides functionality to compute the statistics
related with motif matching and counting of motif matches in DNA
sequences. As an input, 'motifcounter' requires a motif in terms of
a position frequency matrix (PFM). Furthermore, a set of DNA
sequences is required to estimate a higher-order background model
(BGM). The package provides functions to investigate the
per-position and per-strand log-likelihood scores between the PFM
and the BGM across a given sequence or set of sequences.
Furthermore, the package facilitates motif matching based on an
automatically derived score threshold. To this end the distribution
of scores is efficiently determined and the score threshold is
chosen for a user-prescribed significance level. This allows the
false positive rate to be controlled. Moreover, 'motifcounter'
implements a motif match enrichment test based on the number of
motif matches that are expected in random DNA sequences. Motif
enrichment is facilitated by either a compound Poisson
approximation or a combinatorial approximation of the motif match
counts. Both models take higher-order background models, the
motif's self-similarity, and hits on both DNA strands into account.
The package is in particular useful for long motifs and/or relaxed
choices of score thresholds, because the implemented algorithms
efficiently bypass the need for enumerating a (potentially huge)
set of DNA words that can give rise to a motif match.
- [msgbsR](https://bioconductor.org/packages/msgbsR) Pipeline for the
analysis of an MS-GBS experiment.
Calculate the Spearman correlation between the source omics data
and other target omics data, identify the significant correlations
and plot the significant correlations on the heat map in which the
x-axis and y-axis are ordered by the chromosomal location.
- [MWASTools](https://bioconductor.org/packages/MWASTools) MWAS
provides a complete pipeline to perform metabolome-wide association
studies. Key functionalities of the package include: quality
control analysis of metabonomic data; MWAS using different
association models (partial correlations; generalized linear
models); model validation using non-parametric bootstrapping;
visualization of MWAS results; NMR metabolite identification using
- [NADfinder](https://bioconductor.org/packages/NADfinder) Call peaks
for two samples: target and control. It will count the reads for
tiles of the genome and then convert the counts to ratios. The ratios
will be corrected and smoothed. A z-score is calculated for each
counting window over the background. The peaks will be detected
based on z-scores.
- [netReg](https://bioconductor.org/packages/netReg) netReg fits
linear regression models using network penalization. Graph prior
knowledge, in the form of biological networks, is
incorporated into the likelihood of the linear model. The networks
describe biological relationships such as co-regulation or
dependency of the same transcription factors/metabolites/etc.
yielding a part sparse and part smooth solution for coefficient
This package provides an alternative interface to Bioconductor
'annotation' resources, in particular the gene identifier mapping
functionality of the 'org' packages (e.g., org.Hs.eg.db) and the
genome coordinate functionality of the 'TxDb' packages (e.g.,
- [pathprint](https://bioconductor.org/packages/pathprint) Algorithms
to convert a gene expression array provided as an expression table
or a GEO reference to a 'pathway fingerprint', a vector of discrete
ternary scores representing high (1), low (-1) or insignificant (0)
expression in a suite of pathways.
- [pgca](https://bioconductor.org/packages/pgca) Protein Group Code
Algorithm (PGCA) is a computationally inexpensive algorithm to
merge protein summaries from multiple experimental quantitative
proteomics data. The algorithm connects two or more groups with
overlapping accession numbers. In some cases, pairwise groups are
mutually exclusive but they may still be connected by another group
(or set of groups) with overlapping accession numbers. Thus, groups
created by PGCA from multiple experimental runs (i.e., global
groups) are called "connected" groups. These identified global
protein groups enable the analysis of quantitative data available
for protein groups instead of unique protein identifiers.
It uses the overlap between enriched and non-enriched datasets to
compensate for the bias introduced in global phosphorylation after
applying median normalization.
- [POST](https://bioconductor.org/packages/POST) Perform orthogonal
projection of high dimensional data of a set, and statistical
modeling of phenotype with projected vectors as predictors.
- [PPInfer](https://bioconductor.org/packages/PPInfer) Interactions
between proteins occur in many, if not most, biological processes.
Most proteins perform their functions in networks associated with
other proteins and other biomolecules. This fact has motivated the
development of a variety of experimental methods for the
identification of protein interactions. This variety has in turn
ushered in the development of numerous different computational
approaches for modeling and predicting protein interactions.
Sometimes an experiment is aimed at identifying proteins closely
related to some interesting proteins. A network based statistical
learning method is used to infer the putative functions of proteins
from the known functions of their neighboring proteins on a PPI
network. This package identifies such proteins often involved in
the same or similar biological functions.
This package provides a flexible representation of copy number,
mutation, and other data that fit into the ragged array schema for
genomic location data. The basic representation of such data
provides a rectangular flat table interface to the user with range
information in the rows and samples/specimen in the columns.
- [ramwas](https://bioconductor.org/packages/ramwas) RaMWAS provides
a complete toolset for methylome-wide association studies (MWAS).
It is specifically designed for data from enrichment based
methylation assays, but can be applied to other data as well. The
analysis pipeline includes seven steps: (1) scanning aligned reads
from BAM files, (2) calculation of quality control measures, (3)
creation of methylation score (coverage) matrix, (4) principal
component analysis for capturing batch effects and detection of
outliers, (5) association analysis with respect to phenotypes of
interest while correcting for top PCs and known covariates, (6)
annotation of significant findings, and (7) multi-marker analysis
(methylation risk score) using elastic net. Additionally, RaMWAS
includes tools for joint analysis of methylation and genotype data.
- [REMP](https://bioconductor.org/packages/REMP) Machine
learning-based tools to predict DNA methylation of locus-specific
repetitive elements (RE) by learning surrounding genetic and
epigenetic information. These tools provide genomewide and
single-base resolution of DNA methylation prediction on RE that are
difficult to measure using array-based or sequencing-based
platforms, which enables epigenome-wide association study (EWAS)
and differentially methylated region (DMR) analysis on RE.
- [RITAN](https://bioconductor.org/packages/RITAN) Tools for
comprehensive gene set enrichment and extraction of multi-resource
high confidence subnetworks.
- [RIVER](https://bioconductor.org/packages/RIVER) An implementation
of a probabilistic modeling framework that jointly analyzes
personal genome and transcriptome data to estimate the probability
that a variant has regulatory impact in that individual. It is
based on a generative model that assumes that genomic annotations,
such as the location of a variant with respect to regulatory
elements, determine the prior probability that the variant is a
functional regulatory variant, which is an unobserved variable. The
functional regulatory variant status then influences whether nearby
genes are likely to display outlier levels of gene expression in
that person. See the RIVER website for more information,
documentation and examples.
This package performs nucleosome positioning using an informative
Multinomial-Dirichlet prior in a t-mixture with reversible jump
estimation of nucleosome positions for genome-wide profiling.
A workflow package for RNA-Seq experiments.
- [rqt](https://bioconductor.org/packages/rqt) Despite recent
advances in GWAS methods, calculating an effect size and the
corresponding p-value for a whole gene, rather than for a single
variant, remains an important problem. The R package rqt offers
gene-level GWAS meta-analysis. For more information, see: "Gene-set
association tests for next-generation sequencing data" by Lee et al.
(2016), Bioinformatics, 32(17),
- [RTNduals](https://bioconductor.org/packages/RTNduals) RTNduals is
a tool that searches for possible co-regulatory loops between
regulon pairs generated by the RTN package. It compares the shared
targets in order to infer 'dual regulons', a new concept that tests
whether regulon pairs agree on the predicted downstream effects.
- [samExploreR](https://bioconductor.org/packages/samExploreR) This R
package implements a subsampling procedure to simulate sequencing
experiments with reduced sequencing depth. It can be used to analyze
data generated from all major sequencing platforms, such as Illumina
GA, HiSeq, MiSeq, Roche GS-FLX, ABI SOLiD and LifeTech Ion PGM Proton
sequencers. It supports multiple operating systems, including Linux,
Mac OS X, FreeBSD and Solaris. It was developed using Rsubread.
The package is designed to classify gene expression profiles.
- [scDD](https://bioconductor.org/packages/scDD) This package
implements a method to analyze single-cell RNA-seq data utilizing
flexible Dirichlet Process mixture models. Genes with differential
distributions of expression are classified into several interesting
patterns of differences between two conditions. The package also
includes functions for simulating data with these patterns from
negative binomial distributions.
- [scone](https://bioconductor.org/packages/scone) SCONE is an R
package for comparing and ranking the performance of different
normalization schemes for single-cell RNA-seq and other
- [semisup](https://bioconductor.org/packages/semisup) This R
package moves away from testing interaction terms and towards
testing whether an individual SNP is involved in any
interaction. This reduces the multiple testing burden to one test
per SNP, and allows for interactions with unobserved factors.
Analysing one SNP at a time, it splits the individuals into two
groups, based on the number of minor alleles. If the quantitative
trait differs in mean between the two groups, the SNP has a main
effect. If the quantitative trait differs in distribution between
some individuals in one group and all other individuals, it
possibly has an interactive effect. Implicitly, the membership
probabilities may suggest potential interacting variables.
- [sparseDOSSA](https://bioconductor.org/packages/sparseDOSSA) The
package provides a model-based Bayesian method to characterize
and simulate microbiome data. sparseDOSSA's model captures the
marginal distribution of each microbial feature as a truncated,
zero-inflated log-normal distribution, with parameters distributed
as a parent log-normal distribution. The model can be effectively
fit to reference microbial datasets in order to parameterize their
microbes and communities, or to simulate synthetic datasets of
similar population structure. Most importantly, it allows users to
include both known feature-feature and feature-metadata correlation
structures and thus provides a gold standard to enable benchmarking
of statistical methods for metagenomic data analysis.
- [splatter](https://bioconductor.org/packages/splatter) Splatter is
a package for the simulation of single-cell RNA sequencing count
data. It provides a simple interface for creating complex
simulations that are reproducible and well-documented. Parameters
can be estimated from real data and functions are provided for
comparing real and simulated datasets.
- [STROMA4](https://bioconductor.org/packages/STROMA4) This package
estimates, for each patient in a gene expression dataset, four
stromal properties identified in triple-negative breast cancer
(TNBC) patients. These stromal property assignments can be combined
to subtype patients. The four properties represent the presence of
different cells in the stroma: T-cells (T), B-cells (B), stromal
infiltrating epithelial cells (E), and desmoplasia (D). Additionally,
this package can be used to estimate generative properties for the
Lehmann subtypes, an alternative TNBC subtyping scheme (PMID:
21633166).
- [swfdr](https://bioconductor.org/packages/swfdr) This package
allows users to estimate the science-wise false discovery rate from
Jager and Leek, "Empirical estimates suggest most published medical
research is true," 2013, Biostatistics, using an EM approach due to
the presence of rounding and censoring. It also allows users to
estimate the proportion of true null hypotheses in the presence of
covariates, using a regression framework, as per Boca and Leek, "A
regression framework for the proportion of true null hypotheses,"
2015, bioRxiv preprint.
"TCGAbiolinksGUI: A Graphical User Interface to analyze cancer
molecular and clinical data. A demo version of GUI is found in
- [TCseq](https://bioconductor.org/packages/TCseq) Quantitative and
differential analysis of epigenomic and transcriptomic time course
sequencing data, clustering analysis and visualization of temporal
patterns of time course data.
- [timescape](https://bioconductor.org/packages/timescape) TimeScape
is an automated tool for navigating temporal clonal evolution data.
The key attributes of this implementation involve the enumeration
of clones, their evolutionary relationships and their shifting
dynamics over time. TimeScape requires two inputs: (i) the clonal
phylogeny and (ii) the clonal prevalences. Optionally, TimeScape
accepts a data table of targeted mutations observed in each clone
and their allele prevalences over time. The output is the TimeScape
plot showing clonal prevalence vertically, time horizontally, and
the plot height optionally encoding tumour volume during
tumour-shrinking events. At each sampling time point (denoted by a
faint white line), the height of each clone accurately reflects its
proportionate prevalence. These prevalences form the anchors for
Bézier curves that visually represent the dynamic transitions
between time points.
- [treeio](https://bioconductor.org/packages/treeio) Base classes and
functions for parsing and exporting phylogenetic trees.
- [TSRchitect](https://bioconductor.org/packages/TSRchitect) In
recent years, large-scale transcriptional sequence data has yielded
considerable insights into the nature of gene expression and
regulation in eukaryotes. Techniques that identify the 5' end of
mRNAs, most notably CAGE, have mapped the promoter landscape across
a number of model organisms. Due to the variability of TSS
distributions and the transcriptional noise present in datasets,
precisely identifying the active promoter(s) for genes from these
datasets is not straightforward. TSRchitect allows the user to
efficiently identify the putative promoter (the transcription start
region, or TSR) from a variety of TSS profiling data types,
including both single-end (e.g., CAGE) and paired-end data
(RAMPAGE, PEAT). Along with the coordinates of identified TSRs,
TSRchitect also calculates the width, abundance and Shape Index,
and handles biological replicates for expression profiling.
Finally, TSRchitect imports annotation files, allowing the user to
associate identified promoters with genes and other genomic
features. Three detailed examples of TSRchitect's utility are
provided in the User's Guide, included with this package.
- [twoddpcr](https://bioconductor.org/packages/twoddpcr) The twoddpcr
package takes Droplet Digital PCR (ddPCR) droplet amplitude data
from Bio-Rad's QuantaSoft and can classify the droplets. A summary
of the positive/negative droplet counts can be generated, which can
then be used to estimate the number of molecules using the Poisson
distribution. This is the first open source package that
facilitates the automatic classification of general two channel
ddPCR data. Previous work includes 'definetherain' (Jones et al.,
2014) and 'ddpcRquant' (Trypsteen et al., 2015) which both handle
one channel ddPCR experiments only. The 'ddpcr' package available
on CRAN (Attali et al., 2016) supports automatic gating of a
specific class of two channel ddPCR experiments only.
- [wiggleplotr](https://bioconductor.org/packages/wiggleplotr) Tools
to visualise read coverage from sequencing experiments together
with genomic annotations (genes, transcripts, peaks). Introns of
long transcripts can be rescaled to a fixed length for better
visualisation of exonic read coverage.
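The Poisson step mentioned in the twoddpcr entry can be sketched as follows. This is a minimal Python illustration of the standard ddPCR calculation, not twoddpcr's own code (the package is written in R); the function names and the four cluster labels (NN, NP, PN, PP) are hypothetical. Assuming molecules are loaded into droplets according to a Poisson distribution, the fraction of negative droplets is e^(-λ), so the mean number of copies per droplet is λ = -ln(negatives / total).

```python
import math

# Hypothetical cluster labels for two-channel ddPCR: the first letter is
# channel 1, the second is channel 2 (N = negative, P = positive).
CLUSTERS = ("NN", "NP", "PN", "PP")


def copies_per_droplet(n_negative: int, n_total: int) -> float:
    """Poisson estimate of mean target copies per droplet.

    With Poisson loading, P(droplet has 0 copies) = exp(-lam),
    so lam = -ln(n_negative / n_total).
    """
    if n_total <= 0:
        raise ValueError("no droplets")
    if n_negative == 0:
        raise ValueError("all droplets positive; the estimate is saturated")
    return -math.log(n_negative / n_total)


def per_channel_estimates(counts: dict) -> dict:
    """Estimate copies per droplet separately for each channel."""
    total = sum(counts[c] for c in CLUSTERS)
    ch1_negative = counts["NN"] + counts["NP"]  # negative in channel 1
    ch2_negative = counts["NN"] + counts["PN"]  # negative in channel 2
    return {
        "ch1": copies_per_droplet(ch1_negative, total),
        "ch2": copies_per_droplet(ch2_negative, total),
    }


if __name__ == "__main__":
    # Illustrative counts only, not real data.
    counts = {"NN": 7000, "NP": 1000, "PN": 1500, "PP": 500}
    total = sum(counts.values())
    for channel, lam in per_channel_estimates(counts).items():
        # Total molecules in the partitioned volume is lam * number of droplets.
        print(channel, round(lam, 4), "copies/droplet,", round(lam * total), "molecules")
```

Each channel is treated independently in this sketch. The estimate becomes unreliable as nearly every droplet turns positive, which is why the function refuses the fully saturated case.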
NEWS from new and existing packages
There is too much NEWS to include here; see the full release
Deprecated and Defunct Packages
Seven software packages (seqplots, ssviz, stepwiseCM, segmentSeq,
EWCE, anamiR, IdMappingRetrieval) were marked as deprecated, to be
fixed or removed in the next release.
Nine previously deprecated software packages (coRNAi, saps, MeSHSim,
GENE.E, mmnet, CopyNumber450k, AtlasRDF, GEOsearch, pdmclass) were
removed from the release.