[Statlist] Bayesian analysis of RNA sequencing data [...] - Zurich - Mark van de Wiel - 21-22.02

Mon Feb 18 08:45:29 CET 2013

Dear Statlist,

Just a note and/or reminder that Mark van de Wiel (VU University Medical Center, Amsterdam) will visit Zurich Thursday/Friday this week.

He will give a talk at the "ZüKoSt: Seminar on applied Statistics" (http://stat.ethz.ch/events/zukost) on Thursday afternoon (16:15-17:30; ETHZ HG G 19.2); abstract below.

You can find more info about Professor van de Wiel at:
http://www.few.vu.nl/~mavdwiel/

If you would like to speak with him during his visit, please let me know ASAP.

Best regards, Mark

-------------
Abstract: Next generation sequencing is quickly replacing microarrays as a technique to probe different molecular levels of the cell, such as DNA or mRNA. The technology has the advantage of providing higher resolution while reducing biases, in particular at the lower end of the spectrum. mRNA sequencing (RNAseq) data consist of counts of pieces of RNA called tags. This type of data poses new challenges for statistical analysis. We present a novel approach to model and analyze these data.

Methods and software for differential expression analysis usually use a generalization of the Poisson or Binomial distribution that accounts for overdispersion. A popular choice is the negative binomial (i.e. Poisson-Gamma) model. However, there is no consensus on which model best fits RNAseq data, and this may depend on the technology used. With RNAseq, the number of features vastly exceeds the sample size. This implies that shrinkage of variance-related parameters may lead to more stable estimates and inference. Methods to do so are available, but only for a single parameter and in the context of restrictive study designs, e.g. two-group comparisons or fixed-effect designs.
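
As a small illustration of the overdispersion mentioned above (not part of the original abstract, and not taken from the ShrinkBayes software), the sketch below evaluates the negative binomial in the common mean/overdispersion parameterization and checks numerically that its variance is mu + phi * mu^2, compared with a variance of mu for the Poisson. The function names and the parameter values are assumptions chosen for illustration only.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and overdispersion phi.

    Assumed parameterization: the Poisson rate is Gamma distributed with
    shape r = 1/phi and mean mu, so that Var(Y) = mu + phi * mu**2.
    """
    r = 1.0 / phi
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def nb_moments(mu, phi, y_max=5000):
    """Mean and variance obtained by summing the pmf over 0..y_max.

    y_max must be large enough that the truncated tail is negligible.
    """
    probs = [math.exp(nb_log_pmf(y, mu, phi)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    mu, phi = 10.0, 0.5  # hypothetical values, for illustration only
    mean, var = nb_moments(mu, phi)
    print(f"mean = {mean:.4f}, variance = {var:.4f}")
    print(f"mu + phi * mu^2 = {mu + phi * mu ** 2:.4f}; a Poisson would have variance {mu:.4f}")
```

Letting phi approach zero recovers the Poisson case, which is one way to see why the estimate of phi matters so much when only a few samples are available per tag.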

We present a Bayesian framework that allows for a) various count models, b) flexible designs, c) random effects, and d) multi-parameter shrinkage. The latter is implemented using Empirical Bayes principles by several procedures that estimate hyper-parameters of (mixture) priors or nonparametric priors. Moreover, the framework provides Bayesian multiplicity correction, thereby providing solid inference. In data-based simulations, we show that our method outperforms other popular methods (edgeR, DESeq, baySeq, NOISeq). Moreover, we illustrate our approach on three data sets. The first is a CAGE data set containing 25 samples representing five regions of the human brain from seven individuals. The design is incomplete and a batch effect is present. The data motivate the use of the zero-inflated negative binomial as a powerful alternative to the negative binomial, because it leads to less bias in the overdispersion parameter and improved detection power for the low-count tags. The second is a large, standard two-sample RNAseq data set that we repeatedly split into a small data set and its large complement. Compared to other methods, our results from the small-sample data sets validate much better on their large-sample complements, illustrating the importance of the type of shrinkage.

The methodology and these results are available in Van de Wiel et al. (2012). 

The framework is not restricted to RNAseq data nor to differential expression analysis. It is currently being extended towards analysis of proteomics, microRNAs, methylation, and high-throughput screening data. In addition, we are currently studying multivariate, graphical applications using Bayesian ridge regression. If time permits, some of these extensions will be discussed. The R software package, termed 'ShrinkBayes', is built upon INLA, which provides the machinery for computing marginal posteriors in a variety of models.

Co-authors: 
Gwenael Leday (i), Luba Pardo (iii), Havard Rue (iv), Aad van der Vaart (ii), Wessel van Wieringen (i,ii) 

Affiliations: 
i. Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam 
ii. Department of Mathematics, VU University, Amsterdam 
iii. Department of Clinical Genetics, VU University Medical Center, Amsterdam 
iv. Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway

Reference: 
Van de Wiel MA, Leday GGR, Pardo L, Rue H, Van der Vaart AW, Van Wieringen WN (2012). Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics, 14, 113-128 

Speaker: Mark van de Wiel (VU University Medical Center, Amsterdam)
-------------


----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://tiny.cc/mrobin