[BioC] RNAseq machine learning classifier

Steve Lianoglou lianoglou.steve at gene.com
Tue Jul 16 00:58:49 CEST 2013


Hi,

On Mon, Jul 15, 2013 at 2:42 PM, Michael Breen
<breenbioinformatics at gmail.com> wrote:
> Hi all,
> We have a large RNAseq data set. Apart from identifying differentially
> expressed genes with these data we are also interested in classification in
> terms of developing a prognostic and diagnostic classifier.
>
> Normally, our approach would utilize a machine learning classifier, such as an SVM,
> and typically proceed with a nested cross-validation approach.
>
>
> The vast majority of these programs and packages have been designed
> utilizing microarray data.
>
> Are there any reasonable biases which one should consider before using such
> already published approaches on RNAseq data?
>
> Do the distributions of the different data types matter at all?
>
> If so, does an application exist using an SVM taking into consideration
> RNAseq raw counts?

One approach would be to take the output from one of the variance
stabilizing transformations in DESeq2 as the input to your machine
learning approach.

See:

R> library(DESeq2)
R> ?varianceStabilizingTransformation

and Section 7 of the DESeq2 vignette (count data transformations):

http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf
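
To make that concrete, here is a minimal sketch of the idea, not a
recommended pipeline. Everything beyond DESeq2 itself is an assumption
on my part: the data are simulated with makeExampleDESeqDataSet as a
stand-in for your real count matrix, and e1071's svm() is just one
SVM implementation picked for illustration.

```r
library(DESeq2)
library(e1071)  # assumption: any SVM implementation would do here

# Simulated counts stand in for a real experiment (hypothetical data).
# The example data set carries a two-level 'condition' factor (A/B).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)

# Drop genes with no counts at all; they carry no information.
dds <- dds[rowSums(counts(dds)) > 0, ]

# Variance stabilizing transformation. blind = TRUE means the
# dispersion estimation ignores the sample labels.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Most classifiers expect samples in rows and features in columns.
x <- t(assay(vsd))
y <- colData(dds)$condition

# Linear SVM with a simple 3-fold cross-validation (not nested).
fit <- svm(x, y, kernel = "linear", cross = 3)
summary(fit)
```

The transformed matrix `x` can be handed to whatever nested
cross-validation setup you already use for microarray data. Note that
here the transformation is estimated once on all samples, which may or
may not be acceptable for a strict evaluation.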

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


