[BioC] RNAseq machine learning classifier
Wolfgang Huber
whuber at embl.de
Wed Jul 17 20:49:43 CEST 2013
Hi Jianping,
Good point about the parameter dependence (i.e. dataset dependence) of the variance-stabilising transformations (VST) in DESeq2.
However, once the typical coverage and noise characteristics of the RNA-Seq assay in use are established, one can 'freeze' the VST parameters and then simply reuse them for future samples.
As always, QC of new data for compliance with the expectations from the learned ('frozen') characteristics will be needed.
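
A minimal sketch of what such 'freezing' could look like in practice. It assumes a later DESeq2 release in which dispersionFunction() has a replacement method, so it may not match the version current at the time of this thread. The counts are simulated with makeExampleDESeqDataSet(), and the names dds_train, dds_new and vsd_new are placeholders rather than anything from the thread.

```r
library(DESeq2)

set.seed(1)  # simulated counts only; real data would come from your own count matrix

# "Training" cohort: estimate size factors and the dispersion-mean trend.
# The design is set to ~ 1 so the fit does not use sample labels.
dds_train <- makeExampleDESeqDataSet(m = 6)
design(dds_train) <- ~ 1
dds_train <- estimateSizeFactors(dds_train)
dds_train <- estimateDispersions(dds_train)

# A sample that arrives later (simulated here as well).
dds_new <- makeExampleDESeqDataSet(m = 1)
dds_new <- estimateSizeFactors(dds_new)

# "Freeze": reuse the dispersion function learned on the training cohort,
# then transform with blind = FALSE so that it is not re-estimated.
dispersionFunction(dds_new) <- dispersionFunction(dds_train)
vsd_new <- varianceStabilizingTransformation(dds_new, blind = FALSE)

head(assay(vsd_new))
```

In this sketch the size factor of the new sample is still estimated from the new data alone; how best to scale new samples relative to the training cohort is left open here, and is one of the things the QC step should check.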
Best wishes
Wolfgang
On 17 Jul 2013, at 20:16, <jhua at tgen.org> wrote:
> This sounds like an OK approach to me.
>
> One thing you might take into consideration is that classifier design usually involves independent validation data. If you are going to validate your classifier with the same type of RNAseq data, in general you need to normalize/variance-stabilize all of the samples together as one cohort. But sometimes the validation data are not collected until I have reported really positive results on the training data alone, which ends up requiring another round of full normalization, training, and testing...
>
> Jianping Hua, Ph. D.
> Research Assistant Professor
> Computational Biology Division
> Translational Genomics Research Institute (TGen)
>
>
>
>>
>> Steve!
>>
>> I was thinking along these same lines: estimating dispersions, then using a
>> variance stabilizing transformation. However, I am not sure how proper this
>> approach is.
>>
>> Can anyone confirm the validity of this approach?
>>
>> Michael
>>
>>
>> On Mon, Jul 15, 2013 at 3:58 PM, Steve Lianoglou
>> <lianoglou.steve at gene.com> wrote:
>>
>>> Hi,
>>>
>>> On Mon, Jul 15, 2013 at 2:42 PM, Michael Breen
>>> <breenbioinformatics at gmail.com> wrote:
>>>> Hi all,
>>>> We have a large RNAseq data set. Apart from identifying differentially
>>>> expressed genes with these data, we are also interested in classification,
>>>> in terms of developing a prognostic and diagnostic classifier.
>>>>
>>>> Normally, our approach would use a machine learning classifier, such as an
>>>> SVM, and typically proceed with a nested cross-validation approach.
>>>>
>>>>
>>>> The vast majority of these programs and packages have been designed
>>>> utilizing microarray data.
>>>>
>>>> Are there any reasonable biases which one should consider before using such
>>>> already published approaches on RNAseq data?
>>>>
>>>> Do the distributions of the different data types matter at all?
>>>>
>>>> If so, does an application exist using an SVM taking into consideration
>>>> RNAseq raw counts?
>>>
>>> One approach would be to take the output from one of the variance
>>> stabilizing transformations in DESeq2 as the input to your machine
>>> learning approach.
>>>
>>> See:
>>>
>>> R> library(DESeq2)
>>> R> ?varianceStabilizingTransformation
>>>
>>> and Section 7 of the DESeq2 vignette (count data transformations):
>>>
>>>
>>> http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf
>>>
>>> HTH,
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Computational Biologist
>>> Bioinformatics and Computational Biology
>>> Genentech
>>>