[BioC] DESeq on transcripts v/s genes

Sun Feb 5 15:59:31 CET 2012

A clarification (after an off-list request): there are two possibilities for 
double counting, and in the post below I am referring to only one of them:

1. Creating a transcript-level count for each possible transcript of a 
gene, essentially by *treating each transcript as a separate 'gene'*, 
and then calling DESeq or an analogous tool. This is what the post below 
refers to.

2. Counting the reads touching each exon, and then *summing these 
numbers up over all exons of a gene* to get a per-gene (or 
per-transcript) value. That would be wrong, since reads that touch 
more than one exon are then counted multiple times, which messes up 
the statistical model.
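
To make case 2 concrete, here is a minimal Python sketch. The exon 
coordinates, the read names and the overlap rule (half-open intervals, any 
overlap counts) are all made up for illustration; this is not how any 
particular counting tool works. It only shows that summing per-exon counts 
gives a larger number than counting each read once per gene.

```python
# Toy gene with three exons, as half-open intervals [start, end).
# All coordinates and read names are hypothetical.
exons = {"exon1": (100, 200), "exon2": (300, 400), "exon3": (500, 600)}

# Each read is a list of aligned blocks; spliced reads have two blocks.
reads = {
    "r1": [(120, 170)],              # inside exon1 only
    "r2": [(180, 200), (300, 330)],  # spliced across exon1 and exon2
    "r3": [(310, 360)],              # inside exon2 only
    "r4": [(380, 400), (500, 530)],  # spliced across exon2 and exon3
    "r5": [(520, 570)],              # inside exon3 only
}


def overlaps(block, exon):
    """True if two half-open intervals share at least one position."""
    return block[0] < exon[1] and exon[0] < block[1]


def touches(read_blocks, exon):
    """True if any aligned block of the read overlaps the exon."""
    return any(overlaps(block, exon) for block in read_blocks)


# Per-exon counts: a read is counted once for every exon it touches.
per_exon = {
    name: sum(touches(blocks, exon) for blocks in reads.values())
    for name, exon in exons.items()
}

# Summing the per-exon counts counts junction-spanning reads repeatedly.
summed_over_exons = sum(per_exon.values())

# Counting each read at most once for the gene avoids this.
gene_count = sum(
    any(touches(blocks, exon) for exon in exons.values())
    for blocks in reads.values()
)

print(per_exon)
print("sum over exons:", summed_over_exons)
print("reads per gene:", gene_count)
```

In this toy example the two junction-spanning reads (r2 and r4) are each 
counted twice in the sum over exons, so the summed value exceeds the number 
of reads actually observed for the gene.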

	Best wishes
	Wolfgang

Feb/5/12 12:16 PM, Wolfgang Huber scripsit::
> Dear Abhishek
>
> there was some anxiety regarding double-counting / redundancy in this
> thread. Actually, there is very little reason to worry. DESeq tests
> one hypothesis after the other, sequentially. It does not matter whether
> they are correlated or not.
>
> The one consideration where the correlations / redundancy can matter is
> multiple testing correction. As long as you go for FDR, again there is
> little to worry about, since the redundancy pops up both in the numerator
> and denominator of the ratio (the "R" in FDR) and, at least to a good
> enough approximation, cancels out.
>
> If you go for family-wise error rate (FWER) and, say, Bonferroni
> correction, then the redundancy and the increase in the number of tests do
> matter. But there seem to be few reasons to use FWER/Bonferroni in this
> context.
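
The FDR versus FWER point above can be illustrated with a small Python 
sketch. The p-values are invented, and the redundancy is modelled in the 
most extreme way (every test appears exactly twice), which is an assumption 
for illustration and not what real transcript data look like. The 
Benjamini-Hochberg function is a plain textbook implementation, not DESeq's 
own code.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_adjust(pvals):
    """Bonferroni adjusted p-values, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


# Hypothetical p-values for six tests.
pvals = [0.001, 0.008, 0.02, 0.04, 0.3, 0.7]

# Extreme redundancy: every test is present twice.
doubled = pvals + pvals

bh_once = bh_adjust(pvals)
bh_twice = bh_adjust(doubled)[:len(pvals)]

bonf_once = bonferroni_adjust(pvals)
bonf_twice = bonferroni_adjust(doubled)[:len(pvals)]

for p, b1, b2, f1, f2 in zip(pvals, bh_once, bh_twice, bonf_once, bonf_twice):
    print(f"p={p:<6} BH: {b1:.4f} -> {b2:.4f}   Bonferroni: {f1:.4f} -> {f2:.4f}")
```

With exact duplication the Benjamini-Hochberg adjusted values are unchanged, 
because both the number of tests and the rank of each p-value double. The 
Bonferroni adjusted values double (until they hit the cap of 1), since only 
the number of tests enters. With partial redundancy the cancellation for FDR 
is only approximate.
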
>
> Hope this helps
> Wolfgang
>
> Feb/2/12 12:46 AM, Abhishek Pratap scripsit::
>> Hi All
>>
>> I am wondering whether, conceptually, I can use DESeq to test for
>> differential expression at the transcript level rather than the gene
>> level. In our case we have generated a transcript model based on RNA-Seq,
>> and if we try to collapse those transcripts to genes in order to do
>> gene-level differential expression, many exons are collapsed and give
>> rise to artificial exons.
>>
>>
>> eg :
>>
>>
>> Transcript 1 : ---------------------- (exon)
>> Transcript 2 : -----------------------------(exon )
>>
>> Gene level : -------------------------------------------- (exon)
>>
>> Another thing that comes to my mind is the effect of double counting,
>> due to exon redundancy, if I take the read counts at the transcript level.
>>
>> I would love to hear from your experience.
>>
>> Thanks!
>> -Abhi
>>
>
> Best wishes
> Wolfgang
>
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber