[BioC] easyRNAseq question

Akula, Nirmala (NIH/NIMH) [C] akulan at mail.nih.gov
Mon Jul 2 23:10:00 CEST 2012


Thank you, Simon. I tried the Ensembl GTF file with HTSeq and got ~50,000 genes for testing with DESeq. We filtered out the genes with low counts, and the resulting file had ~23,000 genes. The problem now is that the QQ-plot lies well above the expected line. Please see the attachment.

Analysis pipeline: Tophat-HTSeq-DESeq
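For readers following the thread, the two post-counting steps mentioned above (dropping low-count genes, then comparing the DESeq p-values with their null expectation in a QQ-plot) can be sketched roughly as follows. This is a minimal illustration in Python, not the code used in the original analysis: the function names, the `min_total=10` threshold, and the use of `None` for untestable genes (NA in R) are all hypothetical choices.

```python
import math


def filter_low_counts(counts, min_total=10):
    """Keep genes whose total count across all samples is at least min_total.

    counts maps gene id -> list of per-sample read counts.
    min_total=10 is a hypothetical threshold, not the one used in the thread.
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a QQ-plot against Uniform(0, 1).

    Missing p-values (None, standing in for NA values exported from R) are dropped,
    and p = 0 is clipped so that log10 is always defined.
    """
    ps = [max(p, 1e-300) for p in pvalues if p is not None]
    n = len(ps)
    # Observed -log10(p), sorted ascending.
    observed = sorted(-math.log10(p) for p in ps)
    # Expected uniform quantiles (i - 0.5) / n, transformed and sorted ascending.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))
```

Plotting these pairs against the diagonal y = x gives the usual QQ-plot: points above the diagonal correspond to p-values smaller than expected under a uniform null.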

Any suggestions would be very helpful.

Thank you very much.

Regards,
Nirmala

-----Original Message-----
From: Simon Anders [mailto:anders at embl.de] 
Sent: Thursday, May 31, 2012 2:31 AM
To: bioconductor at r-project.org
Subject: Re: [BioC] easyRNAseq question

Dear Nirmala

On 2012-05-27 02:25, Akula, Nirmala (NIH/NIMH) [C] wrote:
> I used HTSeq (similar to your geneModel method), which takes the counts
> of disjoint exons for the genes. The problem with this method is that
> too many reads are assigned to the ambiguous category, and sometimes the
> total number of reads that fall on disjoint exons is too small to get a
> valid DESeq result. Using RefSeq genes, the total number of genes counted
> by HTSeq on my data is ~14,000, whereas using the bestExon method we get
> ~22,000. Do you observe similar counts with your data?
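The "ambiguous" category in the quoted paragraph comes from how HTSeq's default "union" mode resolves reads that touch more than one gene. The following is a simplified Python sketch of that rule, not HTSeq's actual implementation. It assumes unstranded data, half-open intervals, and reads given as lists of aligned blocks, and it uses hypothetical function names. A read is counted for a gene only if every exon it overlaps belongs to that single gene. Otherwise it is tallied as `__ambiguous` (several genes) or `__no_feature` (none).

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def assign_read(read_blocks, exons):
    """Assign one read using a simplified union-mode rule (strand ignored).

    read_blocks: list of (chrom, start, end) aligned blocks of the read.
    exons: list of (chrom, start, end, gene_id) exon intervals.
    """
    genes = {
        gene
        for (rchrom, rstart, rend) in read_blocks
        for (echrom, estart, eend, gene) in exons
        if rchrom == echrom and overlaps(rstart, rend, estart, eend)
    }
    if not genes:
        return "__no_feature"
    if len(genes) > 1:
        return "__ambiguous"
    return next(iter(genes))


def count_reads(reads, exons):
    """Tally gene assignments (plus special categories) over all reads."""
    return Counter(assign_read(read, exons) for read in reads)
```

With overlapping gene models, every read falling in the shared region lands in `__ambiguous`. That is one way genes with few unique exonic bases can end up with very low counts.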

It does not quite make sense that counting only for the best exons gives you more counts than counting for all exons.

Could it be that the issue with UCSC GTF files described here is the source of your problems:

https://stat.ethz.ch/pipermail/bioconductor/2012-April/044717.html

   Simon

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

