[BioC] easyRNAseq question

Sun May 27 02:25:57 CEST 2012

Dear Nico,

I used HTSeq (similar to your geneModel method), which takes the counts of disjoint exons for the genes. The problem with this method is that too many reads are assigned to the ambiguous category, and sometimes the total number of reads that fall on disjoint exons is too small to get a valid DESeq result. Using RefSeq genes, the total number of genes counted by HTSeq on my data is ~14000, whereas using the bestExon method we get ~22000. Do you observe similar counts with your data?
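
For readers of the thread, here is a minimal sketch of how a read can end up in the ambiguous category under a union-style overlap rule. It is not HTSeq's code: the function name `assign_read`, the half-open coordinates, and the labels "no_feature" and "ambiguous" are assumptions made for illustration only.

```python
def assign_read(read_start, read_end, exons_by_gene):
    """Assign one read to a gene using a simple overlap rule.

    Coordinates are half-open intervals [start, end); this convention is an
    assumption of the sketch. exons_by_gene maps a gene id to a list of
    (exon_start, exon_end) tuples.
    """
    # Collect every gene that has at least one exon overlapping the read.
    hits = {
        gene
        for gene, exons in exons_by_gene.items()
        for exon_start, exon_end in exons
        if read_start < exon_end and exon_start < read_end
    }
    if not hits:
        return "no_feature"  # read overlaps no annotated exon
    if len(hits) > 1:
        return "ambiguous"   # read overlaps exons of more than one gene
    return next(iter(hits))  # exactly one gene: the read is counted for it


# Hypothetical annotation: two genes whose exons overlap on the genome.
exons = {"geneA": [(100, 200)], "geneB": [(150, 250)]}
print(assign_read(110, 140, exons))  # only geneA overlaps
print(assign_read(160, 190, exons))  # both genes overlap
print(assign_read(300, 350, exons))  # nothing overlaps
```

With overlapping gene annotations, every read that falls in the shared region is left uncounted under this rule, which is one way gene totals can shrink.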

I would also like to mention that a QQ-plot of the DESeq results obtained with the bestExon method is close to the expected distribution.

Let me know your thoughts on this.

Thank you very much.

Best Regards,
Nirmala

________________________________________
From: Nicolas Delhomme [delhomme at embl.de]
Sent: Saturday, May 26, 2012 6:51 AM
To: Akula, Nirmala (NIH/NIMH) [C]
Cc: bioconductor at r-project.org
Subject: Re: easyRNAseq question

Dear Nirmala,

The BestExon method works similarly to your workflow: per gene, the count for the exon with the highest coverage is returned.
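
The selection described above can be sketched as follows. This is an illustration under stated assumptions, not easyRNASeq's implementation: the raw per-exon read count stands in for coverage, the function names are hypothetical, and the summed variant is only a simplified contrast to a gene-level summary.

```python
from collections import defaultdict


def best_exon_counts(exon_counts):
    """Keep, per gene, the exon with the highest read count.

    exon_counts is an iterable of (gene_id, exon_id, read_count) tuples.
    Assumption: the raw read count is used as a proxy for coverage.
    Ties keep the first exon seen.
    """
    best = {}
    for gene_id, exon_id, count in exon_counts:
        if gene_id not in best or count > best[gene_id][1]:
            best[gene_id] = (exon_id, count)
    return best


def summed_exon_counts(exon_counts):
    """Sum read counts over all exons of each gene (simplified contrast)."""
    totals = defaultdict(int)
    for gene_id, _exon_id, count in exon_counts:
        totals[gene_id] += count
    return dict(totals)


# Made-up counts, for illustration only.
counts = [("g1", "e1", 5), ("g1", "e2", 12), ("g2", "e1", 3)]
print(best_exon_counts(counts))
print(summed_exon_counts(counts))
```

The per-gene maximum depends on a single exon, whereas the sum pools all exons of the gene.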

There are several reasons why I want to deprecate that function, the main two being:

1) Its results agree less well with microarray expression values than those of the geneModels approach.
2) RNA-Seq has a clear sequencing bias, i.e. the coverage of an exon depends on many factors, both biological and technical, e.g. GC content, RNA fragmentation protocol, etc. This implies that the coverage varies both within and across exons. Selecting a single exon introduces additional uncertainties, which are otherwise leveled out across the gene's exons. That should not affect a direct comparison between samples, as the sequencing bias is highly reproducible from one sample to the next.

So, as a best exon approach offers no advantage over a gene model approach, I'd advise you to choose the latter.

Best,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

On May 25, 2012, at 11:07 PM, Akula, Nirmala (NIH/NIMH) [C] wrote:

> Dear Nico,
>
> Thank you very much for your response. I did read the sections that you mentioned but would like to know more details about the BestExon method. Here is what I currently have:
>
> 1. Map reads with TopHat
> 2. Create a bed file from the bam (each read is represented by only one base which is its starting position to make sure that the read does not fall on two different exons)
> 3. Use coverageBed to get the read counts for each exon
> 4. For gene-level differential expression: for each gene, take only the one exon that has the maximum number of reads
> 5. Analyze the counts in DESeq
>
> I would like to compare the above method to the BestExon method in easyRNAseq.
>
>
> Best,
> Nirmala
>
>
>
>
> -----Original Message-----
> From: Nicolas Delhomme [mailto:delhomme at embl.de]
> Sent: Friday, May 25, 2012 3:03 AM
> To: Akula, Nirmala (NIH/NIMH) [C]
> Cc: bioconductor at r-project.org
> Subject: Re: easyRNAseq question
>
> Dear Nirmala,
>
> I've Cc'ed your email to the Bioconductor mailing list, as it might help other users.
>
> Yes, there is currently a manuscript in review.
>
> As I'm not sure where you got your information from about the GeneModel summarization, I would direct you to read the new vignette of the development package: http://bioconductor.org/packages/2.11/bioc/html/easyRNASeq.html, page 10 and section 4.6. If that's what you've done or if the information there is not sufficient, let me know and I'll detail it more. By the way, the BestExon summarization did not really prove useful on the datasets I've been working on. I'm thinking about deprecating it.
>
> Best,
>
> Nico
>
>
> On May 24, 2012, at 11:46 PM, Akula, Nirmala (NIH/NIMH) [C] wrote:
>
>> Dear Nicolas,
>>
>> Is there a publication available for the easyRNAseq software? Also, you have mentioned that the transcripts are collapsed to genes by the BestExon method and the GeneModel summarization. Could you give details on these two methods?
>>
>> Thank you very much.
>>
>> Best Regards,
>> Nirmala
>>
>