[BioC] DeSeq vs current version of Cuffdiff

Nicolas Delhomme delhomme at embl.de
Tue Feb 14 16:01:37 CET 2012


Dear Stephen,

On your related note, you could have a look at the easyRNASeq package (Bioconductor 2.10) for R (2.15). It reads in your annotation and your BAM files and generates a count table for DESeq, all in R. It can also do the first step of DESeq (estimating the library size factors and the dispersion) and give you back a normalized CountDataSet object plus some validation plots as described in the DESeq vignettes (the same is true for edgeR). I'm about to push some changes to SVN to correct an issue that prevented the package vignette from being built. The package should be available as a binary in a couple of days, or you could install it directly from SVN. See http://wiki.fhcrc.org/bioc/SvnHowTo. The package URL is: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/easyRNASeq.
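
If you would rather stay outside R for the counting step, the "matrix of counts" part can be sketched roughly as follows. This is only a minimal illustration in Python, not what easyRNASeq does internally. It assumes one two-column, tab-separated file per sample (feature ID, count), as written by htseq-count; the function names and the list of summary rows to skip are my own assumptions and may need adjusting to your HTSeq version.

```python
import csv
import io

# Assumption: summary rows that htseq-count appends after the features.
# The exact names depend on the HTSeq version, so adjust as needed.
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}


def read_counts(handle):
    """Parse a tab-separated (feature, count) table into a dict."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        feature, value = row
        if feature in SUMMARY_ROWS or feature.startswith("__"):
            continue  # skip summary rows, keep real features only
        counts[feature] = int(value)
    return counts


def build_count_matrix(per_sample):
    """Combine {sample: {feature: count}} into (features, samples, rows).

    Each row holds the counts of one feature across all samples;
    a feature missing from a sample gets a count of 0.
    """
    samples = sorted(per_sample)
    features = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [[per_sample[s].get(f, 0) for s in samples] for f in features]
    return features, samples, rows


if __name__ == "__main__":
    # Hypothetical in-memory files standing in for per-sample outputs.
    s1 = read_counts(io.StringIO("geneA\t10\ngeneB\t0\nno_feature\t5\n"))
    s2 = read_counts(io.StringIO("geneA\t3\ngeneC\t7\n"))
    features, samples, rows = build_count_matrix({"s1": s1, "s2": s2})
    print("\t".join(["feature"] + samples))
    for feature, row in zip(features, rows):
        print("\t".join([feature] + [str(v) for v in row]))
```

The resulting table (features in rows, samples in columns, raw integer counts) is the shape DESeq expects as input.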

Getting the proper set of annotations is definitely the most important step in the whole process and the one that requires the most attention; the rest is then pretty straightforward. What I mean by annotation is the description of your features of interest, be they genes, transcripts, exons, enhancers, etc., as genomic loci (chr, start, width, etc.).
The main issue in defining the annotation is to avoid counting reads multiple times, and the kind of annotation needed of course depends on your project. If you are interested in differential expression of isoforms, you probably want to define synthetic exons (to avoid double counting) and process the resulting count table with DEXSeq. If you're looking at gene expression, you would want to create gene models to avoid multiple counting and use these to create your count table. If you are interested in eRNAs, you can define enhancer loci as the count "feature". All this can be done relatively easily in R. Once you have the annotation that suits your needs, running easyRNASeq is very straightforward. easyRNASeq accepts both RangedData and GRangesList as annotation input, among other formats. In addition, easyRNASeq is able to fetch annotations for you from different sources, but most of the time these would need to be post-processed. You can look at my post "using easyRNASeq examples" from 2 days ago for some examples and comments in addition to the vignette content.
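
To make the "gene model" idea concrete, here is a rough sketch in Python of collapsing the exons of all isoforms of a gene into non-overlapping loci, so that a read falling in an exon shared by several transcripts is counted only once. It covers the gene-level case only (the synthetic exons for DEXSeq are built differently). The function names and the (gene, chr, start, end) tuple layout are assumptions made for illustration, and coordinates are taken to be 1-based and inclusive.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gene_models(exons):
    """Collapse exons into one set of non-overlapping loci per gene.

    `exons` is an iterable of (gene, chrom, start, end) tuples, possibly
    with the same exon listed once per transcript it belongs to.
    Returns {(gene, chrom): [(start, end), ...]}.
    """
    by_gene = defaultdict(list)
    for gene, chrom, start, end in exons:
        by_gene[(gene, chrom)].append((start, end))
    return {key: merge_intervals(ivs) for key, ivs in by_gene.items()}


if __name__ == "__main__":
    # Hypothetical gene with two isoforms sharing part of their first exon.
    exons = [
        ("g1", "chr1", 100, 200),  # isoform 1, exon 1
        ("g1", "chr1", 150, 250),  # isoform 2, exon 1 (overlaps the above)
        ("g1", "chr1", 400, 500),  # shared exon 2
    ]
    for (gene, chrom), loci in gene_models(exons).items():
        print(gene, chrom, loci)
```

Note that this does not handle overlaps between different genes; whether to drop, split, or assign such regions is exactly the kind of project-dependent decision mentioned above.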

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On 14 Feb 2012, at 15:02, Stephen Turner wrote:

> I'm also in the same boat as Rich. I run a new bioinformatics core
> here and I'm building a pipeline for RNA-seq. Cufflinks for some time
> has supported biological replicates, and I'm also curious about the
> relative merits of using
> bowtie/tophat-cufflinks-cuffmerge-cuffdiff-cummeRbund versus using
> tophat-HTSeq?-customScriptForCreatingMatrix?-DESeq. Cufflinks also
> gives me a host of other tests (differential splicing load,
> differential TSS usage, differential coding output, etc), which also
> seem useful for certain applications.
> 
> On a related note, does anyone have a workflow for taking multiple bam
> files, running HTSeq-count (or another program), plus some other
> program or custom script, to produce a matrix of counts as input to
> DESeq?
> 
> Stephen
> 
> -----------------------------------------
> Stephen D. Turner, Ph.D.
> bioinformatics at virginia.edu
> Bioinformatics Core Director
> University of Virginia School of Medicine
> bioinformatics.virginia.edu
> 
> On Tue, Feb 14, 2012 at 6:00 AM, <bioconductor-request at r-project.org> wrote:
>> 
>> Message: 6
>> Date: Mon, 13 Feb 2012 09:28:10 -0800
>> From: "Tim Triche, Jr." <tim.triche at gmail.com>
>> To: Richard Friedman <friedman at cancercenter.columbia.edu>
>> Cc: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: Re: [BioC] DeSeq vs current version of Cuffdiff
>> Message-ID:
>>        <CAC+N9BWr30EvfZ5rH7hpNF-k9uQ=B4n_SqGr0B49XGmW89JBmg at mail.gmail.com>
>> Content-Type: text/plain
>> 
>> Not directly relevant to gene-level RNA-seq DE calls, but rather for
>> exon-level DE,
>> I found it useful to read this:
>> http://precedings.nature.com/documents/6837/version/1
>> In particular, section 4.3 on page 11, and supplementary figures S7 and S8
>> on page 19.
>> 
>> I was informed by a coworker that since everyone uses
>> BowTie-TopHat-Cufflinks-Cuffdiff, it is the sensible thing to do.
>> Conversations with people who know what they are doing (Terry Speed &
>> BCGSC) suggest the matter is not yet settled.
>> So I retrieved ~1TB of BAMs, extracted the reads, and started looking into
>> how that compares to DEXSeq and/or subread.
>> 
>> It would be incredibly informative if the Cufflinks and DEXSeq authors had
>> time to weigh in on their strengths/weaknesses. DEXSeq & cummeRbund both
>> offer nice tools for exploring the results; I am curious which pipeline
>> fits best for my needs.
>> 
>> Thanks for bringing this up.
>> 
>> 
>> On Mon, Feb 13, 2012 at 8:01 AM, Richard Friedman <
>> friedman at cancercenter.columbia.edu> wrote:
>> 
>>> Dear Bioconductor list,
>>> 
>>>        Some time ago Simon Anders explained the difference
>>> between DeSeq and Cuffdiff as follows:
>>> 
>>> "If you have two samples, cuffdiff tests, for each transcript, whether
>>> there is evidence that the concentration of this transcript is not the
>>> same in the two samples.
>>> 
>>> If you have two different experimental conditions, with replicates for
>>> each condition, DESeq tests, whether, for a given gene, the change in
>>> expression strength between the two conditions is large as compared to
>>> the variation within each replicate group."
>>> 
>>> Current language on the Cuffdiff site suggests that the current version
>>> of that program tests for whether the change is significant compared to
>>> changes in each condition.
>>> 
>>> http://cufflinks.cbcb.umd.edu/howitworks.html#hdif
>>> 
>>> http://cufflinks.cbcb.umd.edu/howitworks.html#reps
>>> 
>>> Can someone please comment on the relative merits of Cuffdiff and
>>> DeSeq? I ask here because our sequencing core delivers results
>>> based on Cuffdiff and I want to know if I should redo it using
>>> DeSeq. I would greatly appreciate any guidance in this matter.
>>> 
>>> Thanks and best wishes,
>>> Rich
>>> ------------------------------------------------------------
>>> Richard A. Friedman, PhD
>>> Associate Research Scientist,
>>> Biomedical Informatics Shared Resource
>>> Herbert Irving Comprehensive Cancer Center (HICCC)
>>> Lecturer,
>>> Department of Biomedical Informatics (DBMI)
>>> Educational Coordinator,
>>> Center for Computational Biology and Bioinformatics (C2B2)/
>>> National Center for Multiscale Analysis of Genomic Networks (MAGNet)
>>> Room 824
>>> Irving Cancer Research Center
>>> Columbia University
>>> 1130 St. Nicholas Ave
>>> New York, NY 10032
>>> (212)851-4765 (voice)
>>> friedman at cancercenter.columbia.edu
>>> http://cancercenter.columbia.edu/~friedman/
>>> 
>>> I am a Bayesian. When I see a multiple-choice question on a test and I
>>> don't
>>> know the answer I say "eeney-meaney-miney-moe".
>>> 
>>> Rose Friedman, Age 14
>>> 
> 


