[BioC] multi-mapped reads: Cufflinks + baySeq? edgeR?
Ryan C. Thompson
rct at thompsonclan.org
Fri Mar 1 19:42:14 CET 2013
Hi Hilary,
If you want to include multi-mapped reads in your counts, I believe
that the latest version of Cufflinks reports an estimate of the counts
for each transcript/gene in addition to FPKM. See the Cufflinks papers
& manual to see how it splits the counts for multi-mapping reads.
However, if you decide to use count estimates based on multi-mapping
reads in edgeR or baySeq, be aware that you are feeding them estimates
with some degree of uncertainty, while they are expecting exact raw
counts. As a result, the uncertainty in the count estimates will be
ignored by edgeR & baySeq, leading to an underestimate of variability
and an overestimate of significance. If most of your reads are uniquely
mapped, then this effect is probably quite modest. You should compare
the multi-mapped counts to the unique-only counts from htseq to see
what fraction of your reads are counted as multi-mapped. The higher
this fraction, the greater your risk of overstating the significance of
differential expression calls, but in truth no one really knows how big
an effect this might have, or whether it negates the benefit of
including the additional data from multi-mapping reads.
Hope this helps,
-Ryan
On Fri 01 Mar 2013 06:51:33 AM PST, Hilary [guest] wrote:
>
> Hi,
> I am trying to analyze RNA-Seq data for (gene-level) differential expression between treatments, in a design incorporating multiple factors (effects of species * treatment & interaction, 4 replicates for each combination). I have reads that map to multiple locations (Single-End data) and while I'd first used Bowtie2/Tophat >> htseq (discarding multi-mapping reads = multihits in htseq), and then used the GLM and baySeq approaches, it was suggested I go back and include multi-mapping hits.
>
> I know Cufflinks allows incorporation of the multi-mapping reads (Mortazavi method I think), and I know it is not compatible with the GLM methods of edgeR/DEseq due to use of FPKM but does that incompatibility apply to baySeq as well?
>
> Using CuffDiff seems problematic as it only does pairwise tests -- and while I can do that, I think a full model testing for individual effects and their interaction (esp. as our real interest is the species*treatment interaction) is probably more statistically accurate. Thus I'm not sure how to proceed; any suggestions would be greatly appreciated if someone has time!
> Thank you,
> Hilary
>
> -- output of sessionInfo():
>
> ()
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list