[Bioc-sig-seq] Dealing with pileups/duplicates in RNAseq

Sean Davis seandavi at gmail.com
Fri Apr 23 21:50:13 CEST 2010


On Fri, Apr 23, 2010 at 12:50 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi all,
>
> Sorry for abusing the list (and *-seq terminology) as this isn't
> really a Bioconductor-related question, but I was curious how you all
> deal with "pileups" in RNAseq data. By pileup I mean separate
> observations of the same read (i.e. two or more different reads that
> map to the exact same genomic locus), aka duplicate reads.
>
> I'm pretty sure it's common practice to remove them in ChIP-seq
> experiments since, I believe, they are usually assumed to be PCR
> artifacts, but with genes being able to vary in their expression
> level, removing all of them probably isn't a given.
>
> That having been said, I have been removing them anyway. I think I've
> seen some references to keeping only N reads that map to the same
> place, where N seems to be chosen arbitrarily and applied globally.
>
> I guess it would make more sense to determine N on a gene-by-gene
> basis, perhaps by quantifying the expression of the gene based on its
> uniquely-appearing reads, though.
>
> So, I'm just curious if/how you folks are tackling this issue.

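It probably depends on your use case.  For finding fusion transcripts and
SNVs, for example, it might be best to remove duplicates, since PCR
duplicates can inflate the apparent support for a variant or junction.
For gene expression, however, it is hard to justify doing so blindly:
highly expressed genes will legitimately produce many reads that start
at the same position.  Just my $0.02.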
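
For illustration only, the "keep at most N reads per position" idea from
the quoted message could be sketched roughly as below.  This is plain
Python, not a Bioconductor API: the (chrom, strand, start, name) tuple
layout, the function name cap_duplicates, and the parameter
max_per_position are all hypothetical names made up for this sketch.

```python
from collections import defaultdict


def cap_duplicates(reads, max_per_position=1):
    """Keep at most `max_per_position` reads per (chrom, strand, start).

    `reads` is an iterable of (chrom, strand, start, name) tuples; this
    layout is an assumption made for the sketch, not a real file format.
    Reads are kept in input order, so the first ones seen win.
    """
    seen = defaultdict(int)  # how many reads were kept at each position
    kept = []
    for read in reads:
        key = (read[0], read[1], read[2])
        if seen[key] < max_per_position:
            seen[key] += 1
            kept.append(read)
    return kept


# Toy data: three reads pile up at chr1:+:100.
reads = [
    ("chr1", "+", 100, "readA"),
    ("chr1", "+", 100, "readB"),
    ("chr1", "+", 100, "readC"),
    ("chr1", "-", 100, "readD"),
    ("chr2", "+", 500, "readE"),
]

dedup = cap_duplicates(reads, max_per_position=1)   # strict de-duplication
capped = cap_duplicates(reads, max_per_position=2)  # global cap of N = 2
```

With max_per_position=1 this is strict duplicate removal (the variant
calling case); a larger value is the global "N" cap.  A per-gene N would
need some expression estimate on top of this, which the sketch does not
attempt.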

Sean


