[Bioc-sig-seq] assess how many duplicated reads

Martin Morgan mtmorgan at fhcrc.org
Fri Aug 12 05:34:03 CEST 2011


On 08/11/2011 09:50 AM, Kunbin Qu wrote:
> Hi, I have some human single-end RNA-seq runs from a HiSeq. Could I
> have some suggestions on how to assess how many duplicated reads
> there are in these libraries? I looked at srFilter() in ShortRead,
> but I am not clear on how to implement it. Should I use IRanges
> instead, to assess the unique start sites after mapping? If so,
> what function do you suggest? I'd like to count reads that map to
> the same location (even with some mismatches) as duplicates. Thanks.

ShortRead::tables() can be used to count exactly identical unaligned
reads. ShortRead::occurrenceFilter() is an implementation for non-gapped,
aligned reads. For aligned reads with gaps I think you're on your own,
but GenomicRanges::readGappedAlignments() or Rsamtools::scanBam() plus
the logic of ShortRead::occurrenceFilter() would be a starting point.
Perhaps your aligner has already flagged duplicate reads, in which case
the 'flag' argument of ScanBamParam() (see scanBamFlag(), isDuplicate)
and the 'flag' field returned by scanBam() would be helpful.
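
To make the "same location" idea concrete, here is a minimal base-R sketch of the counting step only. It is not the ShortRead::occurrenceFilter implementation. The helper name count_position_duplicates and the toy data are hypothetical; the column names rname, strand and pos are chosen to match the field names that Rsamtools::scanBam() returns. It treats reads sharing chromosome, strand and leftmost aligned position as duplicates, which is only a rough approximation for gapped or minus-strand reads of differing lengths.

```r
# Hypothetical helper (not part of ShortRead or Rsamtools): count reads that
# share chromosome, strand and leftmost aligned position with an earlier read.
count_position_duplicates <- function(aln) {
  stopifnot(all(c("rname", "strand", "pos") %in% names(aln)))
  # Unmapped reads have NA positions; drop them before counting
  mapped <- aln[!is.na(aln$pos), , drop = FALSE]
  # One key per alignment location
  key <- paste(mapped$rname, mapped$strand, mapped$pos, sep = ":")
  # duplicated() marks every occurrence of a key after the first
  dup <- duplicated(key)
  c(mapped = length(key), duplicated = sum(dup), unique_sites = sum(!dup))
}

# Toy alignments (made-up values, for illustration only)
toy <- data.frame(
  rname  = c("chr1", "chr1", "chr1", "chr2", "chr2", NA),
  strand = c("+", "+", "-", "+", "+", NA),
  pos    = c(100L, 100L, 100L, 50L, 50L, NA),
  stringsAsFactors = FALSE
)
res <- count_position_duplicates(toy)
print(res)
```

With real data, the three columns could presumably be taken from the first element of scanBam(file, param = ScanBamParam(what = c("rname", "strand", "pos"))). If the aligner or a later tool has already marked duplicates, countBam(file, param = ScanBamParam(flag = scanBamFlag(isDuplicate = TRUE))) should give the flagged count directly.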

Hope that is of some help.

Martin


>
> -Kunbin
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


