[Bioc-sig-seq] assess how many duplicated reads

Thu Aug 11 19:34:50 CEST 2011

Picard (a Java-based toolkit) is very useful for this type of thing;
see its MarkDuplicates tool.  This can be done in R, but the Picard
developers have put a lot of thought into how to find and mark
duplicates, including optical duplicates.  Picard's advantage is
greatest if you have paired-end data.
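
To make the position-based definition in the question concrete (reads
that map to the same location count as duplicates, regardless of
mismatches), here is a minimal Python sketch.  It is not Picard's
algorithm, which is more involved.  It assumes the mapped reads have
already been reduced to (chromosome, strand, 5' position) tuples; the
function name count_position_duplicates and the toy reads are
hypothetical, for illustration only.

```python
from collections import Counter


def count_position_duplicates(alignments):
    """Summarise position-based duplication.

    `alignments` is an iterable of (chrom, strand, five_prime_pos)
    tuples, one per mapped read. Reads sharing all three values are
    treated as copies of one another; the first read at each location
    is counted as unique and the rest as duplicates.
    """
    counts = Counter(alignments)
    total = sum(counts.values())
    unique_positions = len(counts)
    duplicates = total - unique_positions
    return {
        "total": total,
        "unique_positions": unique_positions,
        "duplicates": duplicates,
        "duplicate_fraction": duplicates / total if total else 0.0,
    }


# Toy input (hypothetical): six mapped single-end reads.
reads = [
    ("chr1", "+", 100),
    ("chr1", "+", 100),  # same location as the first read
    ("chr1", "-", 100),  # opposite strand, so a different location
    ("chr1", "+", 250),
    ("chr2", "+", 100),
    ("chr1", "+", 100),  # third read at chr1:+:100
]

summary = count_position_duplicates(reads)
print(summary)
```

Because only the mapped location is compared, reads with different
mismatches at the same site are still counted as duplicates, as the
question asks.  For minus-strand reads, the 5' position should be the
rightmost aligned base rather than the reported leftmost coordinate.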

Sean

On Thu, Aug 11, 2011 at 12:50 PM, Kunbin Qu <KQu at genomichealth.com> wrote:
> Hi, I have some human single-end RNA-seq runs from a HiSeq. Can anyone suggest how to assess how many duplicated reads these libraries contain? I looked at srFilter() in ShortRead, but I am not clear on how to implement this with it. Should I instead use IRanges to assess the unique start sites after mapping? If so, which function do you suggest? I'd like to count reads that map to the same location (even with some mismatches) as duplicates. Thanks.
>
> -Kunbin