[BioC] ShortRead and filtering of duplicate reads
mtmorgan at fhcrc.org
Tue Jan 19 02:08:45 CET 2010
Hi Johannes --
Johannes Waage wrote:
> Hi all,
> Does anyone know if the ShortRead package has functionality to filter out
> duplicate reads, but only reads with more than n duplicates, to avoid read
> stacks caused by PCR amplification? I can only find srduplicated(), but it
> doesn't seem to have functionality for specifying n duplicate reads.
I don't think there's a built-in function. This
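    library(ShortRead)

    ## TRUE for reads whose sequence occurs n or more times in x
    f <- function(x, n) {
        r <- srrank(x)         # identical reads share the same (minimum) rank
        t <- tabulate(r)       # number of reads at each rank
        r %in% which(t >= n)   # reads whose rank is seen >= n times
    }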
returns a logical vector indicating which reads occur >= n times, so
subsetting with the negated result, e.g. x[!f(x, 5)], would drop the reads
occurring 5 or more times (one might also want to consider whether the
duplicate reads need to map to the same location).
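For anyone who wants to try the rank-and-tabulate idea without ShortRead, here is a minimal base-R sketch on a plain character vector of reads. It is not part of ShortRead: flag_abundant() and its loc argument are hypothetical names, and match(key, key) stands in for srrank() by giving identical reads the same integer id (the index of their first occurrence). The optional loc key (e.g. "chr1:100") is an assumed way to require duplicates to share an alignment position as well as a sequence.

```r
## Base-R sketch (no ShortRead): flag reads whose key occurs >= n times.
## 'seqs' is assumed to be a character vector of read sequences.
flag_abundant <- function(seqs, n, loc = NULL) {
    ## Optionally require duplicates to share a location too; 'loc' is a
    ## hypothetical character key per read, such as "chr1:100".
    key <- if (is.null(loc)) seqs else paste(seqs, loc, sep = "|")
    r <- match(key, key)     # identical keys get the same id (first index)
    t <- tabulate(r)         # number of reads carrying each id
    r %in% which(t >= n)     # TRUE if the read's key is seen >= n times
}

reads <- c("ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC")
reads[!flag_abundant(reads, 3)]   # "TTGA" "TTGA" "GGCC"
```

With ShortRead objects the same filtering is done by f() above, since srrank() works directly on the read sequences.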
> Thanks in advance!
> Uni. of Copenhagen
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793