[BioC] Shortread and filtering of duplicate reads
Martin Morgan
mtmorgan at fhcrc.org
Tue Jan 19 02:08:45 CET 2010
Hi Johannes --
Johannes Waage wrote:
> Hi all,
>
> Does anyone know if the ShortRead package has functionality to filter out
> duplicate reads, but only reads with more than n duplicates, to avoid read
> stacks caused by PCR amplification? I can only find srduplicated(), but it
> doesn't seem to have an option for specifying n duplicate reads.
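I don't think there's a built-in function. This

    f <- function(x, n)
    {
        r <- srrank(x)          # rank of each read; identical reads share a rank
        counts <- tabulate(r)   # number of reads observed at each rank
        r %in% which(counts >= n)
    }
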
returns a logical vector that is TRUE for reads whose sequence occurs >= n
times, so

    aln[!f(sread(aln), 5)]

would drop every copy of the reads occurring 5 or more times (one might want
to think about whether the reads need to map to the same location, too).
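
On the location point, here is a rough base-R sketch of the same counting idea, keyed on sequence plus mapping location. It is not ShortRead functionality: `count_filter` and the toy vectors are hypothetical names made up for illustration, and the vectors only stand in for what one would pull out of an aligned-read object.

```r
# Sketch only: base-R analogue of f() that can also key on mapping location.
# `count_filter` is a hypothetical helper, not part of ShortRead.
count_filter <- function(n, ...) {
    # One key per read, pasted from all supplied vectors (sequence, chromosome, ...)
    key <- paste(..., sep = "\t")
    # Integer id per distinct key, then look up how often each read's id occurs
    id <- match(key, unique(key))
    tabulate(id)[id] >= n
}

# Toy data standing in for the sequences and alignment coordinates of 5 reads
seqs  <- c("ACGT", "ACGT", "ACGT", "TTGA", "ACGT")
chrom <- c("chr1", "chr1", "chr1", "chr1", "chr2")
pos   <- c(100L, 100L, 100L, 250L, 100L)
strnd <- c("+", "+", "+", "-", "+")

by_seq <- count_filter(3, seqs)                     # sequence only
by_loc <- count_filter(3, seqs, chrom, pos, strnd)  # sequence and location
```

With sequence alone the fifth read is flagged along with the first three, but once location is part of the key it is not, since it maps to a different chromosome. Subsetting with `!by_loc` would then mirror the `aln[!f(...)]` idiom above.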
Martin
>
> Thanks in advance!
>
> Regards,
> JW,
> Uni. of Copenhagen
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793