[Bioc-sig-seq] removing non-unique sequences and uniqueFilter

Martin Morgan mtmorgan at fhcrc.org
Thu May 14 17:54:14 CEST 2009


Hi Tobias --

Tobias Straub <tstraub at med.uni-muenchen.de> writes:

> hi
>
> i would like to remove all non-unique sequences from an AlignedRead
> object. i thought that the uniqueFilter would help me to do so. in
> fact, the filter removes a considerable amount of reads, but when i
> call tables on the result object i still have lots of sequences
> occurring more than once.
> did i miss something?

The challenge is in defining what 'unique' is. From the help page
?uniqueFilter

     uniqueFilter(withSread=TRUE, .name="UniqueFilter")

and

withSread: A 'logical(1)' indicating whether uniqueness includes the
          read sequence ('withSread=TRUE') or is based only on
          chromosome, position, and strand ('withSread=FALSE').
     
so by default uniqueFilter treats two reads as duplicates only when
they are identical in sequence and also identical in chromosome,
position, and strand of alignment. 'tables' is based on just the read
sequences, so reads with the same sequence but different alignment
locations all survive the filter and still show up more than once. If
you wanted to make the reads unique, based only on sequence identity,
you could do something like

  aln[!srduplicated(aln)]
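
To see why the two definitions of 'unique' give different answers, here
is a minimal sketch in base R. It uses a plain data.frame as a
stand-in for an AlignedRead, with made-up reads; the column names and
the objects 'reads', 'by_all' and 'by_seq' are assumptions for
illustration only, and duplicated() is only an analogy for what the
ShortRead functions do.

```r
# Toy stand-in for aligned reads: one row per read.
# (Hypothetical data in a plain data.frame, not a ShortRead object.)
reads <- data.frame(
    seq    = c("ACGT", "ACGT", "ACGT", "TTGA", "TTGA"),
    chr    = c("chr1", "chr1", "chr2", "chr1", "chr1"),
    pos    = c(100L, 100L, 500L, 200L, 200L),
    strand = c("+", "+", "+", "-", "-"),
    stringsAsFactors = FALSE
)

# Analogy to withSread=TRUE: a read is a duplicate only if sequence
# AND chromosome, position, strand all match an earlier read.
by_all <- reads[!duplicated(reads), ]

# Analogy to sequence-only uniqueness: a read is a duplicate if its
# sequence alone matches an earlier read.
by_seq <- reads[!duplicated(reads$seq), ]

# Tabulate sequences in each result, as 'tables' would on the reads.
table(by_all$seq)
table(by_seq$seq)
```

In the first result "ACGT" still appears twice, because one copy aligns
to chr1 and the other to chr2; in the second every sequence appears
once.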

Martin

> thanks in advance
> Tobias
>
> ----------------------------------------------------------------------
> Tobias Straub   ++4989218075439   Adolf-Butenandt-Institute, München D

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


