[Bioc-sig-seq] Package ShortRead: occurrenceFilter and chromosomeFilter("NM")

Ivan Gregoretti ivangreg at gmail.com
Mon May 16 16:11:09 CEST 2011


Hi Valerie.

That means that occurrenceFilter works strictly as advertised.

That is fine. Thank you.
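
The two steps discussed below (select the reads flagged "NM", then keep only the first copy of each sequence) can be sketched in base R. This is only an illustration of the logic, not ShortRead's own filtering code: the data frame, its column names and the toy reads are assumptions standing in for an AlignedRead object.

```r
# Hypothetical stand-in for the contents of an export file.
# "NM" in the chromosome column marks a read with no match.
reads <- data.frame(
    sequence   = c("ATCTCATAGTGGG", "GGGTTTAAACCCA",
                   "ATCTCATAGTGGG", "TTTTGGGGCCCCA"),
    chromosome = c("NM", "chr1.fa", "NM", "NM"),
    stringsAsFactors = FALSE
)

# Step 1: keep only the unaligned reads.
unaligned <- reads[reads$chromosome == "NM", ]

# Step 2: keep the first occurrence of each sequence and drop later copies.
unique_unaligned <- unaligned[!duplicated(unaligned$sequence), ]
```

Doing the selection in two explicit steps makes it easy to check how many reads each step removes.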

Ivan


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878



On Fri, May 13, 2011 at 5:55 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
> Hi Ivan,
>
> How you filter depends on what type of file you have. If you have a BAM
> file, you can use scanBam and ScanBamParam to subset the reads before they
> are read in.
>     p <- ScanBamParam(flag=scanBamFlag(isUnmappedQuery=TRUE,
>                                        isDuplicate=FALSE))
>     unaligned <- scanBam(file, param=p)
>
> If you have a Solexa export file you will have to remove the aligned reads
> after you read the data in. The occurrenceFilter can be used to omit
> duplicates but not to select unaligned reads.
>     exptPath <- system.file("extdata", package = "ShortRead")
>     sp <- SolexaPath(exptPath)
>     aln <- readAligned(sp, "s_2_export.txt",
>                        filter=occurrenceFilter(duplicates="none"))
>
> The "NM" values end up in the chromosome slot of the AlignedRead object.
> Subset on the unaligned reads,
>     unaligned <- aln[chromosome(aln) == "NM"]
>
>
> Valerie
>
> On 05/13/2011 10:30 AM, Ivan Gregoretti wrote:
>
> Hello everyone,
>
> When loading reads with ShortRead::readAligned(), one has the great
> convenience of filtering the input.
>
> If I intend to load only unique sequences that are unaligned ("NM"),
> can I use occurrenceFilter?
>
> By _unique_ I mean that if the sequence ATCTCATAGTGGG has been loaded
> once, I do not want to load it the next time it is found.
>
>
> ?srFilter clearly documents how occurrenceFilter treats aligned reads;
> however, unaligned reads are not mentioned.
>
> Thank you,
>
> Ivan


