[Bioc-sig-seq] Filtering Eland aligned reads on input to ReadAligned

Martin Morgan mtmorgan at fhcrc.org
Wed Sep 22 20:21:49 CEST 2010


On 09/22/2010 10:55 AM, pterry at huskers.unl.edu wrote:
>  Dear bioc-sig-sequencing,
> 
> In comparing two approaches for filtering Eland-aligned reads when inputting the data with readAligned, I get an approximately 30% difference in the number of reads surviving. So my question: which approach should I use, or should I use some other combination of functions?
> 
> Roughly following the BioC2010 lab (http://www.bioconductor.org/help/course-materials/2010/BioC2010/Workflow.pdf), the two approaches and the resulting numbers of reads are shown below (note: the input file contains 1380439 lines/reads):
> 
> 
>> filt1 <- alignDataFilter(expression(filtering=="Y"))

Hi --

I guess alignDataFilter() is the main difference. It removes reads
that do not have a 'Y' in the 'filtering' column, which indicates that
they pass Illumina's own read quality criterion (_not_ based on
alignment). I guess these are reads that Illumina isn't confident in but
genome. It might pay to read some of the data in and explore the
consequences of each of the filters independently...
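A minimal sketch of that exploration in R, assuming the ShortRead package and a hypothetical export-file path fl (substitute one of your own fls): read every read once with no filter, then apply each filter from both approaches on its own and count the survivors. It relies on the fact that applying a filter object to an AlignedRead returns one logical value per read.

```r
## Sketch: read all reads once, then evaluate each filter independently.
## Assumes the ShortRead package; 'fl' is a hypothetical path to one
## Eland SolexaExport file (substitute one of your own 'fls').
library(ShortRead)

fl <- "s_1_export.txt"                          # hypothetical path
aln <- readAligned(fl, type = "SolexaExport")   # no filter: keep all reads

## The filters used in the two approaches in the question
filters <- list(
    passIllumina = alignDataFilter(expression(filtering == "Y")),
    chromosome   = chromosomeFilter("chr[0-9XYM]+.fas"),
    occurrence   = occurrenceFilter(withSread = FALSE),
    chipseq      = chipseqFilter(),
    alignQual15  = alignQualityFilter(15)
)

## One logical vector per filter (TRUE = read kept).
## Assumption: inside compose(), later filters such as occurrenceFilter()
## only see reads passing earlier ones, so composed counts can differ.
keep <- lapply(filters, function(f) as.logical(f(aln)))

## Reads kept and dropped by each filter on its own
kept <- vapply(keep, sum, integer(1))
data.frame(kept = kept, dropped = length(aln) - kept)

## Cross-tabulate the Illumina pass flag against the quality filter
table(passIllumina = keep$passIllumina, alignQual15 = keep$alignQual15)
```

Comparing the passIllumina row with the others should indicate how much of the gap between the two approaches comes from the Illumina flag alone.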

Martin

>> filt2 <- chromosomeFilter("chr[0-9XYM]+.fas")
>> filt3 <- occurrenceFilter(withSread = FALSE)
>> filt <- compose(filt1, filt2, filt3)
>> arabtest <- seqapply(fls, function(file) {
> +   as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
> + })
>> arabtest
> GRangesList of length 1
> [[1]]
> GRanges with 966869 ranges and 7 elementMetadata values
> 
> Alternatively, (from page 1 of the lab previously referenced):
> 
>> filt <- compose(chipseqFilter(), alignQualityFilter(15))
>> arabtest <- seqapply(fls, function(file) {
> +   as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
> + })
>> arabtest
> GRangesList of length 1
> [[1]]
> GRanges with 1286501 ranges and 7 elementMetadata values
> 
> 
> Thanks,
> P. Terry
> pterry at huskers.unl.edu
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
