[Bioc-sig-seq] (no subject)

Tue Jan 20 18:41:14 CET 2009

Hi,

I'm currently comparing the alignment generated by bowtie and maq on the data
used in the HilbertVis pdf file of the ShortRead package (NCBI SRA accession:
SRA000206). From the original .seq and .prb solexa files, I have generated the
fastq files and aligned them to the ref. genome using bowtie and maq.

I load both map files in R using readAligned. Then when filtering my reads, I
realized that most of the default filter functions have been written for maq
and are therefore not necessarily available for bowtie.
I'm actually interested in the alignQualityFilter in order to sort those reads
which map several times in the genome (as described in the page 22 of the "I/0
and Quality Assessment using ShortRead" of last November bioC tutorial
session). But, this method relies on the quality slot of the object returned by
the alignQuality accessor, which is NA in the case of a bowtie alignment. This
is actually normal, as the final quality mapping value stored in this object is
only available in the maq map file and now in the bowtie one.

So I'd just would like to know if there would be another way to filter out the
reads mapping multiple times in the genome when using an AlignedRead object
obtained from a bowtie map file.

Sorry that this sounds quite confusing... Here is the actual code and what
happens:

filt <- alignQualityFilter(threshold=1)

# works
aln<-readAligned( "H3K4me1", "run13_lane4.map", type="MAQMap", filter=filt)

# fails
aln2<-readAligned( "H3K4me1.bowtie", "run13_lane4.map", type="Bowtie",
filter=filt)

Error in x at ranges[i] : subscript contains NAs

Any ideas how to filter a bowtie type alignment for reads mapping multiple time
in the genome?

Cheers,

N.

> sessionInfo()
R version 2.9.0 Under development (unstable) (2009-01-04 r47472)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] HilbertVis_1.1.4   ShortRead_1.1.33   lattice_0.17-20    BSgenome_1.11.8
[5] Biobase_2.3.9      Biostrings_2.11.22 IRanges_1.1.33

loaded via a namespace (and not attached):
[1] grid_2.9.0         Matrix_0.999375-17 tools_2.9.0