[Bioc-sig-seq] Rsamtools problem reading seq information

ivan.borozan at utoronto.ca ivan.borozan at utoronto.ca
Thu Nov 11 17:25:38 CET 2010


Hello list,

I have scanned a large bam (15G) file from Bioscope (SOLID) using  
Rsamtools and the code below:

> library(Rsamtools)
Loading required package: GenomicRanges
>
>
> p4<- ScanBamParam(what = c("seq"), flag =  
> scanBamFlag(isUnmappedQuery = TRUE))
>
> res3 <- scanBam("test.bam",param=p4, maxMemory=5000)[[1]]


it is not clear to me why I get all sequences as


> res3$seq[1]

  A DNAStringSet instance of length 1
     width seq
[1]     1 N


and all Phred-encoded, phred-scaled base quality scores as:


> p4<- ScanBamParam(what = c("qual"), flag =  
> scanBamFlag(isUnmappedQuery = TRUE))
>
>
> res3 <-  
> scanBam("solid0085_20090610_ICGC_Xeno_wholetranscrptome_4041X_F3_sortedByReadId.bam",param=p4,  
> maxMemory=5000)[[1]]
>



> res3$qual[1]
   A PhredQuality instance of length 1
     width seq
[1]     1 !


Many thanks for any suggestions,

Ivan


> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsamtools_1.2.0     GenomicRanges_1.2.0 Biostrings_2.18.0
[4] IRanges_1.8.0

loaded via a namespace (and not attached):
[1] Biobase_2.10.0 tools_2.12.0



More information about the Bioc-sig-sequencing mailing list