[Bioc-devel] Problem in asBam from Rsamtools

rcaloger raffaele.calogero at gmail.com
Sat Jun 1 17:04:09 CEST 2013


Hi,
I am using the devel version of Bioconductor as part of the development 
of my package chimera.
While testing a new function in chimera that uses the Rsubread package, I 
encountered a problem converting a SAM file generated by Rsubread into 
a BAM file.
I used the function asBam from Rsamtools and got the following error:

In doTryCatch(return(expr), name, parentenv, handler) :
   Parse error at line 14667325: sequence and quality are inconsistent

asBam runs successfully if I use only the first 14667324 lines of the SAM file.
If I instead use a SAM file that ends at line 14667325, I get the above error.

The line that causes the problem is the following:

HWI-ST169:273:D0YW6ACXX:2:1201:4070:162856    141    *    0    0    *    *    0    0    AAAAAAGGGTTGAATTATTTTCACTTGCCCACGTAGTTTATGAATGTGGGAAATAGCTTCAAAGACAGATTAAATGATTTGCCCAAGGCCACAGAAAAGAG    @@@FFFFFHABHHJGGBFIGIFHGIJHGJGJIFBGHDBG9BDAFIIDHIIGCHCHI<GACC@ADHHHE;7?@DEFED>@;ACCC>ABB;AAD<BC>    77    *    0    0    *    *    0    0    CATGGATGAGGAGAATGAGGATTTTGCGCCGGCTGCTCAGAAGATACCGTGAATCTAAGAAGATCGATCGCCACATGTATCACAGCCTGTACCTGAAGGGG    @@@DD?BADHF<D<ACG>FFE;BBF@B?@C@F:(?1.=)))883)8=7@(65??EEBDEC37;;>???=BB@<BBCCACBDDCC:?BCBC:@#########

Does anybody have an idea of what is wrong with this line?
Is there any way to validate the SAM file before running asBam, to detect 
and filter out lines that might cause problems in the conversion to 
BAM?
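
To make the question concrete, here is a minimal sketch (in Python rather than R, and not part of Rsamtools or Rsubread) of the kind of pre-check I have in mind. The function name find_bad_sam_lines is hypothetical. It assumes tab-separated records with the 11 mandatory SAM fields, skips header lines starting with "@", and reports records whose SEQ and QUAL fields differ in length, which I assume is the condition behind the error message above.

```python
def find_bad_sam_lines(lines):
    """Yield (line_number, reason) for SAM records that look malformed.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Only two simple checks are made; this is not a full SAM validator.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and header lines (headers start with "@").
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        # A SAM alignment record has 11 mandatory fields.
        if len(fields) < 11:
            yield number, "fewer than 11 mandatory fields"
            continue
        seq, qual = fields[9], fields[10]
        # "*" means the field is absent, so lengths cannot be compared.
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            yield number, "SEQ length %d != QUAL length %d" % (len(seq), len(qual))
```

Looping over find_bad_sam_lines(handle) for an open SAM file handle would list the line numbers to inspect or drop before calling asBam.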
Cheers
Raf

########
sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets methods
[8] base

other attached packages:
[1] Rsamtools_1.13.16     Biostrings_2.29.3 GenomicRanges_1.13.15
[4] XVector_0.1.0         IRanges_1.19.8        BiocGenerics_0.7.2

loaded via a namespace (and not attached):
[1] bitops_1.0-5   stats4_3.0.0   zlibbioc_1.7.0

-- 

----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
MBC Centro di Biotecnologie Molecolari
Via Nizza 52, Torino 10126
tel.   ++39 0116706457
Fax    ++39 0112366457
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
        raffaele[dot]calogero[at]gmail[dot]com
www:   http://www.bioinformatica.unito.it


