[Bioc-devel] Problem in asBam from Rsamtools

Martin Morgan mtmorgan at fhcrc.org
Sat Jun 1 18:35:20 CEST 2013


On 06/01/2013 08:04 AM, rcaloger wrote:
> Hi,
> I am using the devel version of Bioconductor as part of the development of my
> package chimera.
> While testing a new function in chimera that uses the Rsubread package, I
> encountered a problem converting a SAM file generated by Rsubread into a BAM
> file. I used the function asBam from Rsamtools and got the following error:
>
> In doTryCatch(return(expr), name, parentenv, handler) :
>    Parse error at line 14667325: sequence and quality are inconsistent
>
> asBam runs successfully if I use only the part of the SAM file up to line
> 14667324, but I get the above error if the file ends at line 14667325.
>
> The line that causes the problem is the following:
>
> HWI-ST169:273:D0YW6ACXX:2:1201:4070:162856    141    *    0    0 * *    0    0
> AAAAAAGGGTTGAATTATTTTCACTTGCCCACGTAGTTTATGAATGTGGGAAATAGCTTCAAAGACAGATTAAATGATTTGCCCAAGGCCACAGAAAAGAG
> @@@FFFFFHABHHJGGBFIGIFHGIJHGJGJIFBGHDBG9BDAFIIDHIIGCHCHI<GACC@ADHHHE;7?@DEFED>@;ACCC>ABB;AAD<BC>
> 77    *    0    0    *    *    0    0
> CATGGATGAGGAGAATGAGGATTTTGCGCCGGCTGCTCAGAAGATACCGTGAATCTAAGAAGATCGATCGCCACATGTATCACAGCCTGTACCTGAAGGGG
> @@@DD?BADHF<D<ACG>FFE;BBF@B?@C@F:(?1.=)))883)8=7@(65??EEBDEC37;;>???=BB@<BBCCACBDDCC:?BCBC:@#########

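This looks like two separate records have been concatenated: the quality string
of the first record appears to be cut short and is followed directly by the
flag field (77) of a second record that has lost its read name. It's really
hard to know whether the cause is Rsubread, some aspect of the file system, or
the way the file has been handled after creation by Rsubread. Picard (its
ValidateSamFile tool) is one commonly used tool for validation.

Martin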
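
As a rough pre-check before running asBam, something like the following could
flag suspect lines. This is only a sketch in Python, not part of Rsamtools or
Picard, and it is far less thorough than a real validator. The function name
find_suspect_sam_lines is made up for illustration. It assumes tab-separated
SAM text and checks only three things: at least 11 mandatory fields, equal
SEQ and QUAL lengths (unless either is "*"), and optional fields of the form
TAG:TYPE:VALUE.

```python
import re

# Optional SAM fields have the form TAG:TYPE:VALUE, e.g. NM:i:0.
OPTIONAL_FIELD = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:.*$")


def find_suspect_sam_lines(lines):
    """Return a list of (line_number, reason) for lines that look malformed.

    `lines` is any iterable of strings, e.g. an open SAM text file.
    Line numbers are 1-based and count header lines too.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Header lines start with "@"; blank lines are skipped, not reported.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            problems.append((number, "fewer than 11 mandatory fields"))
            continue
        seq, qual = fields[9], fields[10]
        # "*" means the sequence or quality string was not stored.
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            problems.append((number, "sequence and quality lengths differ"))
            continue
        # A record glued onto the end of another shows up as extra columns
        # that do not look like TAG:TYPE:VALUE.
        if any(not OPTIONAL_FIELD.match(field) for field in fields[11:]):
            problems.append((number, "malformed optional field"))
    return problems
```

Passing an open file handle for the SAM file to find_suspect_sam_lines and
printing the returned pairs gives the line numbers to inspect or drop before
conversion. A line like the one quoted above would be expected to be reported
with "sequence and quality lengths differ".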

>
>
> Does anybody have an idea of what is wrong with this line?
> Is there any way to validate the SAM file before running asBam, to detect and
> filter out lines that might cause problems in the conversion to BAM?
> Cheers
> Raf
>
> ########
> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets methods
> [8] base
>
> other attached packages:
> [1] Rsamtools_1.13.16     Biostrings_2.29.3 GenomicRanges_1.13.15
> [4] XVector_0.1.0         IRanges_1.19.8        BiocGenerics_0.7.2
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5   stats4_3.0.0   zlibbioc_1.7.0
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


