[BioC] Rsamtools: readGAlignmentsFromBam weird behavior reading MAPQ and RNEXT fields
Hervé Pagès
hpages at fhcrc.org
Tue Oct 15 01:51:38 CEST 2013
Hi Rubi,
On 10/14/2013 04:41 PM, rubi [guest] wrote:
>
> Hi,
>
> I'm using readGAlignmentsFromBam where I defined the bam parameters to be scanned as follows:
> class: ScanBamParam
> bamFlag (NA unless specified):
> bamSimpleCigar: FALSE
> bamReverseComplement: FALSE
> bamTag: NH, HI, AS, nM
> bamWhich: 0 elements
> bamWhat: flag, mapq, mrnm, mpos, seq, qual
>
> Using an example of a single alignment record I get the following:
> ILLUMINA:223:D0CUVACXX:5:1104:11885:187863 0 chr11 96806772 255 44M * 0 0 GGCTGACCTGGTCCCTGCCAAGGAAGCCAATGTCAAGTGCCCAC JJJIJJJJJJJJJJJIJIJIJJJHJIJJJJJIJJJJJGHIIIJJ NH:i:1 HI:i:1 AS:i:43 nM:i:0
>
> seqnames strand cigar qwidth start end width ngap | flag mapq mrnm mpos
> <Rle> <Rle> <character> <integer> <integer> <integer> <integer> <integer> | <integer> <integer> <factor> <integer>
> ILLUMINA:223:D0CUVACXX:5:1104:11885:187863 chr11 + 44M 44 96806772 96806815 44 0 | 0 <NA> <NA> 0
>
> seq qual NH HI AS nM
> <DNAStringSet> <PhredQuality> <integer> <integer> <integer> <integer>
> ILLUMINA:223:D0CUVACXX:5:1104:11885:187863 GGCTGACCTGGTCCCTGCCAAGGAAGCCAATGTCAAGTGCCCAC JJJIJJJJJJJJJJJIJIJIJJJHJIJJJJJIJJJJJGHIIIJJ 1 1 43 0
>
>
> As you can see the MAPQ field in the input is 255 but is NA in the gapped alignment object, although mapq is defined in the ScanBamParam. The RNEXT field is also NA but I guess this is the representation of its "*" value in the bam record.
>
> When I change the MAPQ in the input to be different from 255 (e.g., 3) this doesn't happen.
This is the intended behavior. Like "*" means the RNEXT is not available,
255 means that MAPQ is non-available.
From the SAM Spec (http://samtools.sourceforge.net/SAMv1.pdf):
5. MAPQ: MAPping Quality. It equals -10*log10(Pr{mapping position is
wrong}),
rounded to the nearest integer. A value 255 indicates that the
mapping quality is not available.
Cheers,
H.
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.8.10 Rsamtools_1.13.44 Biostrings_2.29.19 GenomicRanges_1.13.45 XVector_0.1.4 IRanges_1.19.38 BiocGenerics_0.7.5
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6 stats4_3.0.2 tools_3.0.2 zlibbioc_1.7.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list