[BioC] How to access custom SAM tags (Rsamtools?)
Martin Morgan
mtmorgan at fhcrc.org
Thu Nov 29 15:32:48 CET 2012
On 11/29/2012 06:23 AM, Kemal Akman wrote:
> Hello,
>
> I'm interested in accessing extended SAM tags in aligned short read
> sequence files using a R/Bioconductor library. Rsamtools seems
> potentially suited for this purpose, but I couldn't find the arguments
> to return custom SAM tags, such as "XX:Z", but there doesn't seem to
> be a corresponding "what" value in ScanBamParam()?
Hi,
There is a 'tag' argument to ScanBamParam; it's illustrated on the help page
?ScanBamParam and in
example(ScanBamParam)
which leads in part to...
ScnBmP> ## tags; NM: edit distance; H1: 1-difference hits
ScnBmP> p4 <- ScanBamParam(tag=c("NM", "H1"), what="flag")
ScnBmP> bam4 <- scanBam(fl, param=p4)
ScnBmP> str(bam4[[1]][["tag"]])
List of 2
$ NM: int [1:3307] 0 0 0 5 0 1 0 1 0 1 ...
$ H1: int [1:3307] 0 0 0 0 0 1 0 1 0 0 ...
It's often convenient to readBamGappedAlignments with extra fields, so
head(readBamGappedAlignments(fl, param=p4))
gives
> head(readBamGappedAlignments(fl, param=p4))
GappedAlignments with 6 alignments and 3 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] seq1 + 36M 36 1 36 36
[2] seq1 + 35M 35 3 37 35
[3] seq1 + 35M 35 5 39 35
[4] seq1 + 36M 36 6 41 36
[5] seq1 + 35M 35 9 43 35
[6] seq1 + 35M 35 13 47 35
ngap | flag NM H1
<integer> | <integer> <integer> <integer>
[1] 0 | 73 0 0
[2] 0 | 73 0 0
[3] 0 | 137 0 0
[4] 0 | 137 5 0
[5] 0 | 137 0 0
[6] 0 | 73 1 1
---
seqlengths:
seq1 seq2
1575 1584
Martin
>
> I'd be especially interested in such a feature to parse methylation
> strings from Bismark, as well as custom SAM tags from other tools.
>
> Any suggestions on Rsamtools or alternative methods to achieve this in
> Bioconductor would be much appreciated.
>
> Example SAM data:
> $ head -2 sample.sam
> SRR306424.2547_PRESLEY:4:4:62:558_length=76 16 chr22
> 30675421 255 36M * 0 0
> CACACACATCCACATAACACCATAACCAACCCCCGA
> ;,?A9:>B at B?@BBAC at C/BBC at CBCCC8BCBACBB NM:i:4 XX:Z:15GG6G11G
> XM:Z:...............hh......h..........Zx XR:Z:CT XG:Z:GA
> SRR306424.5227_PRESLEY:4:4:113:1768_length=76 0 chr5
> 123101409 255 75M * 0 0
> TGATTTTTATATAAGGTGTAAGTAAGAGGTTTAGTTTTGATTTTTTGTATATGGATAATTAGTTTTTTTAGTATT
> B>?@BBABBCBCCBCB>B at B@B?B at B@BB>BB at B?BBBB at BCBBABBAABABB@@B??ABAA>BABBBA=><@A=
> NM:i:13 XX:Z:22C8C12C4C8CC4C1CCC2C1CC
> XM:Z:......................h........x............x....h........hx....h.hhx..h.hh
> XR:Z:CT XG:Z:CT
>
>
> Best regards,
>
> Kemal Akman
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list