[Bioc-devel] Updating Rsamtools to support BAMs with >65535 CIGAR operators

Heng Li hengli at broadinstitute.org
Wed Oct 18 22:32:54 CEST 2017


Hi,

I am not sure whether this mailing list is the right place for this request, but I do not know of a better place to ask.

Anyway, an alignment with >65535 CIGAR operators can't be encoded in the current BAM format. Unfortunately, a tiny fraction of ultra-long nanopore reads will be aligned with >65535 operators, which means none of the existing BAM readers can handle such alignments. To address this issue, we can move a long CIGAR to a tag when the file is written and move it back into the CIGAR field in memory when the file is read.
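
As a rough illustration of the approach described above (not the actual patch), the following Python sketch moves an over-long CIGAR into a tag before writing and restores it after reading. The function names, the in-memory representation (a list of (operator, length) pairs plus a tag dictionary), the tag name "CG", and the soft-clip/skip placeholder left in the CIGAR field are all assumptions made for this example; the message itself does not specify them.

```python
MAX_CIGAR_OPS = 65535  # the limit named in the message (largest 16-bit unsigned value)

# SAM conventions: which operators consume query bases / reference bases.
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")


def stash_long_cigar(cigar, tags, tag="CG"):
    """Return (cigar, tags) that fit the on-disk limit.

    A CIGAR with more than MAX_CIGAR_OPS operators is moved into `tag`
    (hypothetical name) and replaced by a two-operator placeholder that
    still spans the same query and reference lengths.
    """
    if len(cigar) <= MAX_CIGAR_OPS:
        return cigar, tags
    query_len = sum(n for op, n in cigar if op in QUERY_OPS)
    ref_len = sum(n for op, n in cigar if op in REF_OPS)
    new_tags = dict(tags)
    new_tags[tag] = list(cigar)
    return [("S", query_len), ("N", ref_len)], new_tags


def restore_long_cigar(cigar, tags, tag="CG"):
    """Undo stash_long_cigar: move the real CIGAR back out of the tag."""
    if tag not in tags:
        return cigar, tags
    new_tags = dict(tags)
    real_cigar = new_tags.pop(tag)
    return real_cigar, new_tags
```

With this arrangement, a reader that does not know about the tag still sees a syntactically valid record, while an updated reader can call restore_long_cigar right after parsing so that downstream code sees the full CIGAR.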

I can update Rsamtools to support long-CIGAR BAMs with the approach above. The update will keep the API the same but will slightly alter the ABI: struct bam1_core_t in samtools/bam.c needs to be enlarged. If you think this is OK, I can generate a patch file against the git.bioconductor.org HEAD.
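
To illustrate why enlarging a struct changes the ABI while leaving the API alone, here is a small Python ctypes sketch. The two structures are simplified, hypothetical stand-ins and do not reproduce the real bam1_core_t layout; the only point is that widening one field changes the struct size and shifts the offsets of the fields after it, so code compiled against the old layout would read the wrong bytes even though the field names are unchanged.

```python
import ctypes


class OldCore(ctypes.Structure):
    # Hypothetical layout with a 16-bit operator count.
    _fields_ = [
        ("tid", ctypes.c_int32),
        ("pos", ctypes.c_int32),
        ("flag", ctypes.c_uint16),
        ("n_cigar", ctypes.c_uint16),
        ("l_qseq", ctypes.c_int32),
    ]


class NewCore(ctypes.Structure):
    # Same field names (same source-level API), but a 32-bit operator count.
    _fields_ = [
        ("tid", ctypes.c_int32),
        ("pos", ctypes.c_int32),
        ("flag", ctypes.c_uint16),
        ("n_cigar", ctypes.c_uint32),
        ("l_qseq", ctypes.c_int32),
    ]


# Size and field offsets differ, which is what breaks binary compatibility.
print(ctypes.sizeof(OldCore), ctypes.sizeof(NewCore))
print(OldCore.l_qseq.offset, NewCore.l_qseq.offset)
```

Source code that only refers to fields by name recompiles unchanged, which is why the API can stay the same while already-compiled binaries need rebuilding.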

What do you (or the current maintainer) think? What is the best way to send this patch?

Thanks in advance,

Heng

