[Bioc-devel] VariantAnnotation::isDelins() ??

Robert Castelo robert.castelo at upf.edu
Tue Feb 10 15:29:46 CET 2015


hi,

in the VariantAnnotation package, the help of the functions for 
identifying variant types such as SNVs, insertions,
deletions, transitions, and structural rearrangements gives the 
following definitions:


         • isSNV: Reference and alternate alleles are both a single
           nucleotide long.

         • isInsertion: Reference allele is a single nucleotide and the
           alternate allele is greater (longer) than a single nucleotide
           and the first nucleotide of the alternate allele matches the
           reference.

         • isDeletion: Alternate allele is a single nucleotide and the
           reference allele is greater (longer) than a single nucleotide
           and the first nucleotide of the reference allele matches the
           alternate.

         • isIndel: The variant is either a deletion or insertion as
           determined by ‘isDeletion’ and ‘isInsertion’.

         • isSubstitution: Reference and alternate alleles are the same
           length (1 or more nucleotides long).

         • isTransition: Reference and alternate alleles are both a
           single nucleotide long.  The reference-alternate pair
           interchange is of either two-ring purines (A <-> G) or
           one-ring pyrimidines (C <-> T).
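
As a reading aid, the quoted definitions can be written as plain base-R predicates on character vectors of reference and alternate alleles. This is only a sketch: the `*Sketch` names are hypothetical, and the code mirrors the help text above rather than the actual VariantAnnotation implementation, which may apply further checks.

```r
# Hypothetical base-R predicates mirroring the quoted help text.
# They are NOT the VariantAnnotation implementations.
isSNVSketch <- function(ref, alt) {
  nchar(ref) == 1 & nchar(alt) == 1
}
isInsertionSketch <- function(ref, alt) {
  # single-nucleotide ref, longer alt, alt starts with the ref nucleotide
  nchar(ref) == 1 & nchar(alt) > 1 & substr(alt, 1, 1) == ref
}
isDeletionSketch <- function(ref, alt) {
  # single-nucleotide alt, longer ref, ref starts with the alt nucleotide
  nchar(alt) == 1 & nchar(ref) > 1 & substr(ref, 1, 1) == alt
}
isIndelSketch <- function(ref, alt) {
  isInsertionSketch(ref, alt) | isDeletionSketch(ref, alt)
}
isSubstitutionSketch <- function(ref, alt) {
  nchar(ref) == nchar(alt)
}
isTransitionSketch <- function(ref, alt) {
  # purine <-> purine (A, G) or pyrimidine <-> pyrimidine (C, T)
  isSNVSketch(ref, alt) & paste0(ref, alt) %in% c("AG", "GA", "CT", "TC")
}

isTransitionSketch(c("A", "C", "A"), c("G", "T", "C"))
## [1]  TRUE  TRUE FALSE
```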


however, unless I'm missing something here, these definitions do not 
cover indels in which the insertion or deletion involves more than one 
reference or alternate nucleotide, respectively. this 
could be an example of what i'm trying to say:

library(VariantAnnotation)

vr <- VRanges(seqnames = rep("chr1", times=5),
               ranges = IRanges(start = seq(1, 100, by=20),  # 1, 21, 41, 61, 81
                                width = c(1, 1, 1, 2, 2)),   # width of each ref allele
               ref = c("T", "A",  "A", "AC",  "AC"),
               alt = c("C", "T", "AC", "AT", "ACC"),
               refDepth = c(5, 10, 5, 10, 5),
               altDepth = c(7, 6, 7, 6, 7),
               totalDepth = c(12, 17, 12, 17, 12),
               sampleNames = letters[1:5])

isSNV(vr)
## [1]  TRUE  TRUE FALSE FALSE FALSE
isIndel(vr)
## [1] FALSE FALSE  TRUE FALSE FALSE
isSubstitution(vr)
## [1]  TRUE  TRUE FALSE  TRUE FALSE

note that the last variant does not evaluate as true for any of the 
three possibilities. after looking for variant definitions, i have found 
that the Human Genome Variation Society (HGVS) describes this as a 
deletion followed by an insertion and calls it "indel" or "delins" (it's 
unclear to me whether they use that interchangeably), see the link here:

http://www.hgvs.org/mutnomen/recs-DNA.html#indel

the only other site I could quickly find with Google where a specific 
definition is given is that of the SnpEff software, which 
calls it "MIXED", a "Multiple-nucleotide and an InDel":

http://snpeff.sourceforge.net/SnpEff_manual.html

I would suggest that VariantAnnotation should try to identify this type 
of variant. following the HGVS recommendations, could we maybe have a 
function for it called isDelins() ??
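
A minimal sketch of what such a function might compute, written in base R on plain character vectors so that it runs without any package. The name `isDelinsSketch` is hypothetical, and the rule it applies (alleles of unequal length that are neither a simple insertion nor a simple deletion as defined in the help page) is an assumption made for illustration, not an agreed definition or an existing VariantAnnotation function.

```r
# Hypothetical helper; not part of VariantAnnotation.
# Assumed rule: ref and alt differ in length, and the variant is neither a
# simple insertion nor a simple deletion per the help-page definitions.
isDelinsSketch <- function(ref, alt) {
  stopifnot(length(ref) == length(alt))
  nref <- nchar(ref)
  nalt <- nchar(alt)
  isIns <- nref == 1 & nalt > 1 & substr(alt, 1, 1) == ref
  isDel <- nalt == 1 & nref > 1 & substr(ref, 1, 1) == alt
  # unequal lengths already rule out SNVs and substitutions
  nref != nalt & !isIns & !isDel
}

ref <- c("T", "A",  "A", "AC",  "AC")
alt <- c("C", "T", "AC", "AT", "ACC")
isDelinsSketch(ref, alt)
## [1] FALSE FALSE FALSE FALSE  TRUE
```

With the same alleles as in the example above, only the last variant is flagged, which is the one that none of isSNV(), isIndel() and isSubstitution() reports as TRUE.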



cheers,

robert.


