[Bioc-devel] VariantAnnotation::isDelins() ??

Valerie Obenchain vobencha at fredhutch.org
Tue Feb 10 19:37:34 CET 2015


Hi Robert,

This sounds like a good addition. I'll put it on the TODO. If you need 
this immediately I'd be happy to accept a patch (with unit tests).

Valerie



On 02/10/2015 06:29 AM, Robert Castelo wrote:
> hi,
>
> in the VariantAnnotation package, the help of the functions for
> identifying variant types such as SNVs, insertions,
> deletions, transitions, and structural rearrangements gives the
> following definitions:
>
>
>          • isSNV: Reference and alternate alleles are both a single
>            nucleotide long.
>
>          • isInsertion: Reference allele is a single nucleotide and the
>            alternate allele is greater (longer) than a single nucleotide
>            and the first nucleotide of the alternate allele matches the
>            reference.
>
>          • isDeletion: Alternate allele is a single nucleotide and the
>            reference allele is greater (longer) than a single nucleotide
>            and the first nucleotide of the reference allele matches the
>            alternate.
>
>          • isIndel: The variant is either a deletion or insertion as
>            determined by ‘isDeletion’ and ‘isInsertion’.
>
>          • isSubstitution: Reference and alternate alleles are the same
>            length (1 or more nucleotides long).
>
>          • isTransition: Reference and alternate alleles are both a
>            single nucleotide long.  The reference-alternate pair
>            interchange is of either two-ring purines (A <-> G) or
>            one-ring pyrimidines (C <-> T).
>
>
> however, unless I'm missing something here, these definitions do not
> cover insertions whose reference allele, or deletions whose alternate
> allele, is longer than one nucleotide. this could be an example of
> what i'm trying to say:
>
> library(VariantAnnotation)
>
> vr <- VRanges(seqnames = rep("chr1", times=5),
>                ranges = IRanges(seq(1, 10, by=20),
>                                 seq(1, 10, by=20)+c(1, 1, 2, 2, 3)),
>                ref = c("T", "A",  "A", "AC",  "AC"),
>                alt = c("C", "T", "AC", "AT", "ACC"),
>                refDepth = c(5, 10, 5, 10, 5),
>                altDepth = c(7, 6, 7, 6, 7),
>                totalDepth = c(12, 17, 12, 17, 12),
>                sampleNames = letters[1:5])
>
> isSNV(vr)
> ## [1]  TRUE  TRUE FALSE FALSE FALSE
> isIndel(vr)
> ## [1] FALSE FALSE  TRUE FALSE FALSE
> isSubstitution(vr)
> ## [1]  TRUE  TRUE FALSE  TRUE FALSE
>
> note that the last variant does not evaluate as true for any of the
> three possibilities. after looking for variant definitions, i have found
> that the Human Genome Variation Society (HGVS) describes this as a
> deletion followed by an insertion and calls it "indel" or "delins" (it's
> unclear to me whether they use that interchangeably), see the link here:
>
> http://www.hgvs.org/mutnomen/recs-DNA.html#indel
>
> the only other site I could quickly find with Google where a
> specific definition is given is that of the software SnpEff, which
> calls it "MIXED", a "Multiple-nucleotide and an InDel":
>
> http://snpeff.sourceforge.net/SnpEff_manual.html
>
> I would suggest that VariantAnnotation should try to identify this type
> of variant. following the HGVS recommendations, could we maybe have a
> function for it called isDelins() ??
>
>
>
> cheers,
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
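
A minimal sketch of the kind of predicate proposed above, written in base R on plain character vectors of reference and alternate alleles. The function name is_delins_sketch is hypothetical and not part of VariantAnnotation, and the working definition it uses (alleles of different lengths that are neither a simple insertion nor a simple deletion as defined in the quoted help page) is an assumption, not the HGVS definition or the package's eventual behaviour.

```r
# Hypothetical helper, NOT part of VariantAnnotation: flags ref/alt pairs
# that change length but fit neither the insertion nor the deletion
# definition quoted from the help page.
is_delins_sketch <- function(ref, alt) {
  stopifnot(length(ref) == length(alt))
  ref_len <- nchar(ref)
  alt_len <- nchar(alt)
  # TRUE where the first nucleotide of ref and alt coincide
  first_match <- substr(ref, 1, 1) == substr(alt, 1, 1)
  # Insertion and deletion as defined in the quoted help page
  is_insertion <- ref_len == 1 & alt_len > 1 & first_match
  is_deletion  <- alt_len == 1 & ref_len > 1 & first_match
  # Assumed working definition of a "delins"
  ref_len != alt_len & !is_insertion & !is_deletion
}

# Same alleles as in the VRanges example from the thread
ref <- c("T", "A",  "A", "AC",  "AC")
alt <- c("C", "T", "AC", "AT", "ACC")
is_delins_sketch(ref, alt)
## [1] FALSE FALSE FALSE FALSE  TRUE
```

Only the fifth variant, which none of isSNV(), isIndel() and isSubstitution() picked up in the example, is flagged here.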


