[Bioc-devel] VariantAnnotation::isDelins() ??
Valerie Obenchain
vobencha at fredhutch.org
Tue Feb 10 19:37:34 CET 2015
Hi Robert,
This sounds like a good addition; I'll put it on the TODO list. If you need
this immediately, I'd be happy to accept a patch (with unit tests).
Valerie
On 02/10/2015 06:29 AM, Robert Castelo wrote:
> hi,
>
> In the VariantAnnotation package, the help page for the functions that
> identify variant types such as SNVs, insertions, deletions,
> transitions, and structural rearrangements gives the following
> definitions:
>
>
> • isSNV: Reference and alternate alleles are both a single
> nucleotide long.
>
> • isInsertion: Reference allele is a single nucleotide and the
> alternate allele is greater (longer) than a single nucleotide
> and the first nucleotide of the alternate allele matches the
> reference.
>
> • isDeletion: Alternate allele is a single nucleotide and the
> reference allele is greater (longer) than a single nucleotide
> and the first nucleotide of the reference allele matches the
> alternate.
>
> • isIndel: The variant is either a deletion or insertion as
> determined by ‘isDeletion’ and ‘isInsertion’.
>
> • isSubstitution: Reference and alternate alleles are the same
> length (1 or more nucleotides long).
>
> • isTransition: Reference and alternate alleles are both a
> single nucleotide long. The reference-alternate pair
> interchange is of either two-ring purines (A <-> G) or
> one-ring pyrimidines (C <-> T).
>
>
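To make the documented rules concrete, here is a minimal base-R sketch that re-expresses them as vectorized predicates on plain character vectors of REF and ALT alleles. The function names (`snvLike()`, `insertionLike()` and so on) are hypothetical. This is not VariantAnnotation's implementation, which works on `VRanges` and related classes and may treat details such as letter case or missing values differently.

```r
## Hypothetical base-R restatement of the documented rules; these helpers
## are NOT part of VariantAnnotation and operate on plain character vectors.
snvLike <- function(ref, alt) nchar(ref) == 1L & nchar(alt) == 1L

## single-base REF, longer ALT whose first base matches REF
insertionLike <- function(ref, alt)
    nchar(ref) == 1L & nchar(alt) > 1L & substr(alt, 1L, 1L) == ref

## single-base ALT, longer REF whose first base matches ALT
deletionLike <- function(ref, alt)
    nchar(alt) == 1L & nchar(ref) > 1L & substr(ref, 1L, 1L) == alt

indelLike <- function(ref, alt) insertionLike(ref, alt) | deletionLike(ref, alt)

## same length, one or more nucleotides
substitutionLike <- function(ref, alt) nchar(ref) == nchar(alt)

## SNV whose REF/ALT pair is A<->G or C<->T
transitionLike <- function(ref, alt)
    snvLike(ref, alt) & paste0(ref, alt) %in% c("AG", "GA", "CT", "TC")

## alleles from the example in this message
refs <- c("T", "A", "A", "AC", "AC")
alts <- c("C", "T", "AC", "AT", "ACC")
snvLike(refs, alts)           ## TRUE TRUE FALSE FALSE FALSE
indelLike(refs, alts)         ## FALSE FALSE TRUE FALSE FALSE
substitutionLike(refs, alts)  ## TRUE TRUE FALSE TRUE FALSE
```

Applied to the alleles of the example quoted below, these predicates give the same `isSNV()`, `isIndel()` and `isSubstitution()` results reported there, and the last pair (AC/ACC) again matches none of them.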
> However, unless I'm missing something, these definitions do not cover
> insertions whose reference allele is longer than one nucleotide, or
> deletions whose alternate allele is longer than one nucleotide. Here is
> an example of what I mean:
>
> library(VariantAnnotation)
>
> ## five variants, one per sample, at distinct positions on chr1;
> ## each range spans its reference allele
> refAlleles <- c("T", "A", "A", "AC", "AC")
> altAlleles <- c("C", "T", "AC", "AT", "ACC")
> vr <- VRanges(seqnames = rep("chr1", times = 5),
>               ranges = IRanges(start = seq(1, 81, by = 20),
>                                width = nchar(refAlleles)),
>               ref = refAlleles,
>               alt = altAlleles,
>               refDepth = c(5, 10, 5, 10, 5),
>               altDepth = c(7, 6, 7, 6, 7),
>               totalDepth = c(12, 17, 12, 17, 12),
>               sampleNames = letters[1:5])
>
> isSNV(vr)
> ## [1] TRUE TRUE FALSE FALSE FALSE
> isIndel(vr)
> ## [1] FALSE FALSE TRUE FALSE FALSE
> isSubstitution(vr)
> ## [1] TRUE TRUE FALSE TRUE FALSE
>
> Note that the last variant does not evaluate to TRUE for any of the
> three predicates. After looking for variant definitions, I found that
> the Human Genome Variation Society (HGVS) describes this as a deletion
> followed by an insertion and calls it an "indel" or "delins" (it's
> unclear to me whether they use the two terms interchangeably); see:
>
> http://www.hgvs.org/mutnomen/recs-DNA.html#indel
>
> The only other site I could quickly find with Google that gives a
> specific definition is that of the SnpEff software, which calls it
> "MIXED", a "Multiple-nucleotide and an InDel":
>
> http://snpeff.sourceforge.net/SnpEff_manual.html
>
> I would suggest that VariantAnnotation try to identify this type of
> variant. Following the HGVS recommendations, could we perhaps have a
> function for it called isDelins()?
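A predicate like the proposed isDelins() could be prototyped along the lines below. This is a sketch under one stated assumption: a "delins" is any REF/ALT pair that none of the documented rules (SNV, single-base-anchored insertion or deletion, same-length substitution) classifies. It works on plain character vectors, and the name `isDelinsLike()` is hypothetical, not an existing VariantAnnotation function; a real implementation would presumably dispatch on `VRanges` and `VCF` objects like the existing predicates.

```r
## Hypothetical prototype, NOT a VariantAnnotation function: flags REF/ALT
## pairs that match none of the documented classification rules.
isDelinsLike <- function(ref, alt) {
    nref <- nchar(ref)
    nalt <- nchar(alt)
    snv   <- nref == 1L & nalt == 1L
    ins   <- nref == 1L & nalt > 1L & substr(alt, 1L, 1L) == ref
    del   <- nalt == 1L & nref > 1L & substr(ref, 1L, 1L) == alt
    subst <- nref == nalt          # also covers identical alleles
    !(snv | ins | del | subst)
}

## the example alleles plus two extra pairs: ACG -> TT and A -> GC
isDelinsLike(c("T", "A", "A", "AC", "AC", "ACG", "A"),
             c("C", "T", "AC", "AT", "ACC", "TT", "GC"))
## FALSE FALSE FALSE FALSE TRUE TRUE TRUE
```

One design question this exposes: under this rule AC/ACC is flagged even though trimming the shared trailing C reduces it to A/AC, which the documented rules already call an insertion. Whether to normalize alleles before classifying them would need to be decided for any real implementation.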
>
>
>
> cheers,
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel