[Bioc-devel] VariantAnnotation::isDelins() ??
robert.castelo at upf.edu
Wed Feb 11 11:36:55 CET 2015
sure, i'm attaching the patch created from a fresh checkout of the trunk
this morning. in principle, all required bits are there and it builds
and checks without errors and warnings.
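
for readers without the attachment, here is a rough base-R sketch of the kind of check proposed in the quoted message below. it is not the attached patch and not VariantAnnotation code: the helper name is_delins_sketch() is hypothetical, it works on plain character vectors rather than VRanges objects, and it assumes that "delins" means the allele lengths differ while the variant is neither a simple insertion nor a simple deletion under the help-page definitions quoted below.

```r
# Hypothetical helper, base R only; NOT the VariantAnnotation implementation.
# ref, alt: character vectors of reference and alternate alleles.
is_delins_sketch <- function(ref, alt) {
  stopifnot(length(ref) == length(alt))
  nref <- nchar(ref)
  nalt <- nchar(alt)
  first_match <- substr(ref, 1, 1) == substr(alt, 1, 1)
  # Simple insertion / deletion, following the quoted help-page definitions.
  is_ins <- nref == 1 & nalt > 1 & first_match
  is_del <- nalt == 1 & nref > 1 & first_match
  # Assumed reading of "delins": lengths differ, yet the variant is
  # neither a simple insertion nor a simple deletion.
  nref != nalt & !is_ins & !is_del
}

# Same alleles as in the quoted example.
ref <- c("T", "A", "A", "AC", "AC")
alt <- c("C", "T", "AC", "AT", "ACC")
is_delins_sketch(ref, alt)
## [1] FALSE FALSE FALSE FALSE  TRUE
```

under that assumption only the last variant (AC > ACC) is flagged, which is the one the existing predicates leave unclassified.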
On 02/10/2015 07:37 PM, Valerie Obenchain wrote:
> Hi Robert,
> This sounds like a good addition. I'll put it on the TODO. If you need
> this immediately I'd be happy to accept a patch (with unit tests).
> On 02/10/2015 06:29 AM, Robert Castelo wrote:
>> in the VariantAnnotation package, the help of the functions for
>> identifying variant types such as SNVs, insertions,
>> deletions, transitions, and structural rearrangements gives the
>> following definitions:
>> • isSNV: Reference and alternate alleles are both a single
>> nucleotide long.
>> • isInsertion: Reference allele is a single nucleotide and the
>> alternate allele is greater (longer) than a single nucleotide
>> and the first nucleotide of the alternate allele matches the
>> reference.
>> • isDeletion: Alternate allele is a single nucleotide and the
>> reference allele is greater (longer) than a single nucleotide
>> and the first nucleotide of the reference allele matches the
>> alternate.
>> • isIndel: The variant is either a deletion or insertion as
>> determined by ‘isDeletion’ and ‘isInsertion’.
>> • isSubstitution: Reference and alternate alleles are the same
>> length (1 or more nucleotides long).
>> • isTransition: Reference and alternate alleles are both a
>> single nucleotide long. The reference-alternate pair
>> interchange is of either two-ring purines (A <-> G) or
>> one-ring pyrimidines (C <-> T).
>> however, unless I'm missing something here, these definitions do not
>> cover indels in which the insertion or deletion involves more than
>> one reference or alternate nucleotide, respectively. this could be an
>> example of what i'm trying to say:
>> library(VariantAnnotation)
>> vr <- VRanges(seqnames = rep("chr1", times=5),
>>               ranges = IRanges(seq(1, 10, by=20),
>>                                seq(1, 10, by=20)+c(1, 1, 2, 2, 3)),
>>               ref = c("T", "A", "A", "AC", "AC"),
>>               alt = c("C", "T", "AC", "AT", "ACC"),
>>               refDepth = c(5, 10, 5, 10, 5),
>>               altDepth = c(7, 6, 7, 6, 7),
>>               totalDepth = c(12, 17, 12, 17, 12),
>>               sampleNames = letters[1:5])
>> isSNV(vr)
>> ## [1]  TRUE  TRUE FALSE FALSE FALSE
>> isIndel(vr)
>> ## [1] FALSE FALSE  TRUE FALSE FALSE
>> isSubstitution(vr)
>> ## [1]  TRUE  TRUE FALSE  TRUE FALSE
>> note that the last variant does not evaluate as true for any of the
>> three possibilities. after looking for variant definitions, i have found
>> that the Human Genome Variation Society (HGVS) describes this as a
>> deletion followed by an insertion and calls it "indel" or "delins" (it's
>> unclear to me whether they use that interchangeably), see the link here:
>> the only other site I could quickly find with Google where a
>> specific definition is given is that of the software SnpEff, which
>> calls it "MIXED", a "Multiple-nucleotide and an InDel":
>> I would suggest that VariantAnnotation should try to identify this type
>> of variant. following the HGVS recommendations, could we maybe have a
>> function for it called isDelins() ??
>> Bioc-devel at r-project.org mailing list
Robert Castelo, PhD
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain