[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Michael Stadler
michael.stadler at fmi.ch
Thu Mar 7 14:40:01 CET 2013
Dear Valerie and colleagues,
I have recently started using the VariantAnnotation package, which is a
huge asset for much of my work with sequence variants - thank you!
I have a question regarding the following example (it makes use of two
small text files, "annot.gtf" and "variants.vcf", attached to this message):
library(VariantAnnotation)
library(GenomicFeatures)
library(BSgenome.Celegans.UCSC.ce6)
# build a TranscriptDb from "annot.gtf"
chrominfo <- data.frame(chrom=as.character(seqnames(Celegans)),
length=seqlengths(Celegans),
is_circular=FALSE)
txdb <- makeTranscriptDbFromGFF(file="annot.gtf",
format="gtf",
chrominfo=chrominfo,
dataSource="WormBase v. WS190",
species="Caenorhabditis elegans")
txdb
#TranscriptDb object:
#| Db type: TranscriptDb
#| Supporting package: GenomicFeatures
#| Data source: WormBase v. WS190
#| Organism: Caenorhabditis elegans
#| miRBase build ID: NA
#| transcript_nrow: 7
#| exon_nrow: 42
#| cds_nrow: 42
#| Db created by: GenomicFeatures package from Bioconductor
#| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
#| GenomicFeatures version at creation time: 1.11.14
#| RSQLite version at creation time: 0.11.2
#| DBSCHEMAVERSION: 1.0
# read in sequence variants
vcf <- readVcf("variants.vcf", genome="ce6")
dim(vcf)
#[1] 3 1
# predict the impact on coding sequences
aa <- predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
length(aa)
#[1] 7
# My question is related to the impact of variant "chrV:15822727":
# This SNV is overlapping two transcripts (same base, plus strand),
# yet the reported VARCODON values are different (GAT -> TAT/AAT):
aa[names(aa)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
#GRanges with 2 ranges and 5 metadata columns:
# seqnames ranges strand |
# <Rle> <IRanges> <Rle> |
# chrV:15822727 chrV [15822727, 15822727] + |
# chrV:15822727 chrV [15822727, 15822727] + |
# REF ALT varAllele
# <DNAStringSet> <DNAStringSetList> <DNAStringSet>
# chrV:15822727 G A A
# chrV:15822727 G A A
# REFCODON VARCODON
# <DNAStringSet> <DNAStringSet>
# chrV:15822727 GAT TAT
# chrV:15822727 GAT AAT
# ---
# seqlengths:
# chrI chrV
# NA NA
# interestingly, when altering the annotation or the set of variants,
# this contradictions disapears (GAT -> AAT):
vcf2 <- vcf[-1] # remove the first SNV
aa2 <- predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
length(aa2)
#[1] 5
aa2[names(aa2)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
#GRanges with 2 ranges and 5 metadata columns:
# seqnames ranges strand |
# <Rle> <IRanges> <Rle> |
# chrV:15822727 chrV [15822727, 15822727] + |
# chrV:15822727 chrV [15822727, 15822727] + |
# REF ALT varAllele
# <DNAStringSet> <DNAStringSetList> <DNAStringSet>
# chrV:15822727 G A A
# chrV:15822727 G A A
# REFCODON VARCODON
# <DNAStringSet> <DNAStringSet>
# chrV:15822727 GAT AAT
# chrV:15822727 GAT AAT
# ---
# seqlengths:
# chrI chrV
# NA NA
Do you know why this could be the case?
Regards,
Michael
Here is my envirnoment:
sessionInfo()
R Under development (unstable) (2013-02-25 r62061)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] BSgenome.Celegans.UCSC.ce6_1.3.19
[2] BSgenome_1.27.1
[3] GenomicFeatures_1.11.14
[4] AnnotationDbi_1.21.12
[5] Biobase_2.19.3
[6] VariantAnnotation_1.5.42
[7] Rsamtools_1.11.21
[8] Biostrings_2.27.11
[9] GenomicRanges_1.11.35
[10] IRanges_1.17.35
[11] BiocGenerics_0.5.6
[12] RColorBrewer_1.0-5
loaded via a namespace (and not attached):
[1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2
[4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5
[7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0
[10] zlibbioc_1.5.0
-------------- next part --------------
chrV Coding_transcript exon 6512065 6512138 . - 2 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript exon 6512189 6512279 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript exon 6512325 6512472 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript exon 6512521 6512649 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript exon 6512704 6512865 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript exon 6513192 6513328 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript exon 6512065 6512138 . - 2 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript exon 6512189 6512279 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript exon 6512325 6512472 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript exon 6512521 6512649 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript exon 6512704 6512865 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript exon 6513192 6513328 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript exon 6512065 6512138 . - 2 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript exon 6512189 6512279 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript exon 6512325 6512472 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript exon 6512521 6512649 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript exon 6512704 6512865 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript exon 6513192 6513328 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript exon 15821672 15821846 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7a";
chrV Coding_transcript exon 15822137 15822594 . + 2 gene_id "WBGene00010021"; transcript_id "F54B8.7a";
chrV Coding_transcript exon 15822640 15822858 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7a";
chrV Coding_transcript exon 15821672 15821846 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7b";
chrV Coding_transcript exon 15822368 15822594 . + 2 gene_id "WBGene00010021"; transcript_id "F54B8.7b";
chrV Coding_transcript exon 15822640 15822858 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7b";
chrI Coding_transcript exon 11076365 11076616 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11077471 11077953 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11079045 11079446 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11080110 11080791 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11081578 11082098 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11083582 11084058 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11086719 11086956 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11087686 11087845 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11088405 11088538 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11093792 11094114 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript exon 11076365 11076616 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11077471 11077953 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11079045 11079446 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11080110 11080791 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11081578 11082098 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11083582 11084058 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11086719 11086956 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11087686 11087845 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11088405 11088538 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11094096 11094325 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11112636 11112830 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11113746 11113892 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11113938 11114095 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11114144 11114228 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11114675 11115180 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11115251 11115394 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11115442 11115695 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11117111 11117250 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11117302 11118006 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11118866 11119116 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11119166 11119451 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11119994 11120316 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11121218 11121564 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11122440 11122680 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11122879 11123070 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11127799 11127958 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11128420 11128714 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11128811 11128944 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11129315 11129731 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11130175 11130324 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript exon 11134499 11134552 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrV Coding_transcript CDS 6512065 6512138 . - 2 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript CDS 6512065 6512138 . - 2 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript CDS 6512065 6512138 . - 2 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript CDS 6512189 6512279 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript CDS 6512189 6512279 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript CDS 6512189 6512279 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript CDS 6512325 6512472 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript CDS 6512325 6512472 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript CDS 6512325 6512472 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript CDS 6512521 6512649 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript CDS 6512521 6512649 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript CDS 6512521 6512649 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript CDS 6512704 6512865 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript CDS 6512704 6512865 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript CDS 6512704 6512865 . - 1 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript CDS 6513192 6513328 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.3";
chrV Coding_transcript CDS 6513192 6513328 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.2";
chrV Coding_transcript CDS 6513192 6513328 . - 0 gene_id "WBGene00001073"; transcript_id "F46E10.9.1";
chrV Coding_transcript CDS 15821672 15821846 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7a";
chrV Coding_transcript CDS 15822137 15822594 . + 2 gene_id "WBGene00010021"; transcript_id "F54B8.7a";
chrV Coding_transcript CDS 15822640 15822858 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7a";
chrV Coding_transcript CDS 15821672 15821846 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7b";
chrV Coding_transcript CDS 15822368 15822594 . + 2 gene_id "WBGene00010021"; transcript_id "F54B8.7b";
chrV Coding_transcript CDS 15822640 15822858 . + 0 gene_id "WBGene00010021"; transcript_id "F54B8.7b";
chrI Coding_transcript CDS 11076365 11076616 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11077471 11077953 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11079045 11079446 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11080110 11080791 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11081578 11082098 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11083582 11084058 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11086719 11086956 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11087686 11087845 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11088405 11088538 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11093792 11094114 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1a";
chrI Coding_transcript CDS 11076365 11076616 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11077471 11077953 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11079045 11079446 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11080110 11080791 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11081578 11082098 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11083582 11084058 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11086719 11086956 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11087686 11087845 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11088405 11088538 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11094096 11094325 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11112636 11112830 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11113746 11113892 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11113938 11114095 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11114144 11114228 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11114675 11115180 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11115251 11115394 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11115442 11115695 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11117111 11117250 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11117302 11118006 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11118866 11119116 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11119166 11119451 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11119994 11120316 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11121218 11121564 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11122440 11122680 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11122879 11123070 . - 2 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11127799 11127958 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11128420 11128714 . - 1 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11128811 11128944 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11129315 11129731 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11130175 11130324 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
chrI Coding_transcript CDS 11134499 11134552 . - 0 gene_id "WBGene00001980"; transcript_id "W02B9.1b";
More information about the Bioc-devel
mailing list