[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Michael Stadler
michael.stadler at fmi.ch
Thu Mar 7 14:43:19 CET 2013
Here is the second attachment ("variants.vcf") that was missing from the
first message.
Michael
On 03/07/2013 02:40 PM, Michael Stadler wrote:
> Dear Valerie and colleagues,
>
> I have recently started using the VariantAnnotation package, which is a
> huge asset for much of my work with sequence variants - thank you!
>
> I have a question regarding the following example (it makes use of two
> small text files, "annot.gtf" and "variants.vcf", attached to this message):
>
> library(VariantAnnotation)
> library(GenomicFeatures)
> library(BSgenome.Celegans.UCSC.ce6)
>
> # build a TranscriptDb from "annot.gtf"
> chrominfo <- data.frame(chrom=as.character(seqnames(Celegans)),
> length=seqlengths(Celegans),
> is_circular=FALSE)
> txdb <- makeTranscriptDbFromGFF(file="annot.gtf",
> format="gtf",
> chrominfo=chrominfo,
> dataSource="WormBase v. WS190",
> species="Caenorhabditis elegans")
> txdb
> #TranscriptDb object:
> #| Db type: TranscriptDb
> #| Supporting package: GenomicFeatures
> #| Data source: WormBase v. WS190
> #| Organism: Caenorhabditis elegans
> #| miRBase build ID: NA
> #| transcript_nrow: 7
> #| exon_nrow: 42
> #| cds_nrow: 42
> #| Db created by: GenomicFeatures package from Bioconductor
> #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
> #| GenomicFeatures version at creation time: 1.11.14
> #| RSQLite version at creation time: 0.11.2
> #| DBSCHEMAVERSION: 1.0
>
> # read in sequence variants
> vcf <- readVcf("variants.vcf", genome="ce6")
> dim(vcf)
> #[1] 3 1
>
> # predict the impact on coding sequences
> aa <- predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
> length(aa)
> #[1] 7
>
>
> # My question is related to the impact of variant "chrV:15822727":
> # This SNV is overlapping two transcripts (same base, plus strand),
> # yet the reported VARCODON values are different (GAT -> TAT/AAT):
> aa[names(aa)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
>
> #GRanges with 2 ranges and 5 metadata columns:
> # seqnames ranges strand |
> # <Rle> <IRanges> <Rle> |
> # chrV:15822727 chrV [15822727, 15822727] + |
> # chrV:15822727 chrV [15822727, 15822727] + |
> # REF ALT varAllele
> # <DNAStringSet> <DNAStringSetList> <DNAStringSet>
> # chrV:15822727 G A A
> # chrV:15822727 G A A
> # REFCODON VARCODON
> # <DNAStringSet> <DNAStringSet>
> # chrV:15822727 GAT TAT
> # chrV:15822727 GAT AAT
> # ---
> # seqlengths:
> # chrI chrV
> # NA NA
>
>
> # interestingly, when altering the annotation or the set of variants,
> # this contradictions disapears (GAT -> AAT):
> vcf2 <- vcf[-1] # remove the first SNV
> aa2 <- predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
> length(aa2)
> #[1] 5
> aa2[names(aa2)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
>
> #GRanges with 2 ranges and 5 metadata columns:
> # seqnames ranges strand |
> # <Rle> <IRanges> <Rle> |
> # chrV:15822727 chrV [15822727, 15822727] + |
> # chrV:15822727 chrV [15822727, 15822727] + |
> # REF ALT varAllele
> # <DNAStringSet> <DNAStringSetList> <DNAStringSet>
> # chrV:15822727 G A A
> # chrV:15822727 G A A
> # REFCODON VARCODON
> # <DNAStringSet> <DNAStringSet>
> # chrV:15822727 GAT AAT
> # chrV:15822727 GAT AAT
> # ---
> # seqlengths:
> # chrI chrV
> # NA NA
>
> Do you know why this could be the case?
> Regards,
> Michael
>
>
> Here is my envirnoment:
> sessionInfo()
> R Under development (unstable) (2013-02-25 r62061)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
> [7] methods base
>
> other attached packages:
> [1] BSgenome.Celegans.UCSC.ce6_1.3.19
> [2] BSgenome_1.27.1
> [3] GenomicFeatures_1.11.14
> [4] AnnotationDbi_1.21.12
> [5] Biobase_2.19.3
> [6] VariantAnnotation_1.5.42
> [7] Rsamtools_1.11.21
> [8] Biostrings_2.27.11
> [9] GenomicRanges_1.11.35
> [10] IRanges_1.17.35
> [11] BiocGenerics_0.5.6
> [12] RColorBrewer_1.0-5
>
> loaded via a namespace (and not attached):
> [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2
> [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5
> [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0
> [10] zlibbioc_1.5.0
>
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
--------------------------------------------
Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax : +41 61 697 3976
Mail : michael.stadler at fmi.ch
--------------------------------------------
More information about the Bioc-devel
mailing list