[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Valerie Obenchain
vobencha at fhcrc.org
Sun Mar 10 02:45:42 CET 2013
Hi Michael,
Thanks for reporting the bug and sending a reproducible example. Sorry
it took a few days to get to this.
A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel).
Valerie
On 03/07/13 05:43, Michael Stadler wrote:
> Here is the second attachment ("variants.vcf") that was missing from the
> first message.
>
> Michael
>
> On 03/07/2013 02:40 PM, Michael Stadler wrote:
>> Dear Valerie and colleagues,
>>
>> I have recently started using the VariantAnnotation package, which is a
>> huge asset for much of my work with sequence variants - thank you!
>>
>> I have a question regarding the following example (it makes use of two
>> small text files, "annot.gtf" and "variants.vcf", attached to this message):
>>
>> library(VariantAnnotation)
>> library(GenomicFeatures)
>> library(BSgenome.Celegans.UCSC.ce6)
>>
>> # build a TranscriptDb from "annot.gtf"
>> chrominfo <- data.frame(chrom=as.character(seqnames(Celegans)),
>> length=seqlengths(Celegans),
>> is_circular=FALSE)
>> txdb <- makeTranscriptDbFromGFF(file="annot.gtf",
>> format="gtf",
>> chrominfo=chrominfo,
>> dataSource="WormBase v. WS190",
>> species="Caenorhabditis elegans")
>> txdb
>> #TranscriptDb object:
>> #| Db type: TranscriptDb
>> #| Supporting package: GenomicFeatures
>> #| Data source: WormBase v. WS190
>> #| Organism: Caenorhabditis elegans
>> #| miRBase build ID: NA
>> #| transcript_nrow: 7
>> #| exon_nrow: 42
>> #| cds_nrow: 42
>> #| Db created by: GenomicFeatures package from Bioconductor
>> #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
>> #| GenomicFeatures version at creation time: 1.11.14
>> #| RSQLite version at creation time: 0.11.2
>> #| DBSCHEMAVERSION: 1.0
>>
>> # read in sequence variants
>> vcf <- readVcf("variants.vcf", genome="ce6")
>> dim(vcf)
>> #[1] 3 1
>>
>> # predict the impact on coding sequences
>> aa <- predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
>> length(aa)
>> #[1] 7
>>
>>
>> # My question is related to the impact of variant "chrV:15822727":
>> # This SNV is overlapping two transcripts (same base, plus strand),
>> # yet the reported VARCODON values are different (GAT -> TAT/AAT):
>> aa[names(aa)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
>>
>> #GRanges with 2 ranges and 5 metadata columns:
>> # seqnames ranges strand |
>> # <Rle> <IRanges> <Rle> |
>> # chrV:15822727 chrV [15822727, 15822727] + |
>> # chrV:15822727 chrV [15822727, 15822727] + |
>> # REF ALT varAllele
>> # <DNAStringSet> <DNAStringSetList> <DNAStringSet>
>> # chrV:15822727 G A A
>> # chrV:15822727 G A A
>> # REFCODON VARCODON
>> # <DNAStringSet> <DNAStringSet>
>> # chrV:15822727 GAT TAT
>> # chrV:15822727 GAT AAT
>> # ---
>> # seqlengths:
>> # chrI chrV
>> # NA NA
>>
>>
>> # interestingly, when altering the annotation or the set of variants,
>> # this contradictions disapears (GAT -> AAT):
>> vcf2 <- vcf[-1] # remove the first SNV
>> aa2 <- predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
>> length(aa2)
>> #[1] 5
>> aa2[names(aa2)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
>>
>> #GRanges with 2 ranges and 5 metadata columns:
>> # seqnames ranges strand |
>> # <Rle> <IRanges> <Rle> |
>> # chrV:15822727 chrV [15822727, 15822727] + |
>> # chrV:15822727 chrV [15822727, 15822727] + |
>> # REF ALT varAllele
>> # <DNAStringSet> <DNAStringSetList> <DNAStringSet>
>> # chrV:15822727 G A A
>> # chrV:15822727 G A A
>> # REFCODON VARCODON
>> # <DNAStringSet> <DNAStringSet>
>> # chrV:15822727 GAT AAT
>> # chrV:15822727 GAT AAT
>> # ---
>> # seqlengths:
>> # chrI chrV
>> # NA NA
>>
>> Do you know why this could be the case?
>> Regards,
>> Michael
>>
>>
>> Here is my envirnoment:
>> sessionInfo()
>> R Under development (unstable) (2013-02-25 r62061)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
>> [7] methods base
>>
>> other attached packages:
>> [1] BSgenome.Celegans.UCSC.ce6_1.3.19
>> [2] BSgenome_1.27.1
>> [3] GenomicFeatures_1.11.14
>> [4] AnnotationDbi_1.21.12
>> [5] Biobase_2.19.3
>> [6] VariantAnnotation_1.5.42
>> [7] Rsamtools_1.11.21
>> [8] Biostrings_2.27.11
>> [9] GenomicRanges_1.11.35
>> [10] IRanges_1.17.35
>> [11] BiocGenerics_0.5.6
>> [12] RColorBrewer_1.0-5
>>
>> loaded via a namespace (and not attached):
>> [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2
>> [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5
>> [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0
>> [10] zlibbioc_1.5.0
>>
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
More information about the Bioc-devel
mailing list