[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Michael Stadler
michael.stadler at fmi.ch
Tue Mar 12 08:34:58 CET 2013
Hi Valerie,
Thank you for the quick fix - it works like a charm!
Michael
On 03/10/2013 02:45 AM, Valerie Obenchain wrote:
> Hi Michael,
>
> Thanks for reporting the bug and sending a reproducible example. Sorry
> it took a few days to get to this.
>
> A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel).
>
> Valerie
>
>
> On 03/07/13 05:43, Michael Stadler wrote:
>> Here is the second attachment ("variants.vcf") that was missing from the
>> first message.
>>
>> Michael
>>
>> On 03/07/2013 02:40 PM, Michael Stadler wrote:
>>> Dear Valerie and colleagues,
>>>
>>> I have recently started using the VariantAnnotation package, which is a
>>> huge asset for much of my work with sequence variants - thank you!
>>>
>>> I have a question regarding the following example (it makes use of two
>>> small text files, "annot.gtf" and "variants.vcf", attached to this
>>> message):
>>>
>>> library(VariantAnnotation)
>>> library(GenomicFeatures)
>>> library(BSgenome.Celegans.UCSC.ce6)
>>>
>>> # build a TranscriptDb from "annot.gtf"
>>> chrominfo <- data.frame(chrom=as.character(seqnames(Celegans)),
>>> length=seqlengths(Celegans),
>>> is_circular=FALSE)
>>> txdb <- makeTranscriptDbFromGFF(file="annot.gtf",
>>> format="gtf",
>>> chrominfo=chrominfo,
>>> dataSource="WormBase v. WS190",
>>> species="Caenorhabditis elegans")
>>> txdb
>>> #TranscriptDb object:
>>> #| Db type: TranscriptDb
>>> #| Supporting package: GenomicFeatures
>>> #| Data source: WormBase v. WS190
>>> #| Organism: Caenorhabditis elegans
>>> #| miRBase build ID: NA
>>> #| transcript_nrow: 7
>>> #| exon_nrow: 42
>>> #| cds_nrow: 42
>>> #| Db created by: GenomicFeatures package from Bioconductor
>>> #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
>>> #| GenomicFeatures version at creation time: 1.11.14
>>> #| RSQLite version at creation time: 0.11.2
>>> #| DBSCHEMAVERSION: 1.0
>>>
>>> # read in sequence variants
>>> vcf <- readVcf("variants.vcf", genome="ce6")
>>> dim(vcf)
>>> #[1] 3 1
>>>
>>> # predict the impact on coding sequences
>>> aa <- predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
>>> length(aa)
>>> #[1] 7
>>>
>>>
>>> # My question is related to the impact of variant "chrV:15822727":
>>> # This SNV is overlapping two transcripts (same base, plus strand),
>>> # yet the reported VARCODON values are different (GAT -> TAT/AAT):
>>> aa[names(aa)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
>>>
>>>
>>> #GRanges with 2 ranges and 5 metadata columns:
>>> # seqnames ranges strand |
>>> # <Rle> <IRanges> <Rle> |
>>> # chrV:15822727 chrV [15822727, 15822727] + |
>>> # chrV:15822727 chrV [15822727, 15822727] + |
>>> # REF ALT varAllele
>>> # <DNAStringSet> <DNAStringSetList> <DNAStringSet>
>>> # chrV:15822727 G A A
>>> # chrV:15822727 G A A
>>> # REFCODON VARCODON
>>> # <DNAStringSet> <DNAStringSet>
>>> # chrV:15822727 GAT TAT
>>> # chrV:15822727 GAT AAT
>>> # ---
>>> # seqlengths:
>>> # chrI chrV
>>> # NA NA
>>>
>>>
>>> # interestingly, when altering the annotation or the set of variants,
>>> # this contradictions disapears (GAT -> AAT):
>>> vcf2 <- vcf[-1] # remove the first SNV
>>> aa2 <- predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
>>> length(aa2)
>>> #[1] 5
>>> aa2[names(aa2)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")]
>>>
>>>
>>> #GRanges with 2 ranges and 5 metadata columns:
>>> # seqnames ranges strand |
>>> # <Rle> <IRanges> <Rle> |
>>> # chrV:15822727 chrV [15822727, 15822727] + |
>>> # chrV:15822727 chrV [15822727, 15822727] + |
>>> # REF ALT varAllele
>>> # <DNAStringSet> <DNAStringSetList> <DNAStringSet>
>>> # chrV:15822727 G A A
>>> # chrV:15822727 G A A
>>> # REFCODON VARCODON
>>> # <DNAStringSet> <DNAStringSet>
>>> # chrV:15822727 GAT AAT
>>> # chrV:15822727 GAT AAT
>>> # ---
>>> # seqlengths:
>>> # chrI chrV
>>> # NA NA
>>>
>>> Do you know why this could be the case?
>>> Regards,
>>> Michael
>>>
>>>
>>> Here is my envirnoment:
>>> sessionInfo()
>>> R Under development (unstable) (2013-02-25 r62061)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils datasets
>>> [7] methods base
>>>
>>> other attached packages:
>>> [1] BSgenome.Celegans.UCSC.ce6_1.3.19
>>> [2] BSgenome_1.27.1
>>> [3] GenomicFeatures_1.11.14
>>> [4] AnnotationDbi_1.21.12
>>> [5] Biobase_2.19.3
>>> [6] VariantAnnotation_1.5.42
>>> [7] Rsamtools_1.11.21
>>> [8] Biostrings_2.27.11
>>> [9] GenomicRanges_1.11.35
>>> [10] IRanges_1.17.35
>>> [11] BiocGenerics_0.5.6
>>> [12] RColorBrewer_1.0-5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2
>>> [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5
>>> [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0
>>> [10] zlibbioc_1.5.0
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
More information about the Bioc-devel
mailing list