[Bioc-devel] little tiny bug in CDSID annotations from predictCoding()

Robert Castelo robert.castelo at upf.edu
Tue Mar 17 19:39:44 CET 2015


Thanks! Do you have an explanation for the apparent disagreement in CDSID annotations that i described below the bug, between predictCoding() and locateVariants()?

robert.

-------- Original message --------
From: Valerie Obenchain <vobencha at fredhutch.org> 
Date:17/03/2015  19:18  (GMT+01:00) 
To: bioc-devel at r-project.org 
Subject: Re: [Bioc-devel] little tiny bug in CDSID annotations from
 	predictCoding() 

Hi Robert,

Thanks for reporting the typo and bug. Now fixed in 1.13.41.

Valerie

On 03/17/2015 10:58 AM, Robert Castelo wrote:
> in my message below, the line that it says:
>
> head(loc_all$CDSID)
>
> it should say
>
> head(coding2$CDSID)
>
> cheers,
>
> robert.
>
> ==========================================
> hi,
>
> there is a little tiny bug in the current devel version of
> VariantAnnotation::predictCoding(), and more concretely within
> VariantAnnotation:::.localCoordinates(), that precludes the correct
> annotation of the CDSID column:
>
> library(VariantAnnotation)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> library(BSgenome.Hsapiens.UCSC.hg19)
>
> vcf <- readVcf(system.file("extdata", "CEUtrio.vcf.bgz",
> package="VariantFiltering"), genome="hg19")
> seqlevelsStyle(vcf) <- seqlevelsStyle(txdb)
> vcf <- dropSeqlevels(vcf, "chrM")
> coding1 <- predictCoding(vcf, txdb, Hsapiens)
> head(coding1$CDSID)
> IntegerList of length 6
> [[1]] integer(0)
> [[2]] integer(0)
> [[3]] integer(0)
> [[4]] integer(0)
> [[5]] integer(0)
> [[6]] integer(0)
> table(elementLengths(coding1$CDSID))
>
>     0
> 6038
>
> my sessionInfo() is at the end of the message.
>
> here is the patch, just replacing 'map2$trancriptHits' by
> 'map2$transcriptsHits':
>
> --- R/AllUtilities.R    (revision 100756)
> +++ R/AllUtilities.R    (working copy)
> @@ -284,7 +284,7 @@
>       cdsid <- IntegerList(integer(0))
>       map2 <- mapToTranscripts(unname(from)[xHits], to,
>                                ignore.strand=ignore.strand)
> -    cds <- mcols(unlist(to, use.names=FALSE))$cds_id[map2$trancriptsHits]
> +    cds <- mcols(unlist(to, use.names=FALSE))$cds_id[map2$transcriptsHits]
>       if (length(cds)) {
>           cdslst <- unique(splitAsList(cds, map2$xHits))
>           cdsid <- cdslst
>
> with this fix then things seem to work again:
>
> coding1 <- predictCoding(vcf, txdb, Hsapiens)
>> head(coding1$CDSID)
> IntegerList of length 6
> [["1"]] 21771
> [["2"]] 21771
> [["3"]] 21771
> [["4"]] 21771
> [["5"]] 21428
> [["6"]] 21428
> table(elementLengths(coding1$CDSID))
>
>     1    2    3    4    5    6    7    8    9   10   12   13   14   16   19
>   873 1229 1024  993  615  524  324  168   82   21   12   15   42   76   40
>
> while investigating this bug i used VariantAnnotation::locateVariants()
> which also annotates the CDSID column, and it seemed to be working.
> however, i noticed that both, predictCoding() and locateVariants(), do
> not give an identical annotation for the CDSID column in coding variants:
>
> coding2 <- locateVariants(vcf, txdb, CodingVariants())
> head(loc_all$CDSID)
> IntegerList of length 6
> [["1"]] 210777
> [["2"]] 210777
> [["3"]] 210777
> [["4"]] 210778
> [["5"]] 208140
> [["6"]] 208141
> table(elementLengths(coding2$CDSID))
>
>     1    2    3    4
> 4987  901  138   12
>
> in principle, it seems that both are annotating valid CDSID keys:
>
> allcdsinfo <- select(txdb, keys=keys(txdb, keytype="CDSID"),
> columns="CDSID", keytype="CDSID")
> sum(!as.character(unlist(coding1$CDSID, use.names=FALSE)) %in%
> allcdsinfo$CDSID)
> [1] 0
> sum(!as.character(unlist(coding2$CDSID, use.names=FALSE)) %in%
> allcdsinfo$CDSID)
> [1] 0
>
> but predictCoding() annotates CDSID values that are not present in
> locateVariants() annotations and viceversa:
>
> sum(!as.character(unlist(coding1$CDSID, use.names=FALSE)) %in%
> as.character(unlist(coding2$CDSID, use.names=FALSE)))
> [1] 24057
> sum(!as.character(unlist(coding2$CDSID, use.names=FALSE)) %in%
> as.character(unlist(coding1$CDSID, use.names=FALSE)))
> [1] 7251
>
> length(unique(intersect(as.character(unlist(coding2$CDSID,
> use.names=FALSE)), as.character(unlist(coding1$CDSID, use.names=FALSE)))))
> [1] 0
>
> should not both annotate the same CDSID values on coding variants?
>
>
> thanks!
> robert.
> ps: sessionInfo()
> R Under development (unstable) (2014-10-14 r66765)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
>   [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
>   [7] LC_PAPER=en_US.UTF8       LC_NAME=C
>   [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices
> [6] utils     datasets  methods   base
>
> other attached packages:
>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.1
>   [2] GenomicFeatures_1.19.31
>   [3] AnnotationDbi_1.29.17
>   [4] Biobase_2.27.2
>   [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0
>   [6] BSgenome_1.35.17
>   [7] rtracklayer_1.27.8
>   [8] VariantAnnotation_1.13.40
>   [9] Rsamtools_1.19.44
> [10] Biostrings_2.35.11
> [11] XVector_0.7.4
> [12] GenomicRanges_1.19.46
> [13] GenomeInfoDb_1.3.13
> [14] IRanges_2.1.43
> [15] S4Vectors_0.5.22
> [16] BiocGenerics_0.13.6
> [17] vimcom_1.0-0
> [18] setwidth_1.0-3
> [19] colorout_1.0-3
>
> loaded via a namespace (and not attached):
>   [1] BiocParallel_1.1.15      biomaRt_2.23.5
>   [3] bitops_1.0-6             DBI_0.3.1
>   [5] GenomicAlignments_1.3.31 RCurl_1.95-4.5
>   [7] RSQLite_1.0.0            tools_3.2.0
>   [9] XML_3.98-1.1             zlibbioc_1.13.2
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list