[Bioc-devel] subtle error with VariantAnnotation::locateVariants()
Valerie Obenchain
vobencha at fredhutch.org
Tue Feb 3 18:13:55 CET 2015
Hi Robert,
Thanks for reporting this. Now fixed in 1.13.27.
I introduced this bug when switching the coordinate mapping over from
mapCoords() to the new mapToTranscripts() family. The output from the
new mapper is cleaner and allowed me to tidy some old code. In the
process I realized there could be cases with >1 CDSID per row and
changed the data type from 'integer' to 'IntegerList'. I thought I
changed the default in all places but obviously missed a couple. The
specific bug you hit was complaining that GRanges with different types
for CDSID couldn't not be combined.
Valerie
On 02/03/2015 05:57 AM, Robert Castelo wrote:
> hi,
>
> VariantAnnotation::locateVariants() is breaking in a very specific way
> in its current devel version (1.13.26). Since I'm using it while working
> on VariantFiltering, I have reverted my copy of VariantAnnotation to
> version 1.13.24 to be able to continue working.
>
> so at the moment this is not much of a problem for me, but I thought you
> might be interested in knowing about it, just in case you were not aware
> of it. This is the minimal example I've come up reproducing it:
>
> library(VariantAnnotation)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>
> gr <- GRanges(rep("chr20", 1), IRanges(start=44501458, width=1))
>
> loc <- locateVariants(gr, txdb,
> AllVariants(intergenic=IntergenicVariants(0, 0)))
> Error in .Primitive("c")(<S4 object of class "CompressedIntegerList">, :
> all arguments in '...' must be CompressedList objects
> traceback()
> traceback()
> 23: stop("all arguments in '...' must be CompressedList objects")
> 22: .Primitive("c")(<S4 object of class "CompressedIntegerList">,
> NA_integer_)
> 21: .Primitive("c")(<S4 object of class "CompressedIntegerList">,
> NA_integer_)
> 20: do.call(c, unname(cols))
> 19: do.call(c, unname(cols))
> 18: FUN(c("LOCATION", "LOCSTART", "LOCEND", "QUERYID", "TXID", "CDSID",
> "GENEID", "PRECEDEID", "FOLLOWID")[[6L]], ...)
> 17: lapply(colnames(df), function(cn) {
> cols <- lapply(args, `[[`, cn)
> isRle <- vapply(cols, is, logical(1L), "Rle")
> if (any(isRle) && !all(isRle)) {
> cols[isRle] <- lapply(cols[isRle], S4Vectors:::decodeRle)
> }
> isFactor <- vapply(cols, is.factor, logical(1L))
> if (any(isFactor)) {
> cols <- lapply(cols, as.factor)
> levs <- unique(unlist(lapply(cols, levels), use.names =
> FALSE))
> cols <- lapply(cols, factor, levs)
> }
> rectangular <- length(dim(cols[[1]])) == 2L
> if (rectangular) {
> combined <- do.call(rbind, unname(cols))
> }
> else {
> combined <- do.call(c, unname(cols))
> }
> if (any(isFactor))
> combined <- structure(combined, class = "factor", levels =
> levs)
> combined
> })
> 16: lapply(colnames(df), function(cn) {
> cols <- lapply(args, `[[`, cn)
> isRle <- vapply(cols, is, logical(1L), "Rle")
> if (any(isRle) && !all(isRle)) {
> cols[isRle] <- lapply(cols[isRle], S4Vectors:::decodeRle)
> }
> isFactor <- vapply(cols, is.factor, logical(1L))
> if (any(isFactor)) {
> cols <- lapply(cols, as.factor)
> levs <- unique(unlist(lapply(cols, levels), use.names =
> FALSE))
> cols <- lapply(cols, factor, levs)
> }
> rectangular <- length(dim(cols[[1]])) == 2L
> if (rectangular) {
> combined <- do.call(rbind, unname(cols))
> }
> else {
> combined <- do.call(c, unname(cols))
> }
> if (any(isFactor))
> combined <- structure(combined, class = "factor", levels =
> levs)
> combined
> })
> 15: .Method(..., deparse.level = deparse.level)
> 14: eval(expr, envir, enclos)
> 13: eval(.dotsCall, env)
> 12: eval(.dotsCall, env)
> 11: standardGeneric("rbind")
> 10: (function (..., deparse.level = 1)
> standardGeneric("rbind"))(<S4 object of class "DataFrame">, <S4
> object of class "DataFrame">,
> <S4 object of class "DataFrame">, <S4 object of class
> "DataFrame">,
> <S4 object of class "DataFrame">, <S4 object of class
> "DataFrame">,
> <S4 object of class "DataFrame">)
> 9: do.call(rbind, lapply(x, mcols, FALSE))
> 8: do.call(rbind, lapply(x, mcols, FALSE))
> 7: .unlist_list_of_GenomicRanges(args, ignore.mcols = ignore.mcols)
> 6: .local(x, ..., recursive = recursive)
> 5: c(coding, intron, fiveUTR, threeUTR, splice, promoter, intergenic)
> 4: c(coding, intron, fiveUTR, threeUTR, splice, promoter, intergenic)
> 3: .local(query, subject, region, ...)
> 2: locateVariants(gr, txdb, AllVariants(intergenic = IntergenicVariants(0,
> 0)))
> 1: locateVariants(gr, txdb, AllVariants(intergenic = IntergenicVariants(0,
> 0)))
>
> interestingly, if i replace the chromosome name in the GRanges object of
> 'chr20' by 'chr2', then it works:
>
> gr <- GRanges(rep("chr2", 1), IRanges(start=44501458, width=1))
> loc <- locateVariants(gr, txdb,
> AllVariants(intergenic=IntergenicVariants(0, 0)))
>
> or if i replace the start of the ranges of 44501458 by 1, then it also
> works:
>
> gr <- GRanges(rep("chr20", 1), IRanges(start=1, width=1))
> loc <- locateVariants(gr, txdb,
> AllVariants(intergenic=IntergenicVariants(0, 0)))
>
> here is the sessionInfo():
>
> R Under development (unstable) (2014-10-14 r66765)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C LC_TIME=en_US.UTF8
> LC_COLLATE=en_US.UTF8
> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
> LC_PAPER=en_US.UTF8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets
> methods base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.17
> AnnotationDbi_1.29.17
> [4] Biobase_2.27.1 VariantAnnotation_1.13.26
> Rsamtools_1.19.26
> [7] Biostrings_2.35.7 XVector_0.7.3
> GenomicRanges_1.19.35
> [10] GenomeInfoDb_1.3.12 IRanges_2.1.36
> S4Vectors_0.5.17
> [13] BiocGenerics_0.13.4 vimcom_1.0-0
> setwidth_1.0-3
> [16] colorout_1.0-3
>
> loaded via a namespace (and not attached):
> [1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8
> BiocParallel_1.1.13 biomaRt_2.23.5
> [6] bitops_1.0-6 brew_1.0-6 BSgenome_1.35.16
> checkmate_1.5.1 codetools_0.2-10
> [11] DBI_0.3.1 digest_0.6.8 fail_1.2
> foreach_1.4.2 GenomicAlignments_1.3.27
> [16] iterators_1.0.7 RCurl_1.95-4.5 RSQLite_1.0.0
> rtracklayer_1.27.7 sendmailR_1.2-1
> [21] stringr_0.6.2 tools_3.2.0 XML_3.98-1.1
> zlibbioc_1.13.0
>
>
> this problem dissapear if i revert to the older version (1.13.24),
>
> svn merge -r 98874:98522 .
>
> build and install it. in a new fresh R-devel session the same code
> smoothly with some warnings, i believe unrelated to this problem:
>
> [...]
> loc <- locateVariants(gr, txdb,
> + AllVariants(intergenic=IntergenicVariants(0, 0)))
> Warning messages:
> 1: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
> 2: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
> 3: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
> 4: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
>
> this is the sessionInfo() of this run:
>
> R Under development (unstable) (2014-10-14 r66765)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C LC_TIME=en_US.UTF8
> LC_COLLATE=en_US.UTF8
> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
> LC_PAPER=en_US.UTF8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets
> methods base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.17
> AnnotationDbi_1.29.17
> [4] Biobase_2.27.1 VariantAnnotation_1.13.24
> Rsamtools_1.19.26
> [7] Biostrings_2.35.7 XVector_0.7.3
> GenomicRanges_1.19.35
> [10] GenomeInfoDb_1.3.12 IRanges_2.1.36
> S4Vectors_0.5.17
> [13] BiocGenerics_0.13.4 vimcom_1.0-0
> setwidth_1.0-3
> [16] colorout_1.0-3
>
> loaded via a namespace (and not attached):
> [1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8
> BiocParallel_1.1.13 biomaRt_2.23.5
> [6] bitops_1.0-6 brew_1.0-6 BSgenome_1.35.16
> checkmate_1.5.1 codetools_0.2-10
> [11] DBI_0.3.1 digest_0.6.8 fail_1.2
> foreach_1.4.2 GenomicAlignments_1.3.27
> [16] iterators_1.0.7 RCurl_1.95-4.5 RSQLite_1.0.0
> rtracklayer_1.27.7 sendmailR_1.2-1
> [21] stringr_0.6.2 tools_3.2.0 XML_3.98-1.1
> zlibbioc_1.13.0
>
> cheers,
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
More information about the Bioc-devel
mailing list