[Bioc-devel] subtle error with VariantAnnotation::locateVariants()

Valerie Obenchain vobencha at fredhutch.org
Tue Feb 3 18:13:55 CET 2015


Hi Robert,

Thanks for reporting this. Now fixed in 1.13.27.

I introduced this bug when switching the coordinate mapping over from 
mapCoords() to the new mapToTranscripts() family. The output from the 
new mapper is cleaner and allowed me to tidy some old code. In the 
process I realized there could be cases with >1 CDSID per row and 
changed the data type from 'integer' to 'IntegerList'. I thought I 
changed the default in all places but obviously missed a couple. The 
specific bug you hit was complaining that GRanges with different types 
for CDSID couldn't not be combined.

Valerie

On 02/03/2015 05:57 AM, Robert Castelo wrote:
> hi,
>
> VariantAnnotation::locateVariants() is breaking in a very specific way
> in its current devel version (1.13.26). Since I'm using it while working
> on VariantFiltering, I have reverted my copy of VariantAnnotation to
> version 1.13.24 to be able to continue working.
>
> so at the moment this is not much of a problem for me, but I thought you
> might be interested in knowing about it, just in case you were not aware
> of it. This is the minimal example I've come up reproducing it:
>
> library(VariantAnnotation)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>
> gr <- GRanges(rep("chr20", 1), IRanges(start=44501458, width=1))
>
> loc <- locateVariants(gr, txdb,
>                        AllVariants(intergenic=IntergenicVariants(0, 0)))
> Error in .Primitive("c")(<S4 object of class "CompressedIntegerList">,  :
>    all arguments in '...' must be CompressedList objects
> traceback()
> traceback()
> 23: stop("all arguments in '...' must be CompressedList objects")
> 22: .Primitive("c")(<S4 object of class "CompressedIntegerList">,
>          NA_integer_)
> 21: .Primitive("c")(<S4 object of class "CompressedIntegerList">,
>          NA_integer_)
> 20: do.call(c, unname(cols))
> 19: do.call(c, unname(cols))
> 18: FUN(c("LOCATION", "LOCSTART", "LOCEND", "QUERYID", "TXID", "CDSID",
>      "GENEID", "PRECEDEID", "FOLLOWID")[[6L]], ...)
> 17: lapply(colnames(df), function(cn) {
>          cols <- lapply(args, `[[`, cn)
>          isRle <- vapply(cols, is, logical(1L), "Rle")
>          if (any(isRle) && !all(isRle)) {
>              cols[isRle] <- lapply(cols[isRle], S4Vectors:::decodeRle)
>          }
>          isFactor <- vapply(cols, is.factor, logical(1L))
>          if (any(isFactor)) {
>              cols <- lapply(cols, as.factor)
>              levs <- unique(unlist(lapply(cols, levels), use.names =
> FALSE))
>              cols <- lapply(cols, factor, levs)
>          }
>          rectangular <- length(dim(cols[[1]])) == 2L
>          if (rectangular) {
>              combined <- do.call(rbind, unname(cols))
>          }
>          else {
>              combined <- do.call(c, unname(cols))
>          }
>          if (any(isFactor))
>              combined <- structure(combined, class = "factor", levels =
> levs)
>          combined
>      })
> 16: lapply(colnames(df), function(cn) {
>          cols <- lapply(args, `[[`, cn)
>          isRle <- vapply(cols, is, logical(1L), "Rle")
>          if (any(isRle) && !all(isRle)) {
>              cols[isRle] <- lapply(cols[isRle], S4Vectors:::decodeRle)
>          }
>          isFactor <- vapply(cols, is.factor, logical(1L))
>          if (any(isFactor)) {
>              cols <- lapply(cols, as.factor)
>              levs <- unique(unlist(lapply(cols, levels), use.names =
> FALSE))
>              cols <- lapply(cols, factor, levs)
>          }
>          rectangular <- length(dim(cols[[1]])) == 2L
>          if (rectangular) {
>              combined <- do.call(rbind, unname(cols))
>          }
>          else {
>              combined <- do.call(c, unname(cols))
>          }
>          if (any(isFactor))
>              combined <- structure(combined, class = "factor", levels =
> levs)
>          combined
>      })
> 15: .Method(..., deparse.level = deparse.level)
> 14: eval(expr, envir, enclos)
> 13: eval(.dotsCall, env)
> 12: eval(.dotsCall, env)
> 11: standardGeneric("rbind")
> 10: (function (..., deparse.level = 1)
>      standardGeneric("rbind"))(<S4 object of class "DataFrame">, <S4
> object of class "DataFrame">,
>          <S4 object of class "DataFrame">, <S4 object of class
> "DataFrame">,
>          <S4 object of class "DataFrame">, <S4 object of class
> "DataFrame">,
>          <S4 object of class "DataFrame">)
> 9: do.call(rbind, lapply(x, mcols, FALSE))
> 8: do.call(rbind, lapply(x, mcols, FALSE))
> 7: .unlist_list_of_GenomicRanges(args, ignore.mcols = ignore.mcols)
> 6: .local(x, ..., recursive = recursive)
> 5: c(coding, intron, fiveUTR, threeUTR, splice, promoter, intergenic)
> 4: c(coding, intron, fiveUTR, threeUTR, splice, promoter, intergenic)
> 3: .local(query, subject, region, ...)
> 2: locateVariants(gr, txdb, AllVariants(intergenic = IntergenicVariants(0,
>         0)))
> 1: locateVariants(gr, txdb, AllVariants(intergenic = IntergenicVariants(0,
>         0)))
>
> interestingly, if i replace the chromosome name in the GRanges object of
> 'chr20' by 'chr2', then it works:
>
> gr <- GRanges(rep("chr2", 1), IRanges(start=44501458, width=1))
> loc <- locateVariants(gr, txdb,
>                        AllVariants(intergenic=IntergenicVariants(0, 0)))
>
> or if i replace the start of the ranges of 44501458 by 1, then it also
> works:
>
> gr <- GRanges(rep("chr20", 1), IRanges(start=1, width=1))
> loc <- locateVariants(gr, txdb,
>                        AllVariants(intergenic=IntergenicVariants(0, 0)))
>
> here is the sessionInfo():
>
> R Under development (unstable) (2014-10-14 r66765)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C LC_TIME=en_US.UTF8
> LC_COLLATE=en_US.UTF8
>   [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
> LC_PAPER=en_US.UTF8       LC_NAME=C
>   [9] LC_ADDRESS=C              LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>   methods   base
>
> other attached packages:
>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.17
>                AnnotationDbi_1.29.17
>   [4] Biobase_2.27.1                          VariantAnnotation_1.13.26
>                Rsamtools_1.19.26
>   [7] Biostrings_2.35.7                       XVector_0.7.3
>                GenomicRanges_1.19.35
> [10] GenomeInfoDb_1.3.12                     IRanges_2.1.36
> S4Vectors_0.5.17
> [13] BiocGenerics_0.13.4                     vimcom_1.0-0
> setwidth_1.0-3
> [16] colorout_1.0-3
>
> loaded via a namespace (and not attached):
>   [1] base64enc_0.1-2          BatchJobs_1.5            BBmisc_1.8
>           BiocParallel_1.1.13      biomaRt_2.23.5
>   [6] bitops_1.0-6             brew_1.0-6               BSgenome_1.35.16
>          checkmate_1.5.1          codetools_0.2-10
> [11] DBI_0.3.1                digest_0.6.8             fail_1.2
> foreach_1.4.2            GenomicAlignments_1.3.27
> [16] iterators_1.0.7          RCurl_1.95-4.5           RSQLite_1.0.0
>          rtracklayer_1.27.7       sendmailR_1.2-1
> [21] stringr_0.6.2            tools_3.2.0              XML_3.98-1.1
>          zlibbioc_1.13.0
>
>
> this problem dissapear if i revert to the older version (1.13.24),
>
> svn merge -r 98874:98522 .
>
> build and install it. in a new fresh R-devel session the same code
> smoothly with some warnings, i believe unrelated to this problem:
>
> [...]
> loc <- locateVariants(gr, txdb,
> +                       AllVariants(intergenic=IntergenicVariants(0, 0)))
> Warning messages:
> 1: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
> 2: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
> 3: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
> 4: '.local' is deprecated.
> Use 'mapToTranscripts' instead.
> See help("Deprecated")
>
> this is the sessionInfo() of this run:
>
> R Under development (unstable) (2014-10-14 r66765)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C LC_TIME=en_US.UTF8
> LC_COLLATE=en_US.UTF8
>   [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
> LC_PAPER=en_US.UTF8       LC_NAME=C
>   [9] LC_ADDRESS=C              LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>   methods   base
>
> other attached packages:
>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.17
>                AnnotationDbi_1.29.17
>   [4] Biobase_2.27.1                          VariantAnnotation_1.13.24
>                Rsamtools_1.19.26
>   [7] Biostrings_2.35.7                       XVector_0.7.3
>                GenomicRanges_1.19.35
> [10] GenomeInfoDb_1.3.12                     IRanges_2.1.36
> S4Vectors_0.5.17
> [13] BiocGenerics_0.13.4                     vimcom_1.0-0
> setwidth_1.0-3
> [16] colorout_1.0-3
>
> loaded via a namespace (and not attached):
>   [1] base64enc_0.1-2          BatchJobs_1.5            BBmisc_1.8
>           BiocParallel_1.1.13      biomaRt_2.23.5
>   [6] bitops_1.0-6             brew_1.0-6               BSgenome_1.35.16
>          checkmate_1.5.1          codetools_0.2-10
> [11] DBI_0.3.1                digest_0.6.8             fail_1.2
> foreach_1.4.2            GenomicAlignments_1.3.27
> [16] iterators_1.0.7          RCurl_1.95-4.5           RSQLite_1.0.0
>          rtracklayer_1.27.7       sendmailR_1.2-1
> [21] stringr_0.6.2            tools_3.2.0              XML_3.98-1.1
>          zlibbioc_1.13.0
>
> cheers,
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list