[Bioc-devel] subtle error with VariantAnnotation::locateVariants()

Robert Castelo robert.castelo at upf.edu
Tue Feb 3 19:03:10 CET 2015


hi, great! thanks for the quick fix.

robert.

On 02/03/2015 06:13 PM, Valerie Obenchain wrote:
> Hi Robert,
>
> Thanks for reporting this. Now fixed in 1.13.27.
>
> I introduced this bug when switching the coordinate mapping over from
> mapCoords() to the new mapToTranscripts() family. The output from the
> new mapper is cleaner and allowed me to tidy some old code. In the
> process I realized there could be cases with >1 CDSID per row and
> changed the data type from 'integer' to 'IntegerList'. I thought I
> changed the default in all places but obviously missed a couple. The
> specific bug you hit was complaining that GRanges with different types
> for CDSID couldn't not be combined.
>
> Valerie
>
> On 02/03/2015 05:57 AM, Robert Castelo wrote:
>> hi,
>>
>> VariantAnnotation::locateVariants() is breaking in a very specific way
>> in its current devel version (1.13.26). Since I'm using it while working
>> on VariantFiltering, I have reverted my copy of VariantAnnotation to
>> version 1.13.24 to be able to continue working.
>>
>> so at the moment this is not much of a problem for me, but I thought you
>> might be interested in knowing about it, just in case you were not aware
>> of it. This is the minimal example I've come up reproducing it:
>>
>> library(VariantAnnotation)
>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>
>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>
>> gr <- GRanges(rep("chr20", 1), IRanges(start=44501458, width=1))
>>
>> loc <- locateVariants(gr, txdb,
>> AllVariants(intergenic=IntergenicVariants(0, 0)))
>> Error in .Primitive("c")(<S4 object of class "CompressedIntegerList">, :
>> all arguments in '...' must be CompressedList objects
>> traceback()
>> traceback()
>> 23: stop("all arguments in '...' must be CompressedList objects")
>> 22: .Primitive("c")(<S4 object of class "CompressedIntegerList">,
>> NA_integer_)
>> 21: .Primitive("c")(<S4 object of class "CompressedIntegerList">,
>> NA_integer_)
>> 20: do.call(c, unname(cols))
>> 19: do.call(c, unname(cols))
>> 18: FUN(c("LOCATION", "LOCSTART", "LOCEND", "QUERYID", "TXID", "CDSID",
>> "GENEID", "PRECEDEID", "FOLLOWID")[[6L]], ...)
>> 17: lapply(colnames(df), function(cn) {
>> cols <- lapply(args, `[[`, cn)
>> isRle <- vapply(cols, is, logical(1L), "Rle")
>> if (any(isRle) && !all(isRle)) {
>> cols[isRle] <- lapply(cols[isRle], S4Vectors:::decodeRle)
>> }
>> isFactor <- vapply(cols, is.factor, logical(1L))
>> if (any(isFactor)) {
>> cols <- lapply(cols, as.factor)
>> levs <- unique(unlist(lapply(cols, levels), use.names =
>> FALSE))
>> cols <- lapply(cols, factor, levs)
>> }
>> rectangular <- length(dim(cols[[1]])) == 2L
>> if (rectangular) {
>> combined <- do.call(rbind, unname(cols))
>> }
>> else {
>> combined <- do.call(c, unname(cols))
>> }
>> if (any(isFactor))
>> combined <- structure(combined, class = "factor", levels =
>> levs)
>> combined
>> })
>> 16: lapply(colnames(df), function(cn) {
>> cols <- lapply(args, `[[`, cn)
>> isRle <- vapply(cols, is, logical(1L), "Rle")
>> if (any(isRle) && !all(isRle)) {
>> cols[isRle] <- lapply(cols[isRle], S4Vectors:::decodeRle)
>> }
>> isFactor <- vapply(cols, is.factor, logical(1L))
>> if (any(isFactor)) {
>> cols <- lapply(cols, as.factor)
>> levs <- unique(unlist(lapply(cols, levels), use.names =
>> FALSE))
>> cols <- lapply(cols, factor, levs)
>> }
>> rectangular <- length(dim(cols[[1]])) == 2L
>> if (rectangular) {
>> combined <- do.call(rbind, unname(cols))
>> }
>> else {
>> combined <- do.call(c, unname(cols))
>> }
>> if (any(isFactor))
>> combined <- structure(combined, class = "factor", levels =
>> levs)
>> combined
>> })
>> 15: .Method(..., deparse.level = deparse.level)
>> 14: eval(expr, envir, enclos)
>> 13: eval(.dotsCall, env)
>> 12: eval(.dotsCall, env)
>> 11: standardGeneric("rbind")
>> 10: (function (..., deparse.level = 1)
>> standardGeneric("rbind"))(<S4 object of class "DataFrame">, <S4
>> object of class "DataFrame">,
>> <S4 object of class "DataFrame">, <S4 object of class
>> "DataFrame">,
>> <S4 object of class "DataFrame">, <S4 object of class
>> "DataFrame">,
>> <S4 object of class "DataFrame">)
>> 9: do.call(rbind, lapply(x, mcols, FALSE))
>> 8: do.call(rbind, lapply(x, mcols, FALSE))
>> 7: .unlist_list_of_GenomicRanges(args, ignore.mcols = ignore.mcols)
>> 6: .local(x, ..., recursive = recursive)
>> 5: c(coding, intron, fiveUTR, threeUTR, splice, promoter, intergenic)
>> 4: c(coding, intron, fiveUTR, threeUTR, splice, promoter, intergenic)
>> 3: .local(query, subject, region, ...)
>> 2: locateVariants(gr, txdb, AllVariants(intergenic =
>> IntergenicVariants(0,
>> 0)))
>> 1: locateVariants(gr, txdb, AllVariants(intergenic =
>> IntergenicVariants(0,
>> 0)))
>>
>> interestingly, if i replace the chromosome name in the GRanges object of
>> 'chr20' by 'chr2', then it works:
>>
>> gr <- GRanges(rep("chr2", 1), IRanges(start=44501458, width=1))
>> loc <- locateVariants(gr, txdb,
>> AllVariants(intergenic=IntergenicVariants(0, 0)))
>>
>> or if i replace the start of the ranges of 44501458 by 1, then it also
>> works:
>>
>> gr <- GRanges(rep("chr20", 1), IRanges(start=1, width=1))
>> loc <- locateVariants(gr, txdb,
>> AllVariants(intergenic=IntergenicVariants(0, 0)))
>>
>> here is the sessionInfo():
>>
>> R Under development (unstable) (2014-10-14 r66765)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C LC_TIME=en_US.UTF8
>> LC_COLLATE=en_US.UTF8
>> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
>> LC_PAPER=en_US.UTF8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8
>> LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4 parallel stats graphics grDevices utils datasets
>> methods base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.17
>> AnnotationDbi_1.29.17
>> [4] Biobase_2.27.1 VariantAnnotation_1.13.26
>> Rsamtools_1.19.26
>> [7] Biostrings_2.35.7 XVector_0.7.3
>> GenomicRanges_1.19.35
>> [10] GenomeInfoDb_1.3.12 IRanges_2.1.36
>> S4Vectors_0.5.17
>> [13] BiocGenerics_0.13.4 vimcom_1.0-0
>> setwidth_1.0-3
>> [16] colorout_1.0-3
>>
>> loaded via a namespace (and not attached):
>> [1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8
>> BiocParallel_1.1.13 biomaRt_2.23.5
>> [6] bitops_1.0-6 brew_1.0-6 BSgenome_1.35.16
>> checkmate_1.5.1 codetools_0.2-10
>> [11] DBI_0.3.1 digest_0.6.8 fail_1.2
>> foreach_1.4.2 GenomicAlignments_1.3.27
>> [16] iterators_1.0.7 RCurl_1.95-4.5 RSQLite_1.0.0
>> rtracklayer_1.27.7 sendmailR_1.2-1
>> [21] stringr_0.6.2 tools_3.2.0 XML_3.98-1.1
>> zlibbioc_1.13.0
>>
>>
>> this problem dissapear if i revert to the older version (1.13.24),
>>
>> svn merge -r 98874:98522 .
>>
>> build and install it. in a new fresh R-devel session the same code
>> smoothly with some warnings, i believe unrelated to this problem:
>>
>> [...]
>> loc <- locateVariants(gr, txdb,
>> + AllVariants(intergenic=IntergenicVariants(0, 0)))
>> Warning messages:
>> 1: '.local' is deprecated.
>> Use 'mapToTranscripts' instead.
>> See help("Deprecated")
>> 2: '.local' is deprecated.
>> Use 'mapToTranscripts' instead.
>> See help("Deprecated")
>> 3: '.local' is deprecated.
>> Use 'mapToTranscripts' instead.
>> See help("Deprecated")
>> 4: '.local' is deprecated.
>> Use 'mapToTranscripts' instead.
>> See help("Deprecated")
>>
>> this is the sessionInfo() of this run:
>>
>> R Under development (unstable) (2014-10-14 r66765)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C LC_TIME=en_US.UTF8
>> LC_COLLATE=en_US.UTF8
>> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
>> LC_PAPER=en_US.UTF8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8
>> LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4 parallel stats graphics grDevices utils datasets
>> methods base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.17
>> AnnotationDbi_1.29.17
>> [4] Biobase_2.27.1 VariantAnnotation_1.13.24
>> Rsamtools_1.19.26
>> [7] Biostrings_2.35.7 XVector_0.7.3
>> GenomicRanges_1.19.35
>> [10] GenomeInfoDb_1.3.12 IRanges_2.1.36
>> S4Vectors_0.5.17
>> [13] BiocGenerics_0.13.4 vimcom_1.0-0
>> setwidth_1.0-3
>> [16] colorout_1.0-3
>>
>> loaded via a namespace (and not attached):
>> [1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8
>> BiocParallel_1.1.13 biomaRt_2.23.5
>> [6] bitops_1.0-6 brew_1.0-6 BSgenome_1.35.16
>> checkmate_1.5.1 codetools_0.2-10
>> [11] DBI_0.3.1 digest_0.6.8 fail_1.2
>> foreach_1.4.2 GenomicAlignments_1.3.27
>> [16] iterators_1.0.7 RCurl_1.95-4.5 RSQLite_1.0.0
>> rtracklayer_1.27.7 sendmailR_1.2-1
>> [21] stringr_0.6.2 tools_3.2.0 XML_3.98-1.1
>> zlibbioc_1.13.0
>>
>> cheers,
>>
>> robert.
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550



More information about the Bioc-devel mailing list