[Bioc-devel] concat problem with CharacterList in mcols of GRanges

Hervé Pagès hpages at fhcrc.org
Mon Mar 17 20:47:44 CET 2014


Hi Vince,

On 03/16/2014 06:11 PM, Vincent Carey wrote:
> It seems that there is diversity in the classes assigned for ALT in results
> of readVcf, and there was some discussion of this in 1/2013.

Was this discussion on the mailing list? I can't find it.

If using diverse/unpredictable classes for ALT cannot be avoided, have
you considered using a BStringSetList instead of a CharacterList when
the variants are "structural"?

There is a big divide between DNAStringSetList and CharacterList in
terms of internal representation. But not so much between
DNAStringSetList and BStringSetList. So using BStringSetList instead
of CharacterList would help smooth out the kind of issues you're
facing here. In particular, even though combining DNAStringSetList
and BStringSetList objects doesn't work right now, that's something
we should definitely support (it would be easy to add).
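
To make the idea concrete, here is a minimal sketch of how ALT columns
of both kinds could be brought to a common BStringSetList before
combining. The toy ALT values and the helper name to_bstringsetlist()
are made up for illustration, and it assumes that relist() with a
list-like skeleton and the BStringSet() constructor behave as they do
in recent IRanges/Biostrings; it is not what readVcf does.

```r
library(Biostrings)

# Toy ALT columns (hypothetical values): one for SNVs stored as DNA,
# one for structural variants stored as arbitrary strings.
alt_snv <- DNAStringSetList("A", "G", c("A", "T"))
alt_sv  <- BStringSetList("<DEL>", "<INS>")

# Hypothetical helper: re-type the flattened sequences as BStringSet,
# then restore the original list structure.
to_bstringsetlist <- function(x) {
  flat <- BStringSet(unlist(x, use.names = FALSE))
  relist(flat, x)
}

# Both pieces now share the same element class, so c() can combine them.
alt_all <- c(to_bstringsetlist(alt_snv), alt_sv)
class(alt_all)
length(alt_all)
```

The conversion only changes the sequence type of the underlying
XStringSet, so it should be cheap compared to going through
CharacterList.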

Cheers,
H.

>  So it looks
> like this is predictable and solvable with some upstream work after the
> read.
>
>
> On Sun, Mar 16, 2014 at 7:43 PM, Vincent Carey
> <stvjc at channing.harvard.edu> wrote:
>
>>> c(x[[1]][1:3,1:2], x[[3]][1:3,1:2])  # this works
>> GRanges with 6 ranges and 2 metadata columns:
>>        seqnames           ranges strand |    paramRangeID            REF
>>           <Rle>        <IRanges>  <Rle> |        <factor> <DNAStringSet>
>>    [1]        1 [ 10583,  10583]      * |  dhs_chr1_10402              G
>>    [2]        1 [ 10611,  10611]      * |  dhs_chr1_10402              C
>>    [3]        1 [ 10583,  10583]      * |  dhs_chr1_10502              G
>>    [4]        1 [832178, 832178]      * | dhs_chr1_833139              A
>>    [5]        1 [832266, 832266]      * | dhs_chr1_833139              G
>>    [6]        1 [832297, 832299]      * | dhs_chr1_833139            CTG
>>    ---
>>    seqlengths:
>>      1
>>     NA
>>> x[[1]][1:3,1:3]
>> GRanges with 3 ranges and 3 metadata columns:
>>        seqnames         ranges strand |   paramRangeID            REF
>>           <Rle>      <IRanges>  <Rle> |       <factor> <DNAStringSet>
>>    [1]        1 [10583, 10583]      * | dhs_chr1_10402              G
>>    [2]        1 [10611, 10611]      * | dhs_chr1_10402              C
>>    [3]        1 [10583, 10583]      * | dhs_chr1_10502              G
>>                    ALT
>>        <CharacterList>
>>    [1]               A
>>    [2]               G
>>    [3]               A
>>    ---
>>    seqlengths:
>>      1
>>     NA
>>> c(x[[1]][1:3,1:3], x[[3]][1:3,1:3])  # if i try to concatenate while ALT
>> is included
>> Error in .Primitive("c")(<S4 object of class "CompressedCharacterList">,
>>   :
>>    all arguments in '...' must have an element class that extends that of
>> the first argument
>>
>> Enter a frame number, or 0 to exit
>>
>>   1: c(x[[1]][1:3, 1:3], x[[3]][1:3, 1:3])
>>   2: c(x[[1]][1:3, 1:3], x[[3]][1:3, 1:3])
>>   3: .local(x, ..., recursive = recursive)
>>   4: .unlist_list_of_GenomicRanges(args, ignore.mcols = ignore.mcols)
>>   5: do.call(rbind, lapply(x, mcols, FALSE))
>>   6: do.call(rbind, lapply(x, mcols, FALSE))
>>   7: (function (..., deparse.level = 1)
>> standardGeneric("rbind"))(<S4 object of
>>   8: standardGeneric("rbind")
>>   9: eval(.dotsCall, env)
>> 10: eval(.dotsCall, env)
>> 11: eval(expr, envir, enclos)
>> 12: .Method(..., deparse.level = deparse.level)
>> 13: lapply(seq_len(length(df)), function(i) {
>>      cols <- lapply(args, `[[`, cn[
>> 14: lapply(seq_len(length(df)), function(i) {
>>      cols <- lapply(args, `[[`, cn[
>> 15: FUN(1:3[[3]], ...)
>> 16: do.call(c, unname(cols))
>> 17: do.call(c, unname(cols))
>> 18: .Primitive("c")(<S4 object of class "CompressedCharacterList">, <S4
>> object
>> 19: .Primitive("c")(<S4 object of class "CompressedCharacterList">, <S4
>> object
>>
>>> sessionInfo()
>> R Under development (unstable) (2014-03-15 r65199)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices datasets  utils     tools
>> [8] methods   base
>>
>> other attached packages:
>>   [1] Biostrings_2.31.14    XVector_0.3.7         GenomicRanges_1.15.39
>>   [4] GenomeInfoDb_0.99.19  IRanges_1.21.34       BiocGenerics_0.9.3
>>   [7] BatchJobs_1.2         BBmisc_1.5            weaver_1.29.1
>> [10] codetools_0.2-8       digest_0.6.4          BiocInstaller_1.13.3
>>
>> loaded via a namespace (and not attached):
>> [1] DBI_0.2-7       RSQLite_0.11.4  Rcpp_0.11.1     brew_1.0-6
>> [5] fail_1.2        plyr_1.8.1      sendmailR_1.1-2 stats4_3.2.0
>> [9] stringr_0.6.2
>>
>>
>>
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


