[Bioc-devel] concat problem with CharacterList in mcols of GRanges

Valerie Obenchain vobencha at fhcrc.org
Mon Mar 17 17:38:11 CET 2014


Hi Vince,

ALT is a CharacterList when the variants are 'structural', i.e., when 
the ALT field has non-nucleotide characters. Otherwise, ALT is a 
DNAStringSetList.

It looks like ALT is a CharacterList in x[[1]] and is probably a 
DNAStringSetList in x[[3]]. So you are trying to combine GRanges with 
these two different types in ALT:

gr1 <- GRanges("1", IRanges(1:3, width=1),
                ALT = CharacterList(list("A", "G", "A")))

gr2 <- GRanges("1", IRanges(5:7, width=1),
                ALT = DNAStringSetList(list("T", "A", "C")))

>> c(gr1, gr2)
> Error in .Primitive("c")(<S4 object of class "CompressedCharacterList">,  :
>   all arguments in '...' must have an element class that extends that of the first argument

The ALT values in the subset of x[[1]] are all nucleotides but not all 
ALT values in x[[1]] are which is why readVcf() made it a CharacterList. 
To combine these GRanges you need to either coerce all of x[[3]] ALT to 
CharacterList or coerce the subset of x[[1]] to DNAStringSetList.

Coerce the subset of x[[1]] to DNAStringSetList:

>> gr1$ALT <- DNAStringSetList(gr1$ALT)
>> gr1
> GRanges with 3 ranges and 1 metadata column:
>       seqnames    ranges strand |                ALT
>          <Rle> <IRanges>  <Rle> | <DNAStringSetList>
>   [1]        1    [1, 1]      * |                  A
>   [2]        1    [2, 2]      * |                  G
>   [3]        1    [3, 3]      * |                  A
>   ---

Combine:

>> c(gr1, gr2)
> GRanges with 6 ranges and 1 metadata column:
>       seqnames    ranges strand |                ALT
>          <Rle> <IRanges>  <Rle> | <DNAStringSetList>
>   [1]        1    [1, 1]      * |                  A
>   [2]        1    [2, 2]      * |                  G
>   [3]        1    [3, 3]      * |                  A
>   [4]        1    [5, 5]      * |                  T
>   [5]        1    [6, 6]      * |                  A
>   [6]        1    [7, 7]      * |                  C


Val


On 03/16/2014 06:11 PM, Vincent Carey wrote:
> It seems that there is diversity in the classes assigned for ALT in results
> of readVcf, and there was some discussion of this in 1/2013.  So it looks
> like this is predictable and solvable with some upstream work after the
> read.
>
>
> On Sun, Mar 16, 2014 at 7:43 PM, Vincent Carey
> <stvjc at channing.harvard.edu>wrote:
>
>>> c(x[[1]][1:3,1:2], x[[3]][1:3,1:2])  # this works
>> GRanges with 6 ranges and 2 metadata columns:
>>        seqnames           ranges strand |    paramRangeID            REF
>>           <Rle>        <IRanges>  <Rle> |        <factor> <DNAStringSet>
>>    [1]        1 [ 10583,  10583]      * |  dhs_chr1_10402              G
>>    [2]        1 [ 10611,  10611]      * |  dhs_chr1_10402              C
>>    [3]        1 [ 10583,  10583]      * |  dhs_chr1_10502              G
>>    [4]        1 [832178, 832178]      * | dhs_chr1_833139              A
>>    [5]        1 [832266, 832266]      * | dhs_chr1_833139              G
>>    [6]        1 [832297, 832299]      * | dhs_chr1_833139            CTG
>>    ---
>>    seqlengths:
>>      1
>>     NA
>>> x[[1]][1:3,1:3]
>> GRanges with 3 ranges and 3 metadata columns:
>>        seqnames         ranges strand |   paramRangeID            REF
>>           <Rle>      <IRanges>  <Rle> |       <factor> <DNAStringSet>
>>    [1]        1 [10583, 10583]      * | dhs_chr1_10402              G
>>    [2]        1 [10611, 10611]      * | dhs_chr1_10402              C
>>    [3]        1 [10583, 10583]      * | dhs_chr1_10502              G
>>                    ALT
>>        <CharacterList>
>>    [1]               A
>>    [2]               G
>>    [3]               A
>>    ---
>>    seqlengths:
>>      1
>>     NA
>>> c(x[[1]][1:3,1:3], x[[3]][1:3,1:3])  # if i try to concatenate while ALT
>> is included
>> Error in .Primitive("c")(<S4 object of class "CompressedCharacterList">,
>>   :
>>    all arguments in '...' must have an element class that extends that of
>> the first argument
>>
>> Enter a frame number, or 0 to exit
>>
>>   1: c(x[[1]][1:3, 1:3], x[[3]][1:3, 1:3])
>>   2: c(x[[1]][1:3, 1:3], x[[3]][1:3, 1:3])
>>   3: .local(x, ..., recursive = recursive)
>>   4: .unlist_list_of_GenomicRanges(args, ignore.mcols = ignore.mcols)
>>   5: do.call(rbind, lapply(x, mcols, FALSE))
>>   6: do.call(rbind, lapply(x, mcols, FALSE))
>>   7: (function (..., deparse.level = 1)
>> standardGeneric("rbind"))(<S4 object of
>>   8: standardGeneric("rbind")
>>   9: eval(.dotsCall, env)
>> 10: eval(.dotsCall, env)
>> 11: eval(expr, envir, enclos)
>> 12: .Method(..., deparse.level = deparse.level)
>> 13: lapply(seq_len(length(df)), function(i) {
>>      cols <- lapply(args, `[[`, cn[
>> 14: lapply(seq_len(length(df)), function(i) {
>>      cols <- lapply(args, `[[`, cn[
>> 15: FUN(1:3[[3]], ...)
>> 16: do.call(c, unname(cols))
>> 17: do.call(c, unname(cols))
>> 18: .Primitive("c")(<S4 object of class "CompressedCharacterList">, <S4
>> object
>> 19: .Primitive("c")(<S4 object of class "CompressedCharacterList">, <S4
>> object
>>
>>> sessionInfo()
>> R Under development (unstable) (2014-03-15 r65199)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices datasets  utils     tools
>> [8] methods   base
>>
>> other attached packages:
>>   [1] Biostrings_2.31.14    XVector_0.3.7         GenomicRanges_1.15.39
>>   [4] GenomeInfoDb_0.99.19  IRanges_1.21.34       BiocGenerics_0.9.3
>>   [7] BatchJobs_1.2         BBmisc_1.5            weaver_1.29.1
>> [10] codetools_0.2-8       digest_0.6.4          BiocInstaller_1.13.3
>>
>> loaded via a namespace (and not attached):
>> [1] DBI_0.2-7       RSQLite_0.11.4  Rcpp_0.11.1     brew_1.0-6
>> [5] fail_1.2        plyr_1.8.1      sendmailR_1.1-2 stats4_3.2.0
>> [9] stringr_0.6.2
>>
>>
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone:  (206) 667-3158
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list