[Bioc-devel] concat problem with CharacterList in mcols of GRanges
Valerie Obenchain
vobencha at fhcrc.org
Mon Mar 17 21:28:20 CET 2014
This was my oversight. I didn't think of using a BStringSet for the
structural variants.
Taking a quick look at how this might work.
> fl <- system.file("extdata", "structural.vcf",
package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> alt(vcf)
CharacterList of length 7
[[1]] <DEL>
[[2]] C
[[3]] <DEL>
[[4]] <DEL:ME:ALU>
[[5]] <INS:ME:L1>
[[6]] <DUP>
[[7]] <DUP:TANDEM>
Looks like we need to add a List constructor:
> BStringSetList(alt(vcf))
Error: could not find function "BStringSetList"
But once we did the non-nucleotide characters are handled nicely:
> BStringSet(unlist(alt(vcf)))
A BStringSet instance of length 7
width seq
[1] 5 <DEL>
[2] 1 C
[3] 5 <DEL>
[4] 12 <DEL:ME:ALU>
[5] 11 <INS:ME:L1>
[6] 5 <DUP>
[7] 12 <DUP:TANDEM>
With a List constructor and the ability to combine BStringSet with other
XStringSet objects we would be set. Good suggestion. Thanks.
Val
On 03/17/2014 12:47 PM, Hervé Pagès wrote:
> Hi Vince,
>
> On 03/16/2014 06:11 PM, Vincent Carey wrote:
>> It seems that there is diversity in the classes assigned for ALT in
>> results
>> of readVcf, and there was some discussion of this in 1/2013.
>
> Was this discussion on the mailing list?. Can't find it.
>
> If using diverse/unpredictable classes for ALT cannot be avoided, have
> you considered using a BStringSetList instead of a CharacterList when
> the variant are "structural"?
>
> There is a big divide between DNAStringSetList and CharacterList in
> terms of internal representation. But not so much between
> DNAStringSetList and BStringSetList. So using BStringSetList instead
> of CharacterList would help smoothing out the kind of issues you're
> facing here. In particular, even though combining DNAStringSetList
> and BStringSetList objects doesn't work right now, that's something
> we should definitely support (it would be easy to add).
>
> Cheers,
> H.
>
>> So it looks
>> like this is predictable and solvable with some upstream work after the
>> read.
>>
>>
>> On Sun, Mar 16, 2014 at 7:43 PM, Vincent Carey
>> <stvjc at channing.harvard.edu>wrote:
>>
>>>> c(x[[1]][1:3,1:2], x[[3]][1:3,1:2]) # this works
>>> GRanges with 6 ranges and 2 metadata columns:
>>> seqnames ranges strand | paramRangeID REF
>>> <Rle> <IRanges> <Rle> | <factor> <DNAStringSet>
>>> [1] 1 [ 10583, 10583] * | dhs_chr1_10402 G
>>> [2] 1 [ 10611, 10611] * | dhs_chr1_10402 C
>>> [3] 1 [ 10583, 10583] * | dhs_chr1_10502 G
>>> [4] 1 [832178, 832178] * | dhs_chr1_833139 A
>>> [5] 1 [832266, 832266] * | dhs_chr1_833139 G
>>> [6] 1 [832297, 832299] * | dhs_chr1_833139 CTG
>>> ---
>>> seqlengths:
>>> 1
>>> NA
>>>> x[[1]][1:3,1:3]
>>> GRanges with 3 ranges and 3 metadata columns:
>>> seqnames ranges strand | paramRangeID REF
>>> <Rle> <IRanges> <Rle> | <factor> <DNAStringSet>
>>> [1] 1 [10583, 10583] * | dhs_chr1_10402 G
>>> [2] 1 [10611, 10611] * | dhs_chr1_10402 C
>>> [3] 1 [10583, 10583] * | dhs_chr1_10502 G
>>> ALT
>>> <CharacterList>
>>> [1] A
>>> [2] G
>>> [3] A
>>> ---
>>> seqlengths:
>>> 1
>>> NA
>>>> c(x[[1]][1:3,1:3], x[[3]][1:3,1:3]) # if i try to concatenate while
>>>> ALT
>>> is included
>>> Error in .Primitive("c")(<S4 object of class "CompressedCharacterList">,
>>> :
>>> all arguments in '...' must have an element class that extends
>>> that of
>>> the first argument
>>>
>>> Enter a frame number, or 0 to exit
>>>
>>> 1: c(x[[1]][1:3, 1:3], x[[3]][1:3, 1:3])
>>> 2: c(x[[1]][1:3, 1:3], x[[3]][1:3, 1:3])
>>> 3: .local(x, ..., recursive = recursive)
>>> 4: .unlist_list_of_GenomicRanges(args, ignore.mcols = ignore.mcols)
>>> 5: do.call(rbind, lapply(x, mcols, FALSE))
>>> 6: do.call(rbind, lapply(x, mcols, FALSE))
>>> 7: (function (..., deparse.level = 1)
>>> standardGeneric("rbind"))(<S4 object of
>>> 8: standardGeneric("rbind")
>>> 9: eval(.dotsCall, env)
>>> 10: eval(.dotsCall, env)
>>> 11: eval(expr, envir, enclos)
>>> 12: .Method(..., deparse.level = deparse.level)
>>> 13: lapply(seq_len(length(df)), function(i) {
>>> cols <- lapply(args, `[[`, cn[
>>> 14: lapply(seq_len(length(df)), function(i) {
>>> cols <- lapply(args, `[[`, cn[
>>> 15: FUN(1:3[[3]], ...)
>>> 16: do.call(c, unname(cols))
>>> 17: do.call(c, unname(cols))
>>> 18: .Primitive("c")(<S4 object of class "CompressedCharacterList">, <S4
>>> object
>>> 19: .Primitive("c")(<S4 object of class "CompressedCharacterList">, <S4
>>> object
>>>
>>>> sessionInfo()
>>> R Under development (unstable) (2014-03-15 r65199)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices datasets utils tools
>>> [8] methods base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.31.14 XVector_0.3.7 GenomicRanges_1.15.39
>>> [4] GenomeInfoDb_0.99.19 IRanges_1.21.34 BiocGenerics_0.9.3
>>> [7] BatchJobs_1.2 BBmisc_1.5 weaver_1.29.1
>>> [10] codetools_0.2-8 digest_0.6.4 BiocInstaller_1.13.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] DBI_0.2-7 RSQLite_0.11.4 Rcpp_0.11.1 brew_1.0-6
>>> [5] fail_1.2 plyr_1.8.1 sendmailR_1.1-2 stats4_3.2.0
>>> [9] stringr_0.6.2
>>>
>>>
>>>
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list