[BioC] VCF class: different length when unlisting INFO CompressedCharacterList

Francesco Lescai francesco.lescai at hum-gen.au.dk
Tue May 14 10:09:50 CEST 2013


Hi all and Hi Valerie (I suppose),
I was extracting a field of the INFO column from a VCF, but when I unlist it I get a length different from the number of variants, so I can no longer tell which value refers to which variant.

Here's what I'm doing:

> vcf
class: VCF 
dim: 50273 30 
genome: hg19 
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(28): AC AF ... culprit set
geno(5): AD DP GQ GT PL
rownames(50273): 
[.. cut for clarity ..]

> genotypes <- as.data.frame(geno(vcf)$GT)
> dim(genotypes)
[1] 50273    30

> list.va <- info(vcf)$VA
> length(info(vcf)$VA)
[1] 50273

> list.va
CompressedCharacterList of length 50273

> info.va <- unlist(info(vcf)$VA)
> length(info.va)
[1] 53391

The VA field is an annotation added by Variant Annotation Tool, which modifies the VCF INFO column.
But if I do the same for other, more "standard" fields, some of them have the same length as the number of variants when unlisted, while others don't:

> length(unlist(info(vcf)$HaplotypeScore))
[1] 50273
> length(unlist(info(vcf)$AC))
[1] 50489
> length(unlist(info(vcf)$AF))
[1] 50489

Am I doing something wrong? Or is the unlist method on the CompressedCharacterList splitting on some field delimiter?
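
To show the kind of mismatch I mean, here is a minimal sketch in base R with a plain list instead of a CompressedCharacterList. The toy data and all variable names are made up, and I am only assuming (not sure) that some variants carry more than one value in these fields. If that is the case, the per-element lengths would let me map each unlisted value back to its variant:

```r
# Hypothetical toy data: three "variants", the second one holding two values.
va <- list(c("a"), c("b", "c"), c("d"))

flat <- unlist(va)
length(va)    # number of list elements (one per variant)
length(flat)  # number of values after flattening

# Count the values held by each variant, then repeat each variant's
# index once per value, so every flattened value keeps its origin.
n_per_variant <- sapply(va, length)
variant_index <- rep(seq_along(va), times = n_per_variant)

mapping <- data.frame(variant = variant_index, value = flat)
print(mapping)
```

I don't know whether the real list classes behave exactly like a base list here, so this is only an illustration of the idea.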

My sessionInfo is below.
Thanks for any help you might provide,
Cheers,
Francesco


> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape_0.8.4            plyr_1.8                 ggbio_1.6.6              ggplot2_0.9.3.1          VariantAnnotation_1.4.12 Rsamtools_1.10.2        
 [7] Biostrings_2.26.3        GenomicRanges_1.10.7     IRanges_1.16.6           BiocGenerics_0.4.0      

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.20.7   Biobase_2.18.0         biomaRt_2.14.0         biovizBase_1.6.2       bitops_1.0-4.2         BSgenome_1.26.1        cluster_1.14.4        
 [8] colorspace_1.2-1       DBI_0.2-5              dichromat_2.0-0        digest_0.6.3           GenomicFeatures_1.10.2 grid_2.15.1            gridExtra_0.9.1       
[15] gtable_0.1.2           Hmisc_3.10-1           labeling_0.1           lattice_0.20-15        MASS_7.3-23            munsell_0.4            parallel_2.15.1       
[22] proto_0.3-10           RColorBrewer_1.0-5     RCurl_1.95-4.1         reshape2_1.2.2         RSQLite_0.11.2         rtracklayer_1.18.2     scales_0.2.3          
[29] stats4_2.15.1          stringr_0.6.2          tools_2.15.1           XML_3.96-1.1           zlibbioc_1.4.0        


