[BioC] VCF class: different length when unlisting INFO CompressedCharacterList
Francesco Lescai
francesco.lescai at hum-gen.au.dk
Tue May 14 10:09:50 CEST 2013
Hi all and Hi Valerie (I suppose),
I was extracting a field of the INFO column from a VCF, but when I unlist it I get a length that differs from the number of variants, so I can no longer tell which value refers to which variant.
Here's what I'm doing:
> vcf
class: VCF
dim: 50273 30
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(28): AC AF ... culprit set
geno(5): AD DP GQ GT PL
rownames(50273):
[.. cut for clarity ..]
> genotypes <- as.data.frame(geno(vcf)$GT)
> dim(genotypes)
[1] 50273 30
> list.va <- info(vcf)$VA
> length(info(vcf)$VA)
[1] 50273
> list.va
CompressedCharacterList of length 50273
> info.va <- unlist(info(vcf)$VA)
> length(info.va)
[1] 53391
VA is an annotation added by Variant Annotation Tool, which modifies the VCF INFO column.
But if I do the same for other, more "standard" fields, some of them have the same length as the number of variants when unlisted, while others don't:
> length(unlist(info(vcf)$HaplotypeScore))
[1] 50273
> length(unlist(info(vcf)$AC))
[1] 50489
> length(unlist(info(vcf)$AF))
[1] 50489
Am I doing something wrong? Or is the unlist method on the CompressedCharacterList splitting on some field delimiter?
My sessionInfo is below.
Thanks for any help you might provide,
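One possible explanation, which is an assumption and not something confirmed in this thread, is that some list elements hold more than one value (for example, one value per alternate allele at a multi-allelic site), so unlist() returns more values than there are variants. The sketch below uses a toy base-R list as a stand-in for a field such as info(vcf)$AC; the names ac, flat, variant_idx and collapsed are hypothetical. It shows how the per-element lengths can be used to map each unlisted value back to its variant.

```r
# Toy stand-in for info(vcf)$AC (hypothetical data): one list element per
# variant; the second variant carries two values.
ac <- list(2L, c(1L, 3L), 5L)

# Flattening yields one entry per value, not one per variant.
flat <- unlist(ac)

# Number of values held by each variant.
n_per_variant <- lengths(ac)

# Index of the originating variant for every flattened value.
variant_idx <- rep(seq_along(ac), n_per_variant)

# Long-format table linking each value back to its variant.
long <- data.frame(variant = variant_idx, AC = flat)
print(long)

# Alternative: keep one string per variant by collapsing multiple values.
collapsed <- vapply(ac, function(v) paste(v, collapse = ","), character(1))
print(collapsed)
```

On a real CharacterList the same idea should apply, with the per-element counts taken from the list object itself; base R's lengths() is used here only so that the sketch runs without Bioconductor installed.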
cheers,
Francesco
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape_0.8.4 plyr_1.8 ggbio_1.6.6 ggplot2_0.9.3.1 VariantAnnotation_1.4.12 Rsamtools_1.10.2
[7] Biostrings_2.26.3 GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.20.7 Biobase_2.18.0 biomaRt_2.14.0 biovizBase_1.6.2 bitops_1.0-4.2 BSgenome_1.26.1 cluster_1.14.4
[8] colorspace_1.2-1 DBI_0.2-5 dichromat_2.0-0 digest_0.6.3 GenomicFeatures_1.10.2 grid_2.15.1 gridExtra_0.9.1
[15] gtable_0.1.2 Hmisc_3.10-1 labeling_0.1 lattice_0.20-15 MASS_7.3-23 munsell_0.4 parallel_2.15.1
[22] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 RSQLite_0.11.2 rtracklayer_1.18.2 scales_0.2.3
[29] stats4_2.15.1 stringr_0.6.2 tools_2.15.1 XML_3.96-1.1 zlibbioc_1.4.0