[BioC] Bug in subsetting vcf file by subject on more than 5 samples
Robert Castelo
robert.castelo at upf.edu
Tue Dec 17 12:28:51 CET 2013
hi,
when using the recently added feature of subsetting a vcf file by
subject, described initially here:
https://stat.ethz.ch/pipermail/bioconductor/2013-April/052146.html
i found that it breaks when subsetting on more than 5 samples.
here is the code reproducing the problem:
suppressPackageStartupMessages(library(VariantAnnotation))
fl <- system.file("extdata", "gl_chr1.vcf", package="VariantAnnotation")
hdr <- scanVcfHeader(fl)
length(samples(hdr))
[1] 85
so the above example file has 85 samples. let's try to subset on the
first 6:
param <- ScanVcfParam(samples=samples(hdr)[1:6])
vcf <- readVcf(fl, "hg19", param)
Warning message:
In .vcf_map_samples(samples(hdr), samples) : samples not in file: ‘...’
besides this warning, the resulting 'CollapsedVCF' object has only 4
samples instead of 6:
dim(vcf)
[1] 9 4
a hint about what might be happening comes from the sample names in the
parameter object and the '...' referred to in the warning:
vcfSamples(param)
[1] "NA06984" "NA06986" "..." "NA07000" "NA07037"
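to make the hint concrete, here is a minimal base-R sketch of what i suspect, not the actual VariantAnnotation code: if the requested sample names go through some display-style abbreviation before being stored, 6 names collapse to 5 entries with a literal "..." in the middle. the helper abbreviate_for_display() and the placeholder names S1..S85 are made up for illustration only.

```r
# hypothetical helper, NOT VariantAnnotation code: abbreviates a vector
# for display by keeping both ends and putting "..." in the middle
abbreviate_for_display <- function(x, max_to_show = 5) {
  if (length(x) <= max_to_show)
    return(x)
  keep <- max_to_show - 1        # slots left once "..." takes one
  n_head <- ceiling(keep / 2)
  n_tail <- keep - n_head
  c(head(x, n_head), "...", tail(x, n_tail))
}

# placeholder names standing in for samples(hdr)
available <- paste0("S", 1:85)
requested <- available[1:6]

stored <- abbreviate_for_display(requested)
stored                           # 5 entries, the third one is "..."
setdiff(stored, available)       # "..." is the only name not in the file
sum(stored %in% available)       # 4 real sample names survive
```

under this assumption, 5 or fewer names pass through unchanged, which would fit both the threshold and the 4 columns reported by dim(vcf).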
the problem occurs in both the release and devel versions; here's my
sessionInfo() for the release version:
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C LC_TIME=en_US.UTF8
[4] LC_COLLATE=en_US.UTF8 LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
[7] LC_PAPER=en_US.UTF8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] VariantAnnotation_1.8.8 Rsamtools_1.14.2 Biostrings_2.30.1
[4] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.6
[7] BiocGenerics_0.8.0 vimcom_0.9-92 setwidth_1.0-3
[10] colorout_1.0-1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 Biobase_2.22.0 biomaRt_2.18.0 bitops_1.0-6
[5] BSgenome_1.30.0 DBI_0.2-7 GenomicFeatures_1.14.2 RCurl_1.95-4.1
[9] RSQLite_0.11.4 rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2
[13] XML_3.98-1.1 zlibbioc_1.8.0
thanks!
robert.
--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550