[BioC] Subsetting "sites only" VCF objects
Valerie Obenchain
vobencha at fhcrc.org
Tue Aug 28 18:21:04 CEST 2012
Hi Richard,
Thanks for reporting this bug. The problem was that the VCF class could
not be subset by row when the object had no columns. This has been fixed
for both SummarizedExperiment and the VCF class.
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
> dim(vcf)
[1] 5 3
## remove 'geno' and 'colData' to give the object 0 columns
geno(vcf) <- SimpleList()
colData(vcf) <- DataFrame()
> dim(vcf)
[1] 5 0
> vcf[1:3, ]
class: VCF
dim: 3 0
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(6): NS DP ... DB H2
geno(0):
rownames(3): rs6054257 20:17330 rs6040355
rowData values names(1): paramRangeID
colnames: NULL
colData names(0):
Fixes have been made in the following versions,
Release :
GenomicRanges 1.8.13
VariantAnnotation 1.2.11
Devel :
GenomicRanges 1.9.56
VariantAnnotation 1.3.24
Valerie
On 08/24/2012 04:10 AM, Richard Pearson wrote:
> Hi
>
> It is wonderful that we can create subsets of VariantAnnotation VCF
> objects using [ but I have found that this doesn't work for VCFs that
> are "sites only", i.e. have no information in geno(vcf):
> > geno(vcf)
> SimpleList of length 0
> > passVcf <- vcf[values(fixed(vcf))[["FILTER"]] == "PASS", ]
> Error in colData(x)[j, , drop = FALSE] :
> selecting rows: subscript out of bounds
>
> In these cases I can create subsets, e.g. using:
> passVcf <- VCF(
> rowData = rowData(vcf)[values(fixed(vcf))[["FILTER"]] == "PASS"],
> colData = colData(vcf),
> exptData = exptData(vcf),
> fixed = values(fixed(vcf))[values(fixed(vcf))[["FILTER"]] ==
> "PASS", -(1)],
> info = values(info(vcf))[values(fixed(vcf))[["FILTER"]] == "PASS",
> -(1)]
> )
>
> But it would be great if I could also do this using [. Any chance this
> functionality could be included in a future version of VariantAnnotation?
>
> Thanks
>
> Richard
>
> > sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] VEDA_0.0.1 ggplot2_0.9.1 VariantAnnotation_1.2.10
> Rsamtools_1.8.6 Biostrings_2.24.1 GenomicRanges_1.8.12
> IRanges_1.14.4 BiocGenerics_0.2.0 malariagen_0.0.1
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.18.1 Biobase_2.16.0 biomaRt_2.12.0
> bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1
> DBI_0.2-5 dichromat_1.2-4 digest_0.5.2
> GenomicFeatures_1.8.2
> [11] grid_2.15.0 labeling_0.1 lattice_0.20-6
> MASS_7.3-20 Matrix_1.0-6 memoise_0.1
> munsell_0.3 plyr_1.7.1 proto_0.3-9.2 RColorBrewer_1.0-5
> [21] RCurl_1.91-1 reshape2_1.2.1 RSQLite_0.11.1
> rtracklayer_1.16.3 scales_0.2.1 snpStats_1.6.0
> splines_2.15.0 stats4_2.15.0 stringr_0.6.1
> survival_2.36-14
> [31] tools_2.15.0 XML_3.9-4 zlibbioc_1.2.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list