[BioC] Subsetting vcf file by subject

Valerie Obenchain vobencha at fhcrc.org
Mon Apr 15 19:53:16 CEST 2013


Hi Margaret,

Subsetting by sample is now implemented in VariantAnnotation 1.7.4.
You'll also need Rsamtools 1.13.3. Pass the sample name(s) to the
'samples' argument of ScanVcfParam():

library(VariantAnnotation)
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
## 'samples' restricts the import to the named sample(s)
param <- ScanVcfParam(samples="NA00002")
vcf <- readVcf(fl, "hg19", param=param)
 > geno(vcf)$GT
            NA00002
rs6054257  NA
20:17330   "1|0"
rs6040355  NA
20:1230237 NA
microsat1  "0|1"
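
To choose which samples to request, it can help to list the sample
names in the VCF header first. The following is a minimal sketch using
the same example file; it assumes scanVcfHeader() and samples() are
available in your installed version, and the choice of the first and
last samples is purely illustrative.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")

## List the sample names from the header without importing genotypes
hdr <- scanVcfHeader(fl)
all_samples <- samples(hdr)

## Illustrative assumption: keep the first and last samples in the file
keep <- all_samples[c(1, length(all_samples))]

## Import only the selected samples
param <- ScanVcfParam(samples=keep)
vcf <- readVcf(fl, "hg19", param=param)

## The genotype matrix should have one column per requested sample
colnames(geno(vcf)$GT)
```

Because 'samples' takes a character vector, the same call handles one
sample or many.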


Valerie


On 03/25/2013 08:42 AM, Valerie Obenchain wrote:
> Hi Margaret,
>
> Currently VariantAnnotation doesn't support subsetting a VCF file by
> subject. We are planning to implement this in the next devel cycle.
>
> Valerie
>
> On 03/25/2013 07:26 AM, Taub, Margaret wrote:
>> Hi all,
>>
>> I am interested in reading in only a subset of the subjects contained
>> in a large multi-sample vcf file. As far as I can see, there is a lot
>> of great functionality in VariantAnnotation for subsetting vcfs based
>> on genomic coordinates, annotation, etc. but I can't see anything for
>> subsetting samples, either in the current release or the devel
>> version. Any help would be greatly appreciated!
>>
>> Cheers,
>> Margaret
>>
>>
>>
>> Margaret Taub, PhD
>> Assistant Scientist
>> Department of Biostatistics
>> Johns Hopkins University
>> Bloomberg School of Public Health, E3546
>> 410-614-9408
>> mtaub at jhsph.edu
>>
>> > sessionInfo()
>> R version 2.15.2 Patched (2013-02-08 r61876)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8           LC_NUMERIC=C
>>   [3] LC_TIME=en_US.utf-8            LC_COLLATE=en_US.utf-8
>>   [5] LC_MONETARY=en_US.utf-8        LC_MESSAGES=en_US.utf-8
>>   [7] LC_PAPER=C                     LC_NAME=C
>>   [9] LC_ADDRESS=C                   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices datasets  utils     methods   base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.4.12 Rsamtools_1.10.2         Biostrings_2.26.3
>> [4] GenomicRanges_1.10.6     IRanges_1.16.4           BiocGenerics_0.4.0
>> [7] RColorBrewer_1.0-5
>>
>> loaded via a namespace (and not attached):
>>   [1] AnnotationDbi_1.20.3   Biobase_2.18.0         biomaRt_2.14.0
>>   [4] bitops_1.0-5           BSgenome_1.26.1        DBI_0.2-5
>>   [7] GenomicFeatures_1.10.1 parallel_2.15.2        RCurl_1.95-3
>> [10] RSQLite_0.11.2         rtracklayer_1.18.2     stats4_2.15.2
>> [13] tools_2.15.2           XML_3.95-0.1           zlibbioc_1.4.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>


