[BioC] seqlevels in VCF objects
Valerie Obenchain
vobencha at fhcrc.org
Tue Jan 15 03:25:06 CET 2013
On 01/14/13 15:40, Murat Tasan wrote:
> hi all - i've encountered an annoying problem, and i'd like to avoid
> read/writing the many GBs required for the blunt-force solution...
>
> the 1000 Genomes project provides a collection of VCF files providing
> the genotypes for all found variants.
> after reading in the VCF files (via vcf<- readVcf(...)), i have an
> VCF object, but the info(vcf) object reveals the chromosome names
> (i.e. 'seqlevels') are "1", "2", ..., "X", "Y".
> Bioconductor's TxDb.Hsapiens.UCSC.hg19.knownGene object, however, uses
> the UCSC standard prefix for chromosome names: "chr1", "chr2", etc.
>
> in trying to subset(...) or predictCoding(...) the VCF data against
> the genome objects (including BSgenome.Hsapiens.UCSC.hg19) this causes
> an obvious failure.
>
> i tried re-setting the seqlevels of the VCF 'info' object like so
> (thinking the seqnames factor just indexes back on the seqlevels as a
> key):
>
> seqlevels(vcf at info)<- sprintf("chr%s", seqlevels(vcf at info))
>
> but this doesn't seem to have any effect.
>
> any idea on how to make this bulk change of seqnames for data in VCF objects?
There are examples of how to do this in the vignette and on the ?VCF
and ?predictCoding man pages.
In ?VCF there is a section in the examples dedicated to "Renaming
seqlevels and subsetting".
Renaming all seqlevels in a VCF can be done with renameSeqlevels() or
seqlevels() and subsetting a VCF on
seqlevels can be done with keepSeqlevels().
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> seqlevels(vcf)
[1] "20"
> seqlevels(vcf) <- "chr20"
> seqlevels(vcf)
[1] "chr20"
Valerie
>
> cheers,
>
> -m
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list