[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation - revived
Valerie Obenchain
vobencha at fhcrc.org
Sun Jul 13 07:26:01 CEST 2014
Hi,
Following up on this thread:
https://stat.ethz.ch/pipermail/bioconductor/2013-December/056745.html
These changes are available in VariantAnnotation 1.11.15:
1) LOCSTART, LOCEND
locateVariants() has 2 new output columns, LOCSTART and LOCEND.
These are LOCATION-centric coordinates, i.e. positions expressed in the
coordinate system of the feature type reported in the LOCATION column, so
their reference frame can differ from row to row. I thought these names
were more descriptive than REFLOCS (discussed in the thread). We store 2
values (start/end) instead of a single IRanges() column because an IRanges()
cannot hold missing values. Technically, 'missing' ranges are represented by
zero-width ranges, but a zero-width range still needs a position, and there
is no position when a variant does not overlap a feature.
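To make the storage choice concrete, here is a minimal base-R sketch (no Bioconductor packages needed). The toy data frame and its values are invented for illustration and are not real locateVariants() output; it only shows that two plain integer columns can carry NA for a row with no overlap, such as an intergenic variant, while the other rows keep usable start/end values.

```r
# Toy stand-in for locateVariants() output; all values are hypothetical.
loc <- data.frame(
  LOCATION = c("coding", "intron", "intergenic"),
  LOCSTART = c(120L, 45L, NA_integer_),
  LOCEND   = c(120L, 47L, NA_integer_),
  stringsAsFactors = FALSE
)

# Integer columns accept NA, so the intergenic row simply has no position.
has_position <- !is.na(loc$LOCSTART)

# Widths can still be computed for the rows that overlapped a feature.
loc$WIDTH <- ifelse(has_position, loc$LOCEND - loc$LOCSTART + 1L, NA_integer_)
print(loc)
```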
2) mapCoords(), pmapCoords()
These functions are courtesy of Michael. mapCoords() maps ranges onto
another set of coordinates. You can map to cds-centric, exon-centric or
any other type of coordinate. See ?mapCoords in both GenomicRanges and
GenomicAlignments.
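The idea behind exon-centric mapping can be shown without Bioconductor. The base-R sketch below uses a hypothetical helper, map_to_exon_coords(), which is not how mapCoords() is implemented. It assumes a single transcript on the + strand with sorted, non-overlapping exons, and all coordinates are made up. A position inside exon k maps to the total width of exons 1..k-1 plus its offset within exon k; positions outside every exon get NA.

```r
# Hypothetical helper, NOT the mapCoords() implementation: map 1-based
# genomic positions onto exon-centric (spliced transcript) coordinates for
# a single + strand transcript. Exons must be sorted and non-overlapping.
map_to_exon_coords <- function(pos, exon_start, exon_end) {
  widths <- exon_end - exon_start + 1L
  # Number of transcript bases that precede each exon.
  offset <- cumsum(c(0L, head(widths, -1L)))
  vapply(pos, function(p) {
    hit <- which(p >= exon_start & p <= exon_end)
    if (length(hit) == 0L) return(NA_integer_)  # intronic/outside: no mapping
    as.integer(offset[hit] + (p - exon_start[hit]) + 1L)
  }, integer(1))
}

# Toy transcript with three exons (widths 50, 30 and 20; coordinates invented).
exon_start <- c(100L, 300L, 500L)
exon_end   <- c(149L, 329L, 519L)

map_to_exon_coords(c(100L, 149L, 300L, 510L, 200L), exon_start, exon_end)
# returns 1 50 51 91 NA
```

A minus-strand transcript would count from the other end of the spliced sequence; handling that is left out to keep the sketch short.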
In the previous thread we discussed adding cDNA locations to
predictCoding(). I've decided against this because it adds the
overhead of an exonsBy() extraction and a findOverlaps() call.
Not all users want the cDNA locations, and those who do can now
easily get them with mapCoords().
## The usual predictCoding setup:
library(VariantAnnotation)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
fl <- system.file("extdata", "chr22.vcf.gz",
                  package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
## the example VCF names the chromosome "22"; rename to match UCSC "chr22"
vcf <- renameSeqlevels(vcf, "chr22")
coding <- predictCoding(vcf, txdb, Hsapiens)
## Exon-centric or cDNA locations:
exonsbytx <- exonsBy(txdb, "tx")
cDNA <- mapCoords(coding[!duplicated(ranges(coding))], exonsbytx)
coding$cDNA <- ranges(cDNA)[togroup(coding$QUERYID)]
Let me know if you run into problems or if the docs need more detail.
Valerie