[BioC] VariantAnnotation and distance to genes
Steve Lianoglou
lianoglou.steve at gene.com
Mon May 13 18:23:14 CEST 2013
Hi,
On Mon, May 13, 2013 at 8:48 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Gad,
>
>
> On 5/12/2013 11:41 PM, Gad Abraham wrote:
>>
>> Hi,
>>
>> I'm using VariantAnnotation, which is a very nice package, to annotate
>> some
>> SNPs, based on
>>
>> http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps-and-indels-in-bioconductor
>> .
>>
>> For SNPs annotated as intergenic and therefore falling between two genes
>> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance
>> between the SNPs and these two genes?
>>
>> I'm using:
>>
>> head(m, 3)
>> RS Chr Pos
>> 1 rs2187668 chr6 32713862
>> 2 rs9357152 chr6 32772938
>> 3 rs3099844 chr6 31556955
>>
>> target<- with(m,
>> GRanges(seqnames=Rle(Chr),
>> ranges=IRanges(Pos, end=Pos, names=RS),
>> strand=Rle(strand("*"))
>> )
>> )
>>
>> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene,
>> AllVariants())
>> names(loc)<- NULL
>> out<- as.data.frame(loc)
>>
>> head(out, 3)
>> names seqnames start end LOCATION GENEID PRECEDEID
>> FOLLOWID
>> GeneSymbol PrecedeSymbol FollowSymbol
>> 1 rs2187668 chr6 32713862 32713862 intron 3117<NA>
>> <NA> HLA-DQA1<NA> <NA>
>> 4 rs9357152 chr6 32772938 32772938 intergenic<NA> 3119
>> 3117<NA> HLA-DQB1 HLA-DQA1
>> 5 rs3099844 chr6 31556955 31556955 intergenic<NA> 4277
>> 352961<NA> MICB HCG26
>>
>> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA-DQA1.
>
>
> There might be a far superior way to do this, but one way is to assume that
> cds == genes and go from there:
>
>> gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene,
>> columns=c("cds_id","gene_id"))
>> ind <- sapply(values(gns)$gene_id, function(x) any(x %in%
>> c("3117","3119")))
>> sum(ind)
> [1] 28
>> gns[ind,]
> GRanges with 28 ranges and 2 metadata columns:
> seqnames ranges strand | cds_id
> gene_id
> <Rle> <IRanges> <Rle> | <integer> <CharacterList>
> [1] chr6 [32605236, 32605317] + | 73476
> 3117
> [2] chr6 [32609087, 32609335] + | 73477
> 3117
> [3] chr6 [32609749, 32610030] + | 73478
> 3117
> [4] chr6 [32609749, 32610164] + | 73479
> 3117
> [5] chr6 [32610387, 32610541] + | 73480
> 3117
> ... ... ... ... ... ...
> ...
> [24] chr6_qbl_hap6 [3860872, 3860982] - | 233860
> 3119
> [25] chr6_qbl_hap6 [3861407, 3861780] - | 233861
> 3119
> [26] chr6_qbl_hap6 [3861499, 3861780] - | 233862
> 3119
> [27] chr6_qbl_hap6 [3864670, 3864939] - | 233863
> 3119
> [28] chr6_qbl_hap6 [3866378, 3866486] - | 233864
> 3119
>
> Note that these two genes also map to the extra Chr6 haplotypes.
>
>> z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6",
>> IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]
> Warning message:
> In .local(x, y, ...) :
> The behavior of distance() has changed in Bioconductor 2.12. See ?distance
> for details.
>> colnames(z) <- c(colnames(z)[1:5], "Distance")
Saving one line of code makes it twice as nice for half the price:
R> z <- cbind(as.data.frame(gns[ind,]),
Distance=distance(GRanges("chr6", IRanges(start=32772938,
end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]
R> head(z)
seqnames start width cds_id gene_id Distance
1| chr6 32605236 82 73476 3117 167620
2| chr6 32609087 249 73477 3117 163602
3| chr6 32609749 282 73478 3117 162907
4| chr6 32609749 416 73479 3117 162773
5| chr6 32610387 155 73480 3117 162396
6| chr6 32628013 14 78928 3119 144911
This time, it's free ... this time.
-steve
--
Steve Lianoglou
Computational Biologist
Department of Bioinformatics and Computational Biology
Genentech
More information about the Bioconductor
mailing list