[BioC] VariantAnnotation and distance to genes
James W. MacDonald
jmacdon at uw.edu
Mon May 13 18:27:06 CEST 2013
On 5/13/2013 12:23 PM, Steve Lianoglou wrote:
> Hi,
>
> On Mon, May 13, 2013 at 8:48 AM, James W. MacDonald<jmacdon at uw.edu> wrote:
>> Hi Gad,
>>
>>
>> On 5/12/2013 11:41 PM, Gad Abraham wrote:
>>> Hi,
>>>
>>> I'm using VariantAnnotation, which is a very nice package, to annotate
>>> some
>>> SNPs, based on
>>>
>>> http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps-and-indels-in-bioconductor
>>> .
>>>
>>> For SNPs annotated as intergenic and therefore falling between two genes
>>> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance
>>> between the SNPs and these two genes?
>>>
>>> I'm using:
>>>
>>> head(m, 3)
>>> RS Chr Pos
>>> 1 rs2187668 chr6 32713862
>>> 2 rs9357152 chr6 32772938
>>> 3 rs3099844 chr6 31556955
>>>
>>> target<- with(m,
>>> GRanges(seqnames=Rle(Chr),
>>> ranges=IRanges(Pos, end=Pos, names=RS),
>>> strand=Rle(strand("*"))
>>> )
>>> )
>>>
>>> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene,
>>> AllVariants())
>>> names(loc)<- NULL
>>> out<- as.data.frame(loc)
>>>
>>> head(out, 3)
>>> names seqnames start end LOCATION GENEID PRECEDEID
>>> FOLLOWID
>>> GeneSymbol PrecedeSymbol FollowSymbol
>>> 1 rs2187668 chr6 32713862 32713862 intron 3117<NA>
>>> <NA> HLA-DQA1<NA> <NA>
>>> 4 rs9357152 chr6 32772938 32772938 intergenic<NA> 3119
>>> 3117<NA> HLA-DQB1 HLA-DQA1
>>> 5 rs3099844 chr6 31556955 31556955 intergenic<NA> 4277
>>> 352961<NA> MICB HCG26
>>>
>>> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA-DQA1.
>>
>> There might be a far superior way to do this, but one way is to assume that
>> cds == genes and go from there:
>>
>>> gns<- cds(TxDb.Hsapiens.UCSC.hg19.knownGene,
>>> columns=c("cds_id","gene_id"))
>>> ind<- sapply(values(gns)$gene_id, function(x) any(x %in%
>>> c("3117","3119")))
>>> sum(ind)
>> [1] 28
>>> gns[ind,]
>> GRanges with 28 ranges and 2 metadata columns:
>> seqnames ranges strand | cds_id
>> gene_id
>> <Rle> <IRanges> <Rle> |<integer> <CharacterList>
>> [1] chr6 [32605236, 32605317] + | 73476
>> 3117
>> [2] chr6 [32609087, 32609335] + | 73477
>> 3117
>> [3] chr6 [32609749, 32610030] + | 73478
>> 3117
>> [4] chr6 [32609749, 32610164] + | 73479
>> 3117
>> [5] chr6 [32610387, 32610541] + | 73480
>> 3117
>> ... ... ... ... ... ...
>> ...
>> [24] chr6_qbl_hap6 [3860872, 3860982] - | 233860
>> 3119
>> [25] chr6_qbl_hap6 [3861407, 3861780] - | 233861
>> 3119
>> [26] chr6_qbl_hap6 [3861499, 3861780] - | 233862
>> 3119
>> [27] chr6_qbl_hap6 [3864670, 3864939] - | 233863
>> 3119
>> [28] chr6_qbl_hap6 [3866378, 3866486] - | 233864
>> 3119
>>
>> Note that these two genes also map to the extra Chr6 haplotypes.
>>
>>> z<- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6",
>>> IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]
>> Warning message:
>> In .local(x, y, ...) :
>> The behavior of distance() has changed in Bioconductor 2.12. See ?distance
>> for details.
>>> colnames(z)<- c(colnames(z)[1:5], "Distance")
> Saving one line of code makes it twice as nice for half the price:
>
> R> z<- cbind(as.data.frame(gns[ind,]),
> Distance=distance(GRanges("chr6", IRanges(start=32772938,
> end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]
>
> R> head(z)
> seqnames start width cds_id gene_id Distance
> 1| chr6 32605236 82 73476 3117 167620
> 2| chr6 32609087 249 73477 3117 163602
> 3| chr6 32609749 282 73478 3117 162907
> 4| chr6 32609749 416 73479 3117 162773
> 5| chr6 32610387 155 73480 3117 162396
> 6| chr6 32628013 14 78928 3119 144911
>
> This time, it's free ... this time.
Ugh. And you know these corporate types really mean it when they say
it's gonna cost you next time. I am preparing one arm and one leg for
removal and shipment, to speed payment when required. ;-D
Jim
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Department of Bioinformatics and Computational Biology
> Genentech
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list