[BioC] VariantAnnotation and distance to genes

James W. MacDonald jmacdon at uw.edu
Mon May 13 17:48:53 CEST 2013


Hi Gad,

On 5/12/2013 11:41 PM, Gad Abraham wrote:
> Hi,
>
> I'm using VariantAnnotation, which is a very nice package, to annotate some
> SNPs, based on
> http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps-and-indels-in-bioconductor
> .
>
> For SNPs annotated as intergenic and therefore falling between two genes
> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance
> between the SNPs and these two genes?
>
> I'm using:
>
> head(m, 3)
>           RS  Chr      Pos
> 1 rs2187668 chr6 32713862
> 2 rs9357152 chr6 32772938
> 3 rs3099844 chr6 31556955
>
> target<- with(m,
>        GRanges(seqnames=Rle(Chr),
>           ranges=IRanges(Pos, end=Pos, names=RS),
>           strand=Rle(strand("*"))
>        )
> )
>
> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene,
> AllVariants())
> names(loc)<- NULL
> out<- as.data.frame(loc)
>
> head(out, 3)
>        names seqnames    start      end   LOCATION GENEID PRECEDEID FOLLOWID
> GeneSymbol PrecedeSymbol FollowSymbol
> 1 rs2187668     chr6 32713862 32713862     intron   3117<NA>
> <NA>    HLA-DQA1<NA>          <NA>
> 4 rs9357152     chr6 32772938 32772938 intergenic<NA>       3119
> 3117<NA>       HLA-DQB1     HLA-DQA1
> 5 rs3099844     chr6 31556955 31556955 intergenic<NA>       4277
> 352961<NA>           MICB        HCG26
>
> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA-DQA1.

There might be a far superior way to do this, but one way is to assume 
that cds == genes and go from there:

 > gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene, 
columns=c("cds_id","gene_id"))
 > ind <- sapply(values(gns)$gene_id, function(x) any(x %in% 
c("3117","3119")))
 > sum(ind)
[1] 28
 > gns[ind,]
GRanges with 28 ranges and 2 metadata columns:
             seqnames               ranges strand   |    cds_id         
gene_id
<Rle> <IRanges> <Rle>   | <integer> <CharacterList>
    [1]          chr6 [32605236, 32605317]      +   |     
73476            3117
    [2]          chr6 [32609087, 32609335]      +   |     
73477            3117
    [3]          chr6 [32609749, 32610030]      +   |     
73478            3117
    [4]          chr6 [32609749, 32610164]      +   |     
73479            3117
    [5]          chr6 [32610387, 32610541]      +   |     
73480            3117
    ...           ...                  ...    ... ...       
...             ...
   [24] chr6_qbl_hap6   [3860872, 3860982]      -   |    
233860            3119
   [25] chr6_qbl_hap6   [3861407, 3861780]      -   |    
233861            3119
   [26] chr6_qbl_hap6   [3861499, 3861780]      -   |    
233862            3119
   [27] chr6_qbl_hap6   [3864670, 3864939]      -   |    
233863            3119
   [28] chr6_qbl_hap6   [3866378, 3866486]      -   |    
233864            3119

Note that these two genes also map to the extra Chr6 haplotypes.

 > z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6", 
IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]
Warning message:
In .local(x, y, ...) :
   The behavior of distance() has changed in Bioconductor 2.12. See 
?distance for details.
 > colnames(z) <- c(colnames(z)[1:5], "Distance")
 > z
         seqnames    start width cds_id gene_id Distance
1           chr6 32605236    82  73476    3117   167620
2           chr6 32609087   249  73477    3117   163602
3           chr6 32609749   282  73478    3117   162907
4           chr6 32609749   416  73479    3117   162773
5           chr6 32610387   155  73480    3117   162396
6           chr6 32628013    14  78928    3119   144911
7           chr6 32628636    22  78929    3119   144280
8           chr6 32629124   111  78930    3119   143703
9           chr6 32629125   110  78931    3119   143703
10          chr6 32629744   282  78932    3119   142912
11          chr6 32629951    44  78933    3119   142943
12          chr6 32632575   244  78934    3119   140119
13          chr6 32632575   270  78935    3119   140093
14          chr6 32634276   109  78936    3119   138553
15 chr6_cox_hap2  4073284    14 227891    3119       NA
16 chr6_cox_hap2  4074344   185 227892    3119       NA
17 chr6_cox_hap2  4074418   111 227893    3119       NA
18 chr6_cox_hap2  4074953   374 227894    3119       NA
19 chr6_cox_hap2  4075045   282 227895    3119       NA
20 chr6_cox_hap2  4078216   270 227896    3119       NA
21 chr6_cox_hap2  4079924   109 227897    3119       NA
22 chr6_qbl_hap6  3859738    14 233858    3119       NA
23 chr6_qbl_hap6  3860798   185 233859    3119       NA
24 chr6_qbl_hap6  3860872   111 233860    3119       NA
25 chr6_qbl_hap6  3861407   374 233861    3119       NA
26 chr6_qbl_hap6  3861499   282 233862    3119       NA
27 chr6_qbl_hap6  3864670   270 233863    3119       NA
28 chr6_qbl_hap6  3866378   109 233864    3119       NA

Best,

Jim




>
> Thanks,
> Gad
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list