[BioC] VariantAnnotation and distance to genes
James W. MacDonald
jmacdon at uw.edu
Mon May 13 17:48:53 CEST 2013
Hi Gad,
On 5/12/2013 11:41 PM, Gad Abraham wrote:
> Hi,
>
> I'm using VariantAnnotation, which is a very nice package, to annotate some
> SNPs, based on
> http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps-and-indels-in-bioconductor
> .
>
> For SNPs annotated as intergenic and therefore falling between two genes
> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance
> between the SNPs and these two genes?
>
> I'm using:
>
> head(m, 3)
> RS Chr Pos
> 1 rs2187668 chr6 32713862
> 2 rs9357152 chr6 32772938
> 3 rs3099844 chr6 31556955
>
> target<- with(m,
> GRanges(seqnames=Rle(Chr),
> ranges=IRanges(Pos, end=Pos, names=RS),
> strand=Rle(strand("*"))
> )
> )
>
> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene,
> AllVariants())
> names(loc)<- NULL
> out<- as.data.frame(loc)
>
> head(out, 3)
> names seqnames start end LOCATION GENEID PRECEDEID FOLLOWID
> GeneSymbol PrecedeSymbol FollowSymbol
> 1 rs2187668 chr6 32713862 32713862 intron 3117<NA>
> <NA> HLA-DQA1<NA> <NA>
> 4 rs9357152 chr6 32772938 32772938 intergenic<NA> 3119
> 3117<NA> HLA-DQB1 HLA-DQA1
> 5 rs3099844 chr6 31556955 31556955 intergenic<NA> 4277
> 352961<NA> MICB HCG26
>
> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA-DQA1.
There might be a far superior way to do this, but one way is to assume
that cds == genes and go from there:
> gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene,
columns=c("cds_id","gene_id"))
> ind <- sapply(values(gns)$gene_id, function(x) any(x %in%
c("3117","3119")))
> sum(ind)
[1] 28
> gns[ind,]
GRanges with 28 ranges and 2 metadata columns:
seqnames ranges strand | cds_id
gene_id
<Rle> <IRanges> <Rle> | <integer> <CharacterList>
[1] chr6 [32605236, 32605317] + |
73476 3117
[2] chr6 [32609087, 32609335] + |
73477 3117
[3] chr6 [32609749, 32610030] + |
73478 3117
[4] chr6 [32609749, 32610164] + |
73479 3117
[5] chr6 [32610387, 32610541] + |
73480 3117
... ... ... ... ...
... ...
[24] chr6_qbl_hap6 [3860872, 3860982] - |
233860 3119
[25] chr6_qbl_hap6 [3861407, 3861780] - |
233861 3119
[26] chr6_qbl_hap6 [3861499, 3861780] - |
233862 3119
[27] chr6_qbl_hap6 [3864670, 3864939] - |
233863 3119
[28] chr6_qbl_hap6 [3866378, 3866486] - |
233864 3119
Note that these two genes also map to the extra Chr6 haplotypes.
> z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6",
IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]
Warning message:
In .local(x, y, ...) :
The behavior of distance() has changed in Bioconductor 2.12. See
?distance for details.
> colnames(z) <- c(colnames(z)[1:5], "Distance")
> z
seqnames start width cds_id gene_id Distance
1 chr6 32605236 82 73476 3117 167620
2 chr6 32609087 249 73477 3117 163602
3 chr6 32609749 282 73478 3117 162907
4 chr6 32609749 416 73479 3117 162773
5 chr6 32610387 155 73480 3117 162396
6 chr6 32628013 14 78928 3119 144911
7 chr6 32628636 22 78929 3119 144280
8 chr6 32629124 111 78930 3119 143703
9 chr6 32629125 110 78931 3119 143703
10 chr6 32629744 282 78932 3119 142912
11 chr6 32629951 44 78933 3119 142943
12 chr6 32632575 244 78934 3119 140119
13 chr6 32632575 270 78935 3119 140093
14 chr6 32634276 109 78936 3119 138553
15 chr6_cox_hap2 4073284 14 227891 3119 NA
16 chr6_cox_hap2 4074344 185 227892 3119 NA
17 chr6_cox_hap2 4074418 111 227893 3119 NA
18 chr6_cox_hap2 4074953 374 227894 3119 NA
19 chr6_cox_hap2 4075045 282 227895 3119 NA
20 chr6_cox_hap2 4078216 270 227896 3119 NA
21 chr6_cox_hap2 4079924 109 227897 3119 NA
22 chr6_qbl_hap6 3859738 14 233858 3119 NA
23 chr6_qbl_hap6 3860798 185 233859 3119 NA
24 chr6_qbl_hap6 3860872 111 233860 3119 NA
25 chr6_qbl_hap6 3861407 374 233861 3119 NA
26 chr6_qbl_hap6 3861499 282 233862 3119 NA
27 chr6_qbl_hap6 3864670 270 233863 3119 NA
28 chr6_qbl_hap6 3866378 109 233864 3119 NA
Best,
Jim
>
> Thanks,
> Gad
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list