[BioC] rtracklayer and gene symbols
James W. MacDonald
jmacdon at med.umich.edu
Thu Jul 16 17:36:54 CEST 2009
Hi Christian,
Christian Ruckert wrote:
> Is there an elegant way to find the chromosome, start and end position
> for a given gene symbol via rtracklayer?
I don't know about using rtracklayer, but there are any number of ways
to get these data. If you want them straight from UCSC, you can query
their MySQL server directly:
> library(RMySQL)
Loading required package: DBI
> con <- dbConnect("MySQL", user = "genome",
+     host = "genome-mysql.cse.ucsc.edu", dbname = "hg18")
> gns <- c("BRIP1","VEGFA","FANCB","TP53")
> sql <- paste("select name2, txStart, txEnd from refGene where name2 in ('",
+     paste(gns, collapse = "','"), "');", sep = "")
> dbGetQuery(con, sql)
   name2  txStart    txEnd
1  BRIP1 57114766 57295537
2  FANCB 14771449 14801105
3  FANCB 14771449 14801105
4   TP53  7512444  7531588
5   TP53  7512444  7531588
6   TP53  7512444  7519536
7   TP53  7512444  7519536
8   TP53  7512444  7519536
9   TP53  7512444  7531588
10  TP53  7512444  7531588
11 VEGFA 43845930 43862201
12 VEGFA 43845930 43862201
13 VEGFA 43845930 43862201
14 VEGFA 43845930 43862201
15 VEGFA 43845930 43862201
16 VEGFA 43845930 43862201
17 VEGFA 43845930 43862201
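The query above returns one row per RefSeq transcript, which is why each symbol appears several times, and it leaves out the chromosome that was asked for. As a rough sketch (not something tested against the live server), the query could also select the chrom and strand columns of refGene, and the repeated rows could be collapsed to one span per symbol. The helper names build_refgene_sql and collapse_transcripts are made up for illustration, and the sample rows are copied from the output above.

```r
# Hypothetical helper: build the refGene query for a vector of symbols.
# chrom and strand are columns of UCSC's refGene table; "distinct" drops
# exact repeats. Symbols are pasted in verbatim, so pass trusted input only.
build_refgene_sql <- function(symbols) {
  quoted <- paste0("'", symbols, "'", collapse = ", ")
  paste0("select distinct name2, chrom, strand, txStart, txEnd ",
         "from refGene where name2 in (", quoted, ");")
}

# Hypothetical helper: one row per symbol, spanning all of its transcripts.
# Assumes each symbol sits at a single locus; with real data you would
# probably want to group by chrom as well.
collapse_transcripts <- function(df) {
  starts <- tapply(df$txStart, df$name2, min)
  ends   <- tapply(df$txEnd,   df$name2, max)
  data.frame(name2   = names(starts),
             txStart = as.vector(starts),
             txEnd   = as.vector(ends),
             stringsAsFactors = FALSE)
}

# A few rows copied from the query result above
res <- data.frame(
  name2   = c("BRIP1", "TP53", "TP53", "VEGFA"),
  txStart = c(57114766, 7512444, 7512444, 43845930),
  txEnd   = c(57295537, 7531588, 7519536, 43862201),
  stringsAsFactors = FALSE
)
collapsed <- collapse_transcripts(res)
print(collapsed)
```

With a live connection, dbGetQuery(con, build_refgene_sql(gns)) would take the place of the hand-built res data frame.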
Or you could use the org.Hs.eg.db package supplied by BioC:
> library(org.Hs.eg.db)
> egs <- unlist(mget(gns, revmap(org.Hs.egSYMBOL)))
> egs
  BRIP1   VEGFA   FANCB    TP53
"83990"  "7422"  "2187"  "7157"
> starts <- unlist(mget(egs, org.Hs.egCHRLOC))
> ends <- unlist(mget(egs, org.Hs.egCHRLOCEND))
## two end locations for TP53, so double up the symbol
> data.frame(gns=gns[c(1:4,4)], starts, ends)
    gns    starts      ends
1 BRIP1 -57114766 -57295537
2 VEGFA  43845930  43862201
3 FANCB -14771449 -14801105
4  TP53  -7512444  -7531588
5  TP53  -7512444  -7519536
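The negative values in this table are not errors: the CHRLOC maps mark genes on the antisense strand with a leading minus sign. A minimal sketch of turning the signed values into a strand column plus plain coordinates follows. The function name signed_to_strand is hypothetical, and the input values are copied from the table above.

```r
# Hypothetical helper: split signed CHRLOC-style positions into a strand
# column and unsigned start/end coordinates.
signed_to_strand <- function(symbol, start, end) {
  data.frame(symbol = symbol,
             strand = ifelse(start < 0, "-", "+"),
             start  = abs(start),
             end    = abs(end),
             stringsAsFactors = FALSE)
}

# Values copied from the data.frame printed above
loc <- signed_to_strand(
  symbol = c("BRIP1", "VEGFA", "FANCB", "TP53", "TP53"),
  start  = c(-57114766, 43845930, -14771449, -7512444, -7512444),
  end    = c(-57295537, 43862201, -14801105, -7531588, -7519536)
)
print(loc)
```

After the conversion the coordinates can be compared directly with the txStart and txEnd values from the UCSC query.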
Or you could use biomaRt:
> library(biomaRt)
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
Checking attributes ... ok
Checking filters ... ok
> getBM(c("hgnc_symbol","start_position","end_position"),
+     "hgnc_symbol", gns, mart)
  hgnc_symbol start_position end_position
1       FANCB       14861529     14891184
2        TP53        7565257      7590863
3       VEGFA       43737948     43754224
4       BRIP1       59759985     59940755
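Note that getBM() does not return rows in the order of the symbols supplied, as the output above shows. A small sketch of putting the result back into the order of gns with match() follows. The helper name reorder_by_symbol is made up, and the data frame is typed in from the output above rather than fetched. Adding "chromosome_name" to the attribute vector should also return the chromosome, though that call is not run here.

```r
# Hypothetical helper: reorder a getBM-style result to follow the order of
# the query symbols. Symbols with no hit come back as rows of NA.
reorder_by_symbol <- function(res, symbols, key = "hgnc_symbol") {
  res[match(symbols, res[[key]]), , drop = FALSE]
}

gns <- c("BRIP1", "VEGFA", "FANCB", "TP53")

# Result copied from the getBM() output above
bm <- data.frame(
  hgnc_symbol    = c("FANCB", "TP53", "VEGFA", "BRIP1"),
  start_position = c(14861529, 7565257, 43737948, 59759985),
  end_position   = c(14891184, 7590863, 43754224, 59940755),
  stringsAsFactors = FALSE
)

ordered <- reorder_by_symbol(bm, gns)
print(ordered)
```

match() keeps only the first hit per symbol, so this assumes one row per symbol, as in the output above.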
Best,
Jim
>
> In the table browser on the UCSC website I can get this information by
> pasting a list of identifiers, so the requested information must be
> somewhere in the tables.
>
> The solution I found is rather indirect: first get a table of all
> UCSC names together with gene symbols, find the UCSC names that
> correspond to my symbols, and then look those UCSC names up in a table
> of all UCSC names with locations.
>
> Thank you in advance,
> Christian
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826