[BioC] Gene location (Base pair number)
Hervé Pagès
hpages at fhcrc.org
Tue Jul 7 04:09:50 CEST 2009
Hi Tim,
Tim Smith wrote:
> Hi Martin,
>
> Thanks for that. I tried your code and got:
>
> --------------------------------------------
>> egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]
>> org.Hs.egCHRLOC[[egid]]
>
> 7 7
> 120752656 120756325
>
>> org.Hs.egCHRLOCEND[[egid]]
>
> 7 7
> 120768394 120768394
> --------------------------------------------
>
> However, if I go to NCBI site (http://www.ncbi.nlm.nih.gov/sites/entrez) and search for 'WNT16', I get the following information for WNT16:
>
>
>
> Chromosome: 7;Location: 7q31
> Annotation: Chromosome 7, NC_000007.13 (120965421..120981158)
>
>
> Why is there a discrepancy between the values returned from bioconductor (UCSC?) and NCBI? Is there anything I can do that will get me a match with the NCBI location numbers?
>
This is because the two sources use different reference assemblies:
- NCBI is now using the Genome Reference Consortium Human Build 37 (GRCh37),
- UCSC is still using hg18 (at UCSC, GRCh37 is called the hg19 assembly).
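
As a quick sanity check on that explanation, here is a minimal sketch in base R that only does arithmetic on the numbers already posted in this thread. The variable names are mine, and I am assuming the smaller of the two CHRLOC start values is the one to compare against the NCBI range. If the two sources describe the same gene on different assemblies, the start and end should be shifted by nearly the same amount and the gene span should be nearly identical.

```r
# Coordinates copied from this thread (WNT16, chromosome 7).
# Bioconductor values from org.Hs.egCHRLOC / org.Hs.egCHRLOCEND
# (assumption: use the smaller of the two reported start positions).
bioc_start <- 120752656
bioc_end   <- 120768394

# NCBI annotation on NC_000007.13, as quoted above.
ncbi_start <- 120965421
ncbi_end   <- 120981158

# Shift between the two coordinate systems at each end of the gene.
shift_start <- ncbi_start - bioc_start
shift_end   <- ncbi_end - bioc_end

# Gene span (end minus start) implied by each source.
bioc_span <- bioc_end - bioc_start
ncbi_span <- ncbi_end - ncbi_start

cat("shift at start:", shift_start, "\n")
cat("shift at end:  ", shift_end, "\n")
cat("span (Bioconductor):", bioc_span, "\n")
cat("span (NCBI):        ", ncbi_span, "\n")
```

Both ends move by roughly 212.8 kb and the two spans differ by a single base, which is what one would expect from the same gene placed on two different builds rather than from two different annotations of the gene.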
Unfortunately it's hard to figure out which assembly is used for the
org.Hs.egCHRLOC or org.Hs.egCHRLOCEND maps. The man page says:
    Mappings were based on data provided by: UCSC Genome
    Bioinformatics (Homo sapiens) (
    ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens
    ) on 2008-Sep3
and if you connect (by anonymous FTP) to hgdownload.cse.ucsc.edu,
you'll be able to see that the Homo_sapiens folder is actually a
symlink to hg18:
hpages@thinkpad:~$ ftp hgdownload.cse.ucsc.edu
Connected to hgdownload.cse.ucsc.edu.
220 FTP Server ready.
Name (hgdownload.cse.ucsc.edu:hpages): anonymous
331 Anonymous login ok, send your complete email address as your password
Password:
230 User anonymous logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd goldenPath/currentGenomes
250 CWD command successful
ftp> ls
200 PORT command successful
150 Opening ASCII mode data connection for file list
dr-xr-xr-x 2 ftp ftp 4096 May 11 17:18 .
dr-xr-xr-x 128 ftp ftp 4096 Jun 17 00:03 ..
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Anolis_carolinensis -> ../anoCar1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Anopheles_gambiae -> ../anoGam1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Apis_mellifera -> ../apiMel2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Bos_taurus -> ../bosTau4
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Branchiostoma_floridae -> ../braFlo1
lr--r--r-- 1 ftp ftp 9 Sep 3 2008 Caenorhabditis_brenneri -> ../caePb2
lr--r--r-- 1 ftp ftp 12 Sep 3 2008 Caenorhabditis_briggsae -> ../cbJul2002
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Caenorhabditis_elegans -> ../ce2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Caenorhabditis_japonica -> ../caeJap1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Caenorhabditis_remanei -> ../caeRem3
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Callithrix_jacchus -> ../calJac1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Canis_familiaris -> ../canFam2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Cavia_porcellus -> ../cavPor3
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Ciona_intestinalis -> ../ci2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Danio_rerio -> ../danRer5
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_ananassae -> ../droAna2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_erecta -> ../droEre1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_grimshawi -> ../droGri1
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Drosophila_melanogaster -> ../dm3
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_mojavensis -> ../droMoj2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_persimilis -> ../droPer1
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Drosophila_pseudoobscura -> ../dp3
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_sechellia -> ../droSec1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_simulans -> ../droSim1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_virilis -> ../droVir2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_yakuba -> ../droYak2
lr--r--r-- 1 ftp ftp 10 Dec 4 2008 Equus_caballus -> ../equCab2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Felis_catus -> ../felCat3
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Fugu_rubripes -> ../fr2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Gallus_gallus -> ../galGal3
lr--r--r-- 1 ftp ftp 7 May 11 17:18 Homo_sapiens -> ../hg18
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Monodelphis_domestica -> ../monDom4
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Mus_musculus -> ../mm9
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Ornithorhynchus_anatinus -> ../ornAna1
lr--r--r-- 1 ftp ftp 10 Nov 7 2008 Oryzias_latipes -> ../oryLat2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Pan_troglodytes -> ../panTro2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Petromyzon_marinus -> ../petMar1
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Rattus_norvegicus -> ../rn4
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Rhesus_macaque -> ../rheMac2
lr--r--r-- 1 ftp ftp 12 Sep 3 2008 SARS_coronavirus -> ../scApr2003
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Saccharomyces_cereviciae -> ../sacCer1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Saccharomyces_cerevisiae -> ../sacCer1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Strongylocentrotus_purpuratus -> ../strPur2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Taeniopygia_guttata -> ../taeGut1
lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Takifugu_rubripes -> ../fr2
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Tetraodon_nigroviridis -> ../tetNig1
lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Xenopus_tropicalis -> ../xenTro1
226 Transfer complete
The problem is that this symlink can be repointed at any time, so the
source URL given in the org.Hs.egCHRLOC man page does not pin down an
assembly and will become meaningless sooner or later...
Cheers,
H.
>
> thanks!
>
> Hi Tim --
>
> One suggestion is to use the org.Hs.eg.db package. The 'eg' means that
> the information is keyed off Entrez ids, so you need to map your SYMBOL
> to EG
>
> egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]
>
> and then retrieve location information
>
> org.Hs.egCHRLOC[[egid]]
> org.Hs.egCHRLOCEND[[egid]]
>
> for many symbols, symids, one might
>
> egids = mappedLkeys(revmap(org.Hs.egSYMBOL)[symids])
> as.list(org.Hs.egCHRLOC[egids])
>
> etc. Some book-keeping might be needed to ensure correct symid -> egid
> -> CHRLOC mapping
>
> Martin
>
> Tim Smith wrote:
>
>> Hi,
>>
>> I wanted the exact base pair locations for several genes (e.g. wnt16 in the human wnt pathway). Which bioconductor package should I use?
>>
>> thanks!
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319