[BioC] org.Mm.eg.db gives wrong symbol for MT genes
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Aug 11 04:20:46 CEST 2013
Hi Vincent,
Thanks, that explains it. After reading your reply, I went to the NCBI
Gene FAQ and found the following explanation:
"NOTE: To the greatest extent possible, each protein-coding gene in
mitochondria has been assigned the same name (symbol) and full description
across species. In some instances, this is at variance with the symbol
assigned by species-specific nomenclature committees."
This would be fine except that (i) the NCBI Gene web interface disagrees
with the NCBI gene_info file and (ii) the nomenclature committee symbol
from MGI has not been included as a synonym in the gene_info file.
Anyway, the bottom line for my lab is that we will treat the
gene_info/org.Mm.eg.db symbols as official, and we will have to give the
MT genes special treatment when mapping aliases.
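A minimal sketch of that special treatment, in base R on a toy two-row
table whose values are copied from the gene_info records quoted below.
The helper names build_alias_table and alias_to_symbol are hypothetical
and are not part of org.Mm.eg.db or AnnotationDbi. The idea is simply to
add Symbol_from_nomenclature_authority as an extra alias wherever it
differs from Symbol:

```r
# Toy rows mimicking the gene_info columns discussed in this thread.
# Values for GeneIDs 11287 and 17710 come from the records quoted below.
gene_info <- data.frame(
  GeneID   = c("11287", "17710"),
  Symbol   = c("Pzp", "COX3"),
  Synonyms = c("A1m|A2m|AI893533|MAM", "-"),
  Symbol_from_nomenclature_authority = c("Pzp", "mt-Co3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build an alias -> GeneID table. The official Symbol,
# the listed Synonyms, and the nomenclature-authority symbol all count as
# aliases; "-" is the gene_info placeholder for "no value" and is dropped.
build_alias_table <- function(gi) {
  syn <- strsplit(gi$Synonyms, "|", fixed = TRUE)
  aliases <- Map(function(sym, s, auth) {
    s <- s[s != "-"]
    unique(c(sym, s, if (auth != "-") auth))
  }, gi$Symbol, syn, gi$Symbol_from_nomenclature_authority)
  data.frame(
    alias  = unlist(aliases, use.names = FALSE),
    GeneID = rep(gi$GeneID, lengths(aliases)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: map aliases to the gene_info Symbol (treated as
# official here). Unknown aliases give NA.
alias_to_symbol <- function(alias, tab, gi) {
  id <- tab$GeneID[match(alias, tab$alias)]
  gi$Symbol[match(id, gi$GeneID)]
}

alias_tab <- build_alias_table(gene_info)
alias_to_symbol(c("mt-Co3", "A2m"), alias_tab, gene_info)
# [1] "COX3" "Pzp"
```

With the real file, the same function could be applied to the full
gene_info table after reading it with read.delim, so that MGI symbols
such as "mt-Co3" still resolve to the gene_info symbol.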
Regards
Gordon
On Sat, 10 Aug 2013, Vincent Carey wrote:
> Gordon, more definitive answers will likely come from the annotation core
> members, but here is what I understand
> about this. The mappings are completely dependent on NCBI content.
>
> Working with
>
> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz
>
> the header is
>
> #Format: tax_id GeneID Symbol LocusTag Synonyms dbXrefs chromosome
> map_location description type_of_gene Symbol_from_nomenclature_authority
> Full_name_from_nomenclature_authority Nomenclature_status
> Other_designations Modification_date (tab is used as a separator, pound
> sign - start of a comment)
>
> and, with some context, the record for 17710 is
>
> > x[c(1,3516),]
>      tax_id GeneID Symbol LocusTag             Synonyms
> 1     10090  11287    Pzp        - A1m|A2m|AI893533|MAM
> 3516  10090  17710   COX3        -                    -
>                                                           dbXrefs chromosome
> 1    MGI:87854|Ensembl:ENSMUSG00000030359|Vega:OTTMUSG00000022212          6
> 3516                                                   MGI:102502         MT
>            map_location                      description   type_of_gene
> 1    6 F1-G3|6 63.02 cM           pregnancy zone protein protein-coding
> 3516                  - cytochrome c oxidase subunit III protein-coding
>      Symbol_from_nomenclature_authority   Full_name_from_nomenclature_authority
> 1                                   Pzp                  pregnancy zone protein
> 3516                             mt-Co3 cytochrome c oxidase III, mitochondrial
>      Nomenclature_status                                    Other_designations
> 1                      O alpha 1 macroglobulin|alpha-2-M|alpha-2-macroglobulin
> 3516                   O                                                     -
>      Modification_date  X
> 1             20130804 NA
> 3516          20130804 NA
>
> I would conjecture that the solution needs to come from NCBI -- they may
> have neglected to deal properly with the MT genes in this case, as the
> following computation suggests. The symbols for which field "Symbol" does
> not agree
> with field "Symbol_from_nomenclature_authority" are
>
> > xsn[xs!=xsn]
>  [1] "mt-Atp6" "mt-Atp8" "mt-Co1"  "mt-Co2"  "mt-Co3"  "mt-Cytb" "mt-Nd1"
>  [8] "mt-Nd2"  "mt-Nd3"  "mt-Nd4"  "mt-Nd4l" "mt-Nd5"  "mt-Nd6"  "mt-Rnr1"
> [15] "mt-Rnr2" "mt-Ta"   "mt-Tc"   "mt-Td"   "mt-Te"   "mt-Tf"   "mt-Tg"
> [22] "mt-Th"   "mt-Ti"   "mt-Tk"   "mt-Tl1"  "mt-Tl2"  "mt-Tm"   "mt-Tn"
> [29] "mt-Tp"   "mt-Tq"   "mt-Tr"   "mt-Ts1"  "mt-Ts2"  "mt-Tt"   "mt-Tv"
> [36] "mt-Tw"   "mt-Ty"
>
>
> On Fri, Aug 9, 2013 at 11:17 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Biocore,
>>
>> We make a strong effort to use current NCBI official gene symbols and
>> names in all our work, and we make much use of the excellent Bioconductor
>> packages org.Mm.eg.db and org.Hs.eg.db for this purpose.
>>
>> I have recently noticed that org.Mm.eg.db is giving incorrect official
>> names for mitochondrial genes. It is giving human symbols for these genes
>> instead of mouse symbols. For example
>>
>> > mappedRkeys(org.Mm.egSYMBOL["17710"])
>> [1] "COX3"
>>
>> According to both Entrez Gene
>>
>> http://www.ncbi.nlm.nih.gov/gene/?term=17710
>>
>> and MGI
>>
>> http://www.informatics.jax.org/marker/MGI:102502
>>
>> the official symbol is "mt-Co3". This has been the official symbol for at
>> least 4 years and probably longer.
>>
>> The correct name is not even included as an Alias:
>>
>> > mappedRkeys(revmap(org.Mm.egALIAS2EG)["17710"])
>> [1] "COX3"
>>
>> COX3 is actually the symbol for the human ortholog. It should only be
>> an alias for the mouse gene.
>>
>> Same for all the mitochondrial genes. In all cases, org.Mm.egSYMBOL is
>> giving the human symbol instead of the mouse symbol.
>>
>> Is this deliberate? If not, can you please fix?
>>
>> Thanks a lot
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>>
>> > sessionInfo()
>>
>> R version 3.0.1 Patched (2013-07-04 r63183)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_Australia.1252
>> [2] LC_CTYPE=English_Australia.1252
>> [3] LC_MONETARY=English_Australia.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_Australia.1252
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
>> [7] methods base
>>
>> other attached packages:
>> [1] org.Mm.eg.db_2.9.0 org.Hs.eg.db_2.9.0 RSQLite_0.11.4
>> [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0
>> [7] BiocGenerics_0.6.0 limma_3.17.20
>>
>> loaded via a namespace (and not attached):
>> [1] IRanges_1.18.2 stats4_3.0.1