nli at fhcrc.org
Thu Jun 15 00:07:35 CEST 2006
I received the following email from Lynn Amon and would like to answer
it through the mailing list.
mouse4302 was generated by using function ABPkgBuilder in package
AnnBuilder. The strategy is to first map probeset ids to Entrez Gene IDs
and then use Entrez Gene IDs to retrieve other annotations (e.g. symbol,
refseq, pathway, go). Because 1415822_at, 1415823_at and 1415824_at were
all mapped to Entrez Gene ID 20249, which corresponds to Scd1, all of
their annotations (e.g. symbol, refseq) correspond to Scd1.
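
To illustrate why a single probeset-to-gene assignment determines every
other annotation, here is a minimal sketch in Python (not AnnBuilder's
actual R code). The dictionaries and the function name `annotate` are
hypothetical stand-ins; only the probeset ids, gene ids and symbols come
from this message.

```python
# Step 1 (hypothetical table): probeset id -> Entrez Gene ID.
probe_to_gene = {
    "1415822_at": 20249,
    "1415823_at": 20249,
    "1415824_at": 20249,
}

# Step 2 (hypothetical table): Entrez Gene ID -> other annotations.
gene_annotation = {
    20249: {"symbol": "Scd1"},
    20250: {"symbol": "Scd2"},
}

def annotate(probe_id):
    """Look up annotations for a probeset via its Entrez Gene ID."""
    gene_id = probe_to_gene.get(probe_id)
    if gene_id is None:
        return None  # probeset has no gene assignment
    return gene_annotation.get(gene_id)

# Every annotation follows from the gene id chosen in step 1.
symbols = {p: annotate(p)["symbol"] for p in probe_to_gene}
```

Once step 1 assigns 20249 to a probeset, step 2 can only return Scd1
annotations for it, whatever the probe actually targets.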
So the question comes down to the mapping from probeset id to Entrez Gene ID.
For mouse4302, we obtained the mapping in four ways:
(1) get probeset to GenBank accession mapping from Affymetrix
annotation, and then use
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz to map GenBank
accession to EntrezGene ID
(2) get probeset to GenBank accession mapping from Affymetrix
annotation, and then use
ftp://ftp.ncbi.nih.gov/repository/UniGene/Mus_musculus/Mm.data.gz to map
GenBank accession to EntrezGene ID
(3) get probeset to EntrezGene mapping directly from Affymetrix
(4) get probeset to UniGene mapping from Affymetrix and then use
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz to map UniGene
cluster to EntrezGene ID
* Note: the Affymetrix annotation is dated Dec 18, 2005; the other
sources are dated March 18, 2006.
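
Methods (1) and (2) both chain two lookups through a GenBank accession.
A rough sketch of that composition in Python follows. The dictionaries
stand in for the parsed Affymetrix and NCBI files (parsing is not shown),
the helper name `compose` is hypothetical, and the values are the ones
reported for 1415822_at further down in this message.

```python
def compose(probe_to_acc, acc_to_gene):
    """Chain probeset -> accession -> Entrez Gene ID.

    A probeset whose accession is absent from acc_to_gene gets None,
    which marks a missing value for that source.
    """
    return {probe: acc_to_gene.get(acc) for probe, acc in probe_to_acc.items()}

# Stand-in for the Affymetrix probeset -> GenBank accession table.
affy_probe_to_acc = {"1415822_at": "BG060909"}

# Stand-ins for accession -> Entrez Gene ID tables derived from
# gene2accession.gz (method 1) and Mm.data.gz (method 2).
from_gene2accession = {"BG060909": 20249}
from_unigene_data = {"BG060909": 20250}

method1 = compose(affy_probe_to_acc, from_gene2accession)
method2 = compose(affy_probe_to_acc, from_unigene_data)
```

The same accession can resolve to different genes depending on which
NCBI file is consulted, which is how the two sources end up disagreeing.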
We treat the first two as "trusted" sources and the last two as
supplementary sources. The supplementary sources are used only when
all of the "trusted" sources have missing values for a probeset.
Whether we use "trusted" or supplementary sources, if they disagree
on the mapping of a probeset, we pick the value that most sources
agree on. If there is a tie, we pick the first one on the list (i.e.
arbitrarily). In the case of 1415822_at, we got 20249, 20250, 20250,
20250 from the above four methods respectively. (BTW, 1415822_at was
mapped to GenBank acc BG060909 in Affymetrix's annotation.) 20250 is
the Entrez Gene record for Scd2, and 20249 is for Scd1. The values from
the "trusted" sources are 20249 and 20250, which is a tie. Because
20249 happens to be the first one on the list, we picked it.
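
The selection rule described above can be sketched in Python as follows.
This is an illustration of the rule as stated in this message, not the
AnnBuilder source; the function name `pick_gene_id` and the use of None
for a missing value are assumptions.

```python
from collections import Counter

def pick_gene_id(trusted, supplementary):
    """Pick one Entrez Gene ID for a probeset.

    trusted / supplementary: gene ids in source order, None if missing.
    Supplementary sources are consulted only when every trusted source
    is missing. The most frequent value wins; ties go to the value
    that appears first in the list.
    """
    candidates = [g for g in trusted if g is not None]
    if not candidates:
        candidates = [g for g in supplementary if g is not None]
    if not candidates:
        return None
    counts = Counter(candidates)
    best = max(counts.values())
    for gene_id in candidates:  # first value reaching the top count wins
        if counts[gene_id] == best:
            return gene_id

# 1415822_at: methods (1)-(4) gave 20249, 20250, 20250, 20250.
chosen = pick_gene_id([20249, 20250], [20250, 20250])

# For comparison only: a vote pooled over all four sources.
pooled = pick_gene_id([20249, 20250, 20250, 20250], [])
```

Here `chosen` is 20249 (Scd1), because the two trusted sources tie and
the first one wins, while `pooled` is 20250 (Scd2). Pooling is shown only
to make the effect of the tie-break visible; it is not what the software
does.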
It seems the software picked the wrong value in this particular example,
but it might be a reasonable approach in general. I am not an expert, so
I would appreciate it if someone could comment on this.
computational biology, public health, FHCRC
> ---------- Forwarded message ----------
> Date: Thu, 08 Jun 2006 07:13:33 -0700
> From: Lynn Amon <lynnamon at u.washington.edu>
> To: Ting-Yuan Liu <tliu at FHCRC.ORG>
> Subject: Re: annotation services
> Hello Ting,
> I just loaded the newest version of mouse4302 from the Bioconductor
> 1.8 and it is different than the previous version. By chance, I
> looked at the
> Previously, 1415965_at and 1415964_at were the only probe ids given
> for the gene
> Scd1 which agrees with annotation given on the affy website and the
> view on Ensembl. Now, in addition to those probes, 1415822_at,
> 1415824_at which were formerly annotated as Scd2 are given the symbol
> ID for Scd1 which does not agree with affy or Ensembl. Is there a
> these changes? Should I expect to see many changes in this new
> Shouldn't this annotation file agree with the annotations given by affy?
> Thanks for your help,
> Lynn Amon