[BioC] mouse4302

Nianhua Li nli at fhcrc.org
Thu Jun 15 00:07:35 CEST 2006


I received the following email from Lynn Amon and would like to answer 
it through the mailing list.

mouse4302 was generated by using function ABPkgBuilder in package 
AnnBuilder. The strategy is to first map probeset ids to Entrez Gene IDs 
and then use Entrez Gene IDs to retrieve other annotations (e.g. symbol, 
refseq, pathway, go). Because 1415822_at, 1415823_at and 1415824_at were 
all mapped to Entrez Gene ID 20249, which corresponds to Scd1, all of 
their annotations (e.g. symbol, refseq) correspond to Scd1.
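
As a rough illustration of this two-step strategy (not the actual 
AnnBuilder code), the lookup can be sketched as follows. The dictionary 
and function names are hypothetical; only the probeset ids, the Entrez 
Gene ID and the symbol are taken from this message.

```python
# Hypothetical lookup tables. Only the IDs and the symbol come from the
# message; the table and function names are assumptions for illustration.
probe_to_gene = {
    "1415822_at": "20249",
    "1415823_at": "20249",
    "1415824_at": "20249",
}
gene_annotation = {
    "20249": {"symbol": "Scd1"},
}


def annotate(probe_id):
    """Return the annotation of the gene a probeset maps to, or None."""
    gene_id = probe_to_gene.get(probe_id)   # step 1: probeset -> Entrez Gene ID
    if gene_id is None:
        return None
    return gene_annotation.get(gene_id)     # step 2: Entrez Gene ID -> annotation


for probe in sorted(probe_to_gene):
    print(probe, annotate(probe)["symbol"])
```

Every annotation is inherited from the Entrez Gene ID, so an error in 
step 1 carries over to the symbol, refseq and all other fields.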

So the question comes down to the mapping from probeset id to Entrez Gene ID. 
For mouse4302, we obtained the mapping in four ways:

(1) get probeset to GenBank accession mapping from Affymetrix 
annotation, and then use 
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz to map GenBank 
accession to EntrezGene ID
(2) get probeset to GenBank accession mapping from Affymetrix 
annotation, and then use 
ftp://ftp.ncbi.nih.gov/repository/UniGene/Mus_musculus/Mm.data.gz to map 
GenBank accession to EntrezGene ID
(3) get probeset to EntrezGene mapping directly from Affymetrix
(4) get probeset to UniGene mapping from Affymetrix and then use 
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz to map UniGene 
cluster to EntrezGene ID
* note: the Affymetrix annotation is dated Dec 18, 2005, and the rest 
are dated March 18, 2006.

We treat the first two as "trusted" sources and the last two as 
supplementary sources. The supplementary sources are not used unless 
all the "trusted" sources have missing values for a probeset. Whether 
we use "trusted" or "supplementary" sources, if they disagree on the 
mapping of a probeset, we pick the value that most sources agree on. 
If there is a tie, we pick the first one on the list (i.e. 
arbitrarily). In the case of 1415822_at, we got 20249, 20250, 20250, 
20250 from the above four methods respectively. (BTW, 1415822_at was 
mapped to GenBank accession BG060909 in Affymetrix's annotation.) 
20250 is the Entrez Gene record for Scd2, and 20249 is for Scd1. The 
values from the "trusted" sources are 20249 and 20250, which is a tie. 
Because 20249 happens to be the first one on the list, we picked it.
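
The selection rule described above can be sketched roughly as follows. 
This is not the AnnBuilder implementation; the function name is 
hypothetical, and it assumes that "first one on the list" means the 
first value in source order.

```python
from collections import Counter


def pick_entrez_id(trusted, supplementary):
    """Pick one Entrez Gene ID from per-source candidates (None = missing).

    Hypothetical sketch: supplementary sources are consulted only when
    every trusted source is missing; the most frequent value wins; ties
    go to the first value in source order (an assumption).
    """
    votes = [v for v in trusted if v is not None]
    if not votes:
        votes = [v for v in supplementary if v is not None]
    if not votes:
        return None
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in source order so that a tie resolves to the first one.
    for value in votes:
        if counts[value] == best:
            return value


# Values reported for 1415822_at from methods (1)-(4).
trusted = ["20249", "20250"]
supplementary = ["20250", "20250"]
print(pick_entrez_id(trusted, supplementary))  # prints 20249
```

With these inputs the two trusted sources tie, so the sketch returns 
20249 and never consults the supplementary sources, both of which 
report 20250.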

It seems the software picked the wrong value in this particular example, 
but it might still be a reasonable approach in general. I am not an 
expert on this, so I would appreciate it if someone could comment.

many thanks

Nianhua Li
computational biology, public health, FHCRC

 > ---------- Forwarded message ----------
 > Date: Thu, 08 Jun 2006 07:13:33 -0700
 > From: Lynn Amon <lynnamon at u.washington.edu>
 > To: Ting-Yuan Liu <tliu at FHCRC.ORG>
 > Subject: Re: annotation services
 > Hello Ting,
 > I just loaded the newest version of mouse4302 from the Bioconductor
 > 1.8 and it is different than the previous version.  By chance, I
 > looked at the gene Scd1.  Previously, 1415965_at and 1415964_at were
 > the only probe ids given for the gene Scd1 which agrees with
 > annotation given on the affy website and the view on Ensembl.  Now,
 > in addition to those probes, 1415822_at, 1415823_at and 1415824_at
 > which were formerly annotated as Scd2 are given the symbol and refseq
 > ID for Scd1 which does not agree with affy or Ensembl.  Is there a
 > reason for these changes?  Should I expect to see many changes in
 > this new annotation file?
 > Shouldn't this annotation file agree with the annotations given by affy?
 > Thanks for you help,
 > Lynn Amon
