[BioC] about source to make HGU133plus2 package

Nianhua Li nli at fhcrc.org
Mon Apr 30 23:24:11 CEST 2007


Hi, Greg,

We first map probeset IDs to Entrez Gene IDs, and then extract the other
annotations using those Entrez Gene IDs. Here are some details for
hgu133plus2 v1.16.0 (the one in the Bioconductor 2.0 release).

The ACCNUM environment (hgu133plus2ACCNUM) was extracted from the
"Representative Public ID" column of Affymetrix's annotation file (dated
11/15/2006). It is used to obtain the probeset to Entrez Gene mapping via
data from Entrez Gene (GenBank to Entrez mapping, dated 2/28/2007) and
UniGene (GenBank to UniGene mapping and UniGene to Entrez mapping, dated
2/26/2007). For probeset IDs that remain unmapped, we use the probeset to
Entrez mapping in Affymetrix's annotation file as a supplement. The result
is the ENTREZID environment.
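
As a rough illustration of the mapping steps above, here is a minimal
Python sketch (not the actual build code). All table names and IDs in it
are hypothetical toy data, and the order in which the Entrez Gene and
UniGene routes are tried is an assumption made for the example; only the
use of the Affymetrix mapping as a last resort follows the description
above.

```python
def map_probesets(accnum, genbank_to_entrez, genbank_to_unigene,
                  unigene_to_entrez, affy_entrez):
    """Return a probeset -> Entrez Gene ID dict (None when unmapped)."""
    entrezid = {}
    for probeset, acc in accnum.items():
        # Route 1 (assumed to be tried first): GenBank -> Entrez Gene.
        gene = genbank_to_entrez.get(acc)
        if gene is None:
            # Route 2: GenBank -> UniGene cluster -> Entrez Gene.
            cluster = genbank_to_unigene.get(acc)
            gene = unigene_to_entrez.get(cluster)
        if gene is None:
            # Supplement: Affymetrix's own probeset -> Entrez mapping.
            gene = affy_entrez.get(probeset)
        entrezid[probeset] = gene
    return entrezid


# Hypothetical toy tables standing in for the real source files.
accnum = {"ps_a": "ACC1", "ps_b": "ACC2", "ps_c": "ACC3", "ps_d": "ACC4"}
genbank_to_entrez = {"ACC1": "101"}
genbank_to_unigene = {"ACC2": "Hs.2"}
unigene_to_entrez = {"Hs.2": "202"}
affy_entrez = {"ps_a": "999", "ps_c": "303"}

entrezid = map_probesets(accnum, genbank_to_entrez, genbank_to_unigene,
                         unigene_to_entrez, affy_entrez)
```

In this toy run the Affymetrix entry for ps_a is ignored because ps_a was
already mapped, ps_c is mapped only through the supplement, and ps_d stays
unmapped.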

We then search for the other annotations using the Entrez IDs. The table
below lists all the source data.

table columns:
---------------
Environment--name of the environment, e.g. CHRLOC represents hgu133plus2CHRLOC
Source Name--name of the source database
Source URL--URL of the source data directory
Source Date--date of the source data

-----------------------------------------------------------------------------------------------------------------------------------------
Environment   Source Name                                Source URL                                                            Source Date
-----------------------------------------------------------------------------------------------------------------------------------------
CHRLOC        UCSC Genome Bioinformatics (Homo sapiens)  ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens  2006-Apr14
CHR           Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
ENZYME        KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
GENENAME      Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
GO            Gene Ontology                              http://archive.godatabase.org/latest                                  200702
MAP           Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
OMIM          Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
PATH          KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
PMID          Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
REFSEQ        Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
SYMBOL        Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
UNIGENE       Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
ENZYME2PROBE  KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
GO2PROBE      Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
GO2ALLPROBES  Gene Ontology                              http://archive.godatabase.org/latest                                  200702
GO2ALLPROBES  Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
PATH2PROBE    KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
PFAM          The International Protein Index            ftp://ftp.ebi.ac.uk/pub/databases/IPI/current                         2007-Feb21
PROSITE       The International Protein Index            ftp://ftp.ebi.ac.uk/pub/databases/IPI/current                         2007-Feb21
PMID2PROBE    Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
-----------------------------------------------------------------------------------------------------------------------------------------
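
To make the second step concrete, here is a small Python sketch of how a
per-probeset environment could be filled in through the Entrez Gene IDs,
and how a reverse environment such as PATH2PROBE could be derived from it.
This is only an illustration under assumptions: the function names, the
pathway IDs and the probeset IDs are hypothetical, and treating the
*2PROBE environments as simple inversions is inferred from their names. It
does not attempt to reproduce GO2ALLPROBES.

```python
from collections import defaultdict


def annotate(entrezid, entrez_to_value):
    """Build a probeset -> annotation dict by going through Entrez Gene IDs."""
    # Probesets without an Entrez Gene ID (None) simply get no annotation.
    return {ps: entrez_to_value.get(gene) for ps, gene in entrezid.items()}


def reverse(env):
    """Invert a probeset -> list-of-values dict into value -> probesets."""
    out = defaultdict(list)
    for probeset, values in env.items():
        for value in values or []:
            out[value].append(probeset)
    return dict(out)


# Hypothetical toy data: an ENTREZID-like mapping and an Entrez -> pathway table.
entrezid = {"ps_a": "101", "ps_b": "202", "ps_d": None}
entrez_to_path = {"101": ["p1", "p2"], "202": ["p2"]}

path = annotate(entrezid, entrez_to_path)   # plays the role of PATH
path2probe = reverse(path)                  # plays the role of PATH2PROBE
```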

hope this helps

nianhua


