[BioC] athPkgBuilder data source: missing probesets
Nianhua Li
nli at fhcrc.org
Thu Aug 10 19:25:20 CEST 2006
Dear Tine and Björn,
Thanks a lot for your detailed replies. I really appreciate them. I
would like to summarize them to make sure we are on the same page:
Now I understand that we should use the AGI locus as the gene identifier,
and that it can be missing for some probesets. It also seems that the
EntrezGene ID is unnecessary. I was actually more interested in the
*source*: should we use *Affymetrix's annotation*
(https://www.affymetrix.com/support/technical/byproduct.affx?product=arab)
or *TAIR's*
(ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2006-07-14.txt)
for the probeset-to-gene mapping? You both prefer TAIR's, don't you? The
current implementation (athPkgBuilder) is based on Affymetrix's.
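To make the "missing locus" case concrete, here is a minimal Python sketch of a probeset-to-AGI mapping built from a tab-delimited annotation file. It is only an illustration under stated assumptions: the column names `array_element_name` and `locus` are hypothetical, and so are the conventions that a missing locus appears as an empty field or as `no_match` and that several loci share one field separated by `;`. None of this has been checked against the actual TAIR or Affymetrix file layouts.

```python
import csv
import io

# Hypothetical column names: check them against the real file header before use.
PROBE_COL = "array_element_name"
LOCUS_COL = "locus"
# Assumed markers for "no AGI locus": an empty field or a "no_match" placeholder.
MISSING = {"", "no_match"}


def probeset_to_loci(text):
    """Return {probeset ID: [AGI loci]}; the list is empty when no locus is given."""
    mapping = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        raw = (row.get(LOCUS_COL) or "").strip()
        if raw.lower() in MISSING:
            loci = []  # keep the probeset, but record that it has no locus
        else:
            # Assumed: several loci in one field are separated by ';'.
            loci = [x.strip().upper() for x in raw.split(";") if x.strip()]
        mapping[row[PROBE_COL]] = loci
    return mapping


# Made-up rows for illustration only (not real ATH1 annotation).
sample = (
    "array_element_name\tlocus\n"
    "probeA_at\tAT1G01010\n"
    "probeB_at\tAT1G01020;AT1G01030\n"
    "probeC_at\tno_match\n"
)

mapping = probeset_to_loci(sample)
unmapped = [p for p, loci in mapping.items() if not loci]
print(mapping)
print(unmapped)  # → ['probeC_at']
```

Keeping unmapped probesets with an empty list, rather than dropping them, makes it easy to report how many probesets lack an AGI locus under either annotation source.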
Thanks for the PubMed source
(ftp://ftp.arabidopsis.org/home/tair/User_Requests/LocusPublished.08012006.txt).
Should I make it the default in athPkgBuilder then?
It is fairly easy to obtain KEGG annotation. The file
ftp://ftp.genome.jp/pub/kegg/genomes/ath/ath_tair.list maps AGI loci
to KEGG Gene IDs; if you look at the file, the two identifiers
always have the same value. The file
ftp://ftp.genome.jp/pub/kegg/pathways/ath/ath_gene_map.tab then maps KEGG
Gene IDs to KEGG pathway IDs, and finally the file
ftp://ftp.genome.jp/pub/kegg/pathways/map_title.tab maps KEGG pathway IDs
to pathway titles. Another detail is that KEGG has two "genome codes" for
Arabidopsis: ath and eath. "ath" maps pathways to CDSs (real genes),
whereas "eath" maps pathways to ESTs. For example, "eath00051" and
"ath00051" show the same pathway graph, but link to ESTs and CDSs,
respectively:
http://www.genome.jp/dbget-bin/show_pathway?eath00051
http://www.genome.jp/dbget-bin/show_pathway?ath00051
Should we use "ath" or "eath"?
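The three-step KEGG lookup described above (AGI locus to KEGG Gene ID, KEGG Gene ID to pathway ID, pathway ID to title) can be sketched in Python as a chain of dictionaries. The line formats in the comments are assumptions about the files, not verified layouts, and the sample input is invented. Since the two identifiers in ath_tair.list are reported to be identical, the first step is close to an identity map, but keeping it explicit is cheap. The `org` parameter is a hypothetical convenience that only changes the prefix of the returned pathway IDs; using "eath" would presumably also mean reading the corresponding eath files.

```python
# Assumed line formats (verify against the real KEGG files):
#   ath_tair.list:     "ath:AT1G01010<TAB>tair:AT1G01010"
#   ath_gene_map.tab:  "AT1G01010<TAB>00010 00051"
#   map_title.tab:     "00051<TAB>pathway title"


def _rows(text):
    """Yield the tab-separated fields of each non-blank line."""
    for line in text.splitlines():
        if line.strip():
            yield line.split("\t")


def parse_tair_list(text):
    """AGI locus -> KEGG gene ID, with 'ath:' / 'tair:' prefixes dropped."""
    agi_to_kegg = {}
    for fields in _rows(text):
        if len(fields) < 2:
            continue
        kegg, tair = fields[0], fields[1]
        agi = tair.split(":", 1)[-1].strip().upper()
        agi_to_kegg[agi] = kegg.split(":", 1)[-1].strip().upper()
    return agi_to_kegg


def parse_gene_map(text):
    """KEGG gene ID -> list of pathway numbers."""
    return {f[0].strip().upper(): f[1].split() for f in _rows(text) if len(f) >= 2}


def parse_titles(text):
    """Pathway number -> pathway title."""
    return {f[0].strip(): f[1].strip() for f in _rows(text) if len(f) >= 2}


def agi_to_pathways(agi, agi_to_kegg, gene_to_paths, titles, org="ath"):
    """Chain the three mappings; org only sets the prefix of the returned IDs."""
    kegg = agi_to_kegg.get(agi.upper())
    if kegg is None:
        return []
    return [(org + num, titles.get(num, "")) for num in gene_to_paths.get(kegg, [])]


# Made-up sample input, not real KEGG content.
tair_list = "ath:AT1G01010\ttair:AT1G01010\nath:AT1G01020\ttair:AT1G01020\n"
gene_map = "AT1G01010\t00010 00051\n"
map_title = "00010\tExample pathway A\n00051\tExample pathway B\n"

a2k = parse_tair_list(tair_list)
g2p = parse_gene_map(gene_map)
ttl = parse_titles(map_title)
print(agi_to_pathways("AT1G01010", a2k, g2p, ttl))
# → [('ath00010', 'Example pathway A'), ('ath00051', 'Example pathway B')]
print(agi_to_pathways("AT1G01020", a2k, g2p, ttl))  # → []
```

A locus with a KEGG Gene ID but no pathway entry simply yields an empty list, which mirrors the probesets that have no annotation at all.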
Also, it seems the gene description (ath1121501GENENAME) part should keep
the current implementation (based on
ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes).
thanks again
nianhua