[BioC] athPkgBuilder data source: missing probesets
Thomas Girke
thomas.girke at ucr.edu
Thu Aug 10 19:57:31 CEST 2006
Nianhua,
I suggest using the probeset-to-gene mappings from TAIR, since they
are in charge of annotating this genome. This way one can be sure that the
probeset-to-gene mappings stay in step with new annotation releases of this
genome.
Also, I would consider including the gene/locus-to-GO mappings from
TAIR. This data set can be downloaded directly from GO.org:
http://geneontology.org/GO.current.annotations.shtml
http://www.geneontology.org/cgi-bin/downloadGOGA.pl/gene_association.tair.gz
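As a rough illustration of how the locus-to-GO mapping could be pulled out of that file, here is a minimal Python sketch. It assumes the file follows the tab-delimited GO annotation (GAF) layout, with the qualifier in column 4, the GO ID in column 5, and the AGI locus code appearing somewhere in the name or synonym columns (10 and 11). The function name locus_to_go and the AGI regular expression are illustrative assumptions, not part of athPkgBuilder.

```python
import re
from collections import defaultdict

# Assumption: AGI locus codes look like AT1G01010 (chromosomes 1-5, C or M).
AGI_PATTERN = re.compile(r"\bAT[1-5CM]G\d{5}\b", re.IGNORECASE)


def locus_to_go(lines):
    """Build {AGI locus: set of GO IDs} from GO annotation (GAF) lines."""
    mapping = defaultdict(set)
    for line in lines:
        # Skip blank lines and "!" comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete annotation row
        # Column 4 is the qualifier; "NOT" negates the annotation, so skip it.
        if "NOT" in fields[3].split("|"):
            continue
        go_id = fields[4]
        # Assumption: the AGI code is in the name (col 10) or synonym (col 11) field.
        for locus in AGI_PATTERN.findall(fields[9] + "|" + fields[10]):
            mapping[locus.upper()].add(go_id)
    return dict(mapping)


# Possible usage on the downloaded file (not executed here):
#   import gzip
#   with gzip.open("gene_association.tair.gz", "rt") as handle:
#       go_map = locus_to_go(handle)
```

Collecting GO IDs into a set per locus keeps duplicate annotations (same term, different evidence) from being counted twice.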
Thanks for taking care of this.
Thomas
On Thu 08/10/06 10:25, Nianhua Li wrote:
> Dear Tine and Björn,
>
> Thanks a lot for your detailed replies. I really appreciate them. I
> would like to summarize them to make sure we are on the same page:
>
> Now I understand that we should use the AGI locus as the gene identifier
> and that it can be missing for some probesets. It also seems the
> EntrezGene ID is unnecessary. I was actually more interested in the
> *source*: should we use *Affymetrix's annotation*
> (https://www.affymetrix.com/support/technical/byproduct.affx?product=arab)
> or *TAIR's*
> (ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2006-07-14.txt)
> for probeset-to-gene mapping? You both prefer TAIR's, don't you? The
> current implementation (athPkgBuilder) is based on Affymetrix's.
>
> Thanks for the PubMed source
> (ftp://ftp.arabidopsis.org/home/tair/User_Requests/LocusPublished.08012006.txt).
> Should I make it the default in athPkgBuilder then?
>
> It is fairly easy to obtain the KEGG annotation. The file
> ftp://ftp.genome.jp/pub/kegg/genomes/ath/ath_tair.list maps AGI locus
> to KEGG Gene ID. If you look at the file, the two identifiers
> always have the same value. The file
> ftp://ftp.genome.jp/pub/kegg/pathways/ath/ath_gene_map.tab then maps KEGG
> Gene ID to KEGG pathway ID. Finally, the file
> ftp://ftp.genome.jp/pub/kegg/pathways/map_title.tab maps KEGG pathway ID
> to pathway title. Another detail is that KEGG has two "genome codes" for
> Arabidopsis: ath and eath. "ath" contains mappings between pathways and
> CDS (real genes), whereas "eath" maps pathways to ESTs. For example,
> "eath00051" and "ath00051" show the same pathway graph, but link to
> ESTs and CDS respectively:
> http://www.genome.jp/dbget-bin/show_pathway?eath00051
> http://www.genome.jp/dbget-bin/show_pathway?ath00051
> Should we use "ath" or "eath"?
>
> Also, it seems the gene description (ath1121501GENENAME) part should keep
> the current implementation (based on
> ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes ).
>
> thanks again
>
> nianhua
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
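The KEGG chain described in the quoted message (AGI locus to KEGG Gene ID, Gene ID to pathway ID, pathway ID to title) could be sketched in Python roughly as follows. Since the message notes that the AGI locus and the KEGG Gene ID always have the same value, the first file is treated as an identity mapping here. The file layouts are assumptions: ath_gene_map.tab is taken to hold a gene ID, a tab, and space-separated pathway numbers per line, and map_title.tab a pathway number, a tab, and a title. All function names are hypothetical.

```python
def parse_gene_map(lines):
    """Parse ath_gene_map.tab lines into {gene ID: [pathway numbers]}.

    Assumed layout: "<gene ID>\t<pathway number> <pathway number> ...".
    """
    gene_to_pathways = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene, _, pathways = line.partition("\t")
        gene_to_pathways[gene] = pathways.split()
    return gene_to_pathways


def parse_map_title(lines):
    """Parse map_title.tab lines into {pathway number: title}.

    Assumed layout: "<pathway number>\t<title>".
    """
    titles = {}
    for line in lines:
        if not line.strip():
            continue
        number, _, title = line.rstrip("\n").partition("\t")
        titles[number.strip()] = title.strip()
    return titles


def locus_to_pathway_titles(gene_to_pathways, titles, genome_code="ath"):
    """Join the two tables into {AGI locus: [(pathway ID, title), ...]}.

    The AGI locus is used directly as the KEGG Gene ID (identity mapping).
    Pathways without a known title get an empty string.
    """
    return {
        gene: [(genome_code + number, titles.get(number, "")) for number in numbers]
        for gene, numbers in gene_to_pathways.items()
    }
```

Passing genome_code="eath" instead of the default would build the EST-based pathway IDs, assuming the pathway numbers are shared between the two codes as the "ath00051"/"eath00051" example suggests.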
--
Thomas Girke, Ph.D.
1008 Noel T. Keen Hall
Center for Plant Cell Biology (CEPCEB)
University of California
Riverside, CA 92521
E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437