[Bioc-devel] arabidopsis annotations

Fri Aug 24 16:50:08 CEST 2007

Hi, Herve,

I feel this is more of a data source problem than a data value problem. The
reason we have this inconsistency in ag and ath1121501 is that we extract
enzyme information from AraCyc rather than from KEGG. KEGG provides EC
numbers, but AraCyc only provides enzyme names. I suggested using KEGG
instead of AraCyc when I updated AthPkgBuilder last year, but only got
halfway: we added KEGG pathway annotation to the package but still kept the
AraCyc pathway data (post link:
http://article.gmane.org/gmane.science.biology.informatics.conductor/9527/match=arabidopsis
). Maybe you can use a similar solution: add KEGG enzyme annotation and move
the AraCyc enzyme annotation to a differently named object.
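
To make the suggestion concrete, here is a minimal Python sketch of the
resulting layout. It is not AnnBuilder or AthPkgBuilder code: the probe ids
and the toy data are hypothetical, and the ECNAME/ECNAME2PROBE names are
borrowed from option D in the quoted message as one possible choice for the
"different object". It only illustrates keeping the KEGG-derived EC numbers
under ENZYME while the AraCyc-derived enzyme names live under a separate name,
with each reverse map obtained by inverting the forward one.

```python
from collections import defaultdict

# Hypothetical toy data; the probe ids are made up for illustration.
# AraCyc-style forward map: probe id -> enzyme names.
aracyc_names = {
    "probe_1": ["alcohol dehydrogenase"],
    "probe_2": ["alcohol dehydrogenase", "hexokinase"],
    "probe_3": [],  # probe with no enzyme annotation
}

# KEGG-style forward map: probe id -> EC numbers.
kegg_ec = {
    "probe_1": ["1.1.1.1"],
    "probe_2": ["1.1.1.1", "2.7.1.1"],
    "probe_3": [],
}


def invert(forward):
    """Build the reverse map (value -> sorted list of probe ids)."""
    reverse = defaultdict(list)
    for probe, values in forward.items():
        for value in values:
            reverse[value].append(probe)
    return {key: sorted(probes) for key, probes in reverse.items()}


# EC numbers keep the standard map names; the enzyme names move to
# separately named maps so that neither data source is lost.
maps = {
    "ENZYME": kegg_ec,
    "ENZYME2PROBE": invert(kegg_ec),
    "ECNAME": aracyc_names,
    "ECNAME2PROBE": invert(aracyc_names),
}

print(maps["ENZYME2PROBE"]["1.1.1.1"])
print(maps["ECNAME2PROBE"]["hexokinase"])
```

Under this layout, code that needs EC numbers reads ENZYME as in the other
chip-based packages, while code that relied on enzyme names switches to the
renamed maps.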

I would also suggest posting this question on bioc so that it reaches a
bigger audience.

Hope this helps.

nianhua

Quoting Herve Pages <hpages at fhcrc.org>:

> Hi Bioc-developers,
> 
> In the process of migrating the arabidopsis annotations to the new
> sqlite-based infrastructure, we found a problem with the current
> ENZYME/ENZYME2PROBE maps. We'd like to know what you think (especially
> if you've been using these maps).
> 
> In the ag and ath1121501 packages the ENZYME/ENZYME2PROBE maps link
> probe ids to enzyme names, and not to EC numbers as in _all_ other
> chip-based packages. In addition, the man pages for those maps are
> incorrect: they claim that those 2 maps are between manufacturer ids and
> EC numbers (not really a surprise, in fact, because AnnBuilder uses the
> same template as for any other package to generate the
> ENZYME/ENZYME2PROBE man pages).
> 
> This is not a satisfactory situation and we'd like to improve things a
> little bit for the upcoming ag.db and ath1121501.db packages. There are
> of course different ways we could address the problem:
> 
>   A. just fix the man pages:
>      - pro: easy and 100% compatible with the current (environment-based)
>             ag and ath1121501 packages
>      - con: for arabidopsis, the ENZYME/ENZYME2PROBE maps will remain
>             different from what they are in all other chip-based packages
>             + people that want the EC numbers still don't have them
> 
>   B. fix the ENZYME/ENZYME2PROBE maps so that they are consistent with all
>      other ENZYME/ENZYME2PROBE maps
>      - pro: consistency across all other chip-based packages
>      - con: enzyme names are gone so the user code using the
>             ENZYME/ENZYME2PROBE maps from ag and ath1121501 will need to
>             be modified to work with ag.db and ath1121501.db
> 
>   C. rename the ENZYME/ENZYME2PROBE maps -> ECNAME/ECNAME2PROBE and
>      deprecate the ENZYME/ENZYME2PROBE maps
>      - pro: use the standard deprecation procedure for a smooth transition
>             period
>      - con: people that want the EC numbers right now still don't have
>             them (they'll need to wait for BioC 2.2)
> 
>   D. fix the ENZYME/ENZYME2PROBE maps and add 2 new maps (e.g.
>      ECNAME/ECNAME2PROBE) for the mapping between probe ids and enzyme
>      names
>      - pro: consistency and completeness
>      - con: the user code using the ENZYME/ENZYME2PROBE maps from ag and
>             ath1121501 will need to use the ECNAME/ECNAME2PROBE maps
>             instead (but here the impact on the user is not as bad as with
>             B since the data they have been using so far is still
>             available but under different names)
> 
>   E. anything else?
> 
> Thanks for your feedback!
> 
> H.
> 
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>