[Bioc-devel] arabidopsis annotations

Thu Aug 23 19:09:24 CEST 2007

Hi Bioc-developers,

In the process of migrating the arabidopsis annotations to the new sqlite-based
infrastructure, we found a problem with the current ENZYME/ENZYME2PROBE maps.
We'd like to know what you think (especially if you've been using these maps).

In the ag and ath1121501 packages the ENZYME/ENZYME2PROBE maps link probe ids
to enzyme names, not to EC numbers as in _all_ other chip-based packages.
In addition, the man pages for those maps are incorrect: they claim that the 2 maps
are between manufacturer ids and EC numbers. This is not really a surprise, because
AnnBuilder generates the ENZYME/ENZYME2PROBE man pages from the same template
for every package.

This is not a satisfactory situation and we'd like to improve things a little
bit for the upcoming ag.db and ath1121501.db packages. There are of course different
ways we could address the problem:

  A. just fix the man pages:
     - pro: easy and 100% compatible with the current (environment-based) ag and
            ath1121501 packages
     - con: for arabidopsis, the ENZYME/ENZYME2PROBE maps will remain different
            from what they are in all other chip-based packages, and people who
            want the EC numbers still won't have them

  B. fix the ENZYME/ENZYME2PROBE maps so that they are consistent with all
     other ENZYME/ENZYME2PROBE maps
     - pro: consistency with all other chip-based packages
     - con: enzyme names are gone, so user code that relies on the ENZYME/ENZYME2PROBE
            maps from ag and ath1121501 will need to be modified to work with ag.db and
            ath1121501.db

  C. rename the ENZYME/ENZYME2PROBE maps -> ECNAME/ECNAME2PROBE and deprecate the
     ENZYME/ENZYME2PROBE maps
     - pro: use the standard deprecation procedure for a smooth transition period
     - con: people who want the EC numbers right now still won't have them (they'll
            need to wait for BioC 2.2)

  D. fix the ENZYME/ENZYME2PROBE maps and add 2 new maps (e.g. ECNAME/ECNAME2PROBE)
     for the mapping between probe ids and enzyme names
     - pro: consistency and completeness
     - con: the user code using the ENZYME/ENZYME2PROBE maps from ag and ath1121501
            will need to use the ECNAME/ECNAME2PROBE maps instead (but here the
            impact on the user is not as bad as with B since the data they
            have been using so far is still available but under different names)

  E. anything else?
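
To make option D more concrete, here is a minimal toy sketch of the four maps
it would provide. It is written in plain Python with dictionaries, not with the
actual annotation package code, and everything in it is an assumption made for
illustration: the probe ids are made up, the pairing of probes with enzymes is
made up, and the helper names derive_names() and reverse_map() are hypothetical.
Only the two EC number/enzyme name pairs are real.

```python
from collections import defaultdict

# Small EC number -> enzyme name lookup (the two pairs are real EC entries).
EC_NAMES = {
    "1.1.1.1": "alcohol dehydrogenase",
    "2.7.1.1": "hexokinase",
}

# ENZYME after the fix proposed in option D: probe id -> EC numbers.
# The probe ids and their assignments are invented for this example.
ENZYME = {
    "probe_1_at": ["1.1.1.1"],
    "probe_2_at": ["1.1.1.1", "2.7.1.1"],
    "probe_3_at": [],  # a probe with no enzyme annotation
}


def derive_names(enzyme_map, ec_names):
    """Build a probe id -> enzyme names map from a probe id -> EC numbers map."""
    return {
        probe: [ec_names[ec] for ec in ecs]
        for probe, ecs in enzyme_map.items()
    }


def reverse_map(forward):
    """Invert a probe id -> values map into a value -> probe ids map."""
    reverse = defaultdict(list)
    for probe, values in forward.items():
        for value in values:
            reverse[value].append(probe)
    return dict(reverse)


# The new map carrying what the old arabidopsis ENZYME map used to hold.
ECNAME = derive_names(ENZYME, EC_NAMES)

# The two reverse maps.
ENZYME2PROBE = reverse_map(ENZYME)
ECNAME2PROBE = reverse_map(ECNAME)
```

In this model, code that used to look up enzyme names in ENZYME would only have
to switch to ECNAME, while ENZYME itself would behave as it does in the other
chip-based packages.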

Thanks for your feedback!

H.