[BioC] Biocore response to Affymetrix data format changes

Isaac Neuhaus isaac.neuhaus at bms.com
Mon Jun 30 15:48:15 MEST 2003


Many of us work in the pharmaceutical industry and have been taking advantage of your excellent tools.
We are also, monetarily speaking, important Affymetrix customers. I would like to know how WE, in the
pharmaceutical industry, could help and facilitate your continuing effort in developing these useful
tools.

Isaac

Vincent Carey 525-2265 wrote:

> D. Kulp of Affymetrix commented on the upcoming proprietary GeneChip
> data formats in a Bioconductor mailing list post of 25 June 2003.
> He notes that Windows/Java linkable libraries will be provided
> for reading the binary GeneChip format, and that MAGE-ML
> exports will be available.  He proposes
>  1) Bioconductor can provide free compiled libraries using
> the API and the Affymetrix linkable libraries
>  2) Bioconductor applications use MAGE-ML, as data bloat is
> not noteworthy and the export contains 'all the CEL data you
> expect'.
>
> Kulp comments that these observations show that the details
> of the change are "fairly simple".  In fact, the change has
> far-reaching implications for those who work with Bioconductor
> software and Affymetrix data.
>
> The Bioconductor project has adopted a policy of programming
> only to public and open APIs.  Primary reasons:
>  a) R is free software under the GPL.  Although we have made
> an effort to release the main Bioc components under LGPL, as
> a collaborative gesture towards commercial entities who wish
> to use our tools, R itself is GPL.  It is not possible to
> legally distribute tools that combine compilations of
> non-free software with GPL software.
>  b) Beyond the restrictions of the GPL in relation to R,
> the Digital Millennium Copyright Act (DMCA) creates legal
> complications for those who create compilations of mixed
> free and proprietary software.  We have no resources to spend
> on legal advice or on adapting our research to a complex
> legal landscape.  Commitment to public and open APIs allows
> us to carry on research in a natural and efficient way largely
> independently of DMCA restriction and interpretation in
> the complex area of reverse engineering.
>  c) Commitment to public and open APIs leverages the user
> community's capabilities to discover problems and to
> fix them.  While distribution of compiled libraries with
> open components as interfaces to proprietary formats may
> SEEM consistent with open source software methodology,
> this is an illusion.  We have benefited from user-contributed
> bug fixes and would cease to do so under the regimen proposed
> by Kulp, because users would lack access to key elements of
> the interface.
>  d) Commitment to public and open APIs sharply reduces
> effort required to support multiple platforms.  When compiled
> libraries are distributed one frequently encounters conflicts
> with resident versions of supporting libraries and one
> needs to introduce substantial technology for bridging
> distributed objects to platforms whose resources may be
> out of date or noncompliant with basic standards.  Time spent
> on nonstandard portability methodology is time subtracted
> from research on computational biology.  As researchers
> we cannot accept this additional cost.
>  e) Commitment to public and open APIs is the only approach
> compatible with the recognition that microarray analysis
> technology is immature and must be fully open to scrutiny
> if science is to advance in an efficient way.  Comparisons
> of MAS4, MAS5, Li and Wong's MBEI and RMA probe-level
> analyses indicate that the procedures yield different results.
> Users have a right to expect that results from different
> methodologies can be fully rationalized, and this can only
> occur with open implementations.
>
> These five points respond to Kulp's suggestion that we
> provide free binaries to the user community.  The suggestion
> seems simple and positive but it is not feasible at all.
>
> Kulp's second suggestion is to employ the MAGE-ML format.
> It does appear that this constitutes a public and open API
> and one that we could program to.  However, it also appears
> that there will be significant information restrictions and
> performance costs if we are forced to go in this direction.
> We have one report of significant data bloat with the
> current embodiments of this technology.  A 7 MB
> CEL file had a 30 MB XML representation, and a 21 MB
> CDF file had a 400 MB XML representation.  Kulp suggests
> that XML bloat does not occur, and that may be due to
> his access to newer forms of the transformation.  We
> believe that compliant MAGE-ML representations will be
> massive.  Requiring Bioconductor to work from MAGE-ML
> will lead to additional burdens on users that will
> impede research progress.
>
> In summary, Bioconductor's commitment to open and public
> APIs is dictated by legal and scientific considerations.
> Affymetrix' transition to closed file formats is difficult
> to understand.  No one questions the technical utility of
> a change to a binary format.  Making it secret has no
> utility that we can discern.  Bioconductor and its users
> have provided R&D to Affymetrix essentially free of charge.
> The upcoming Affymetrix GeneChip Microarray Low-Level Workshop
> ( http://eci-events.com/AffyGeneChip/ ) is proof that Affymetrix
> appreciates and is open to these contributions.
> Accommodating a non-public, non-open API for Affymetrix data
> would constitute a precedent that might impact methods
> adopted by other companies in this field.  We respectfully
> ask that Affymetrix set a rather different precedent:
> open the new file format to support and encourage research
> and development in the microarray analysis domain.
> An open format will clearly benefit both Affymetrix and
> the scientific community.
>
> Sincerely,
> The Bioconductor Core Team
>
>     * Douglas Bates, University of Wisconsin, USA.
>     * Vince Carey, Harvard Medical School, USA.
>     * Marcel Dettling, Federal Inst. Technology, Switzerland.
>     * Sandrine Dudoit, Division of Biostatistics, UC Berkeley, USA.
>     * Byron Ellis, Harvard Department of Statistics, USA.
>     * Laurent Gautier, Technical University of Denmark, Denmark.
>     * Robert Gentleman, Harvard Medical School, USA.
>     * Jeff Gentry, Dana-Farber Cancer Institute, USA.
>     * Kurt Hornik, Technische Universitat Wien, Austria.
>     * Torsten Hothorn, Institut fuer Medizininformatik, Biometrie und Epidemiologie, Germany.
>     * Wolfgang Huber, DKFZ Heidelberg, Molecular Genome Analysis, Germany.
>     * Stefano Iacus, University of Milan, Italy.
>     * Rafael Irizarry, Department of Biostatistics (JHU), USA.
>     * Friedrich Leisch, Technische Universitat Wien, Austria.
>     * Martin Maechler, Federal Inst. Technology, Switzerland.
>     * Gordon Smyth, Walter and Eliza Hall Institute, Australia.
>     * Anthony Rossini, University of Washington and the Fred Hutchinson Cancer Research Center, USA.
>     * Gunther Sawitzki, Institute fur Angewandte Mathematik, Germany.
>     * Luke Tierney, University of Iowa, USA.
>     * Jean Yee Hwa Yang, University of California, San Francisco, USA.
>     * Jianhua (John) Zhang, Dana-Farber Cancer Institute, USA.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
