[BioC] Biocore response to Affymetrix data format changes
Isaac Neuhaus
isaac.neuhaus at bms.com
Mon Jun 30 15:48:15 MEST 2003
Many of us work in the pharmaceutical industry and have been taking advantage of your excellent tools.
We are also, monetarily speaking, important Affymetrix customers. I would like to know how WE, in the
pharmaceutical industry, could help facilitate your continuing effort in developing these useful
tools.
Isaac
Vincent Carey 525-2265 wrote:
> D. Kulp of Affymetrix commented on the upcoming proprietary GeneChip
> data formats in a Bioconductor mailing list post of 25 June 2003.
> He notes that Windows/Java linkable libraries will be provided
> for reading the binary GeneChip format, and that MAGE-ML
> exports will be available. He proposes that:
> 1) Bioconductor can provide free compiled libraries using
> the API and the Affymetrix linkable libraries;
> 2) Bioconductor applications use MAGE-ML, as data bloat is
> not noteworthy and the export contains 'all the CEL data you
> expect'.
>
> Kulp comments that these observations show that the details
> of the change are "fairly simple". In fact, the change has
> far-reaching implications for those who work with Bioconductor
> software and Affymetrix data.
>
> The Bioconductor project has adopted a policy of programming
> only to public and open APIs. Primary reasons:
> a) R is free software under the GPL. Although we have made
> an effort to release the main Bioc components under LGPL, as
> a collaborative gesture towards commercial entities who wish
> to use our tools, R itself is GPL. It is not possible to
> legally distribute tools that combine compilations of
> non-free software with GPL software.
> b) Beyond the restrictions of the GPL in relation to R,
> the Digital Millennium Copyright Act (DMCA) creates legal
> complications for those who create compilations of mixed
> free and proprietary software. We have no resources to spend
> on legal advice or on adapting our research to a complex
> legal landscape. Commitment to public and open APIs allows
> us to carry on research in a natural and efficient way largely
> independently of DMCA restrictions and their interpretation in
> the complex area of reverse engineering.
> c) Commitment to public and open APIs leverages the user
> community's capabilities to discover problems and to
> fix them. While distribution of compiled libraries with
> open components as interfaces to proprietary formats may
> SEEM consistent with open source software methodology,
> this is an illusion. We have benefited from user-contributed
> bug fixes and would cease to do so under the regime proposed
> by Kulp, because users would lack access to key elements of
> the interface.
> d) Commitment to public and open APIs sharply reduces
> effort required to support multiple platforms. When compiled
> libraries are distributed one frequently encounters conflicts
> with resident versions of supporting libraries and one
> needs to introduce substantial technology for bridging
> distributed objects to platforms whose resources may be
> out of date or noncompliant with basic standards. Time spent
> on nonstandard portability methodology is time subtracted
> from research on computational biology. As researchers
> we cannot accept this additional cost.
> e) Commitment to public and open APIs is the only approach
> compatible with the recognition that microarray analysis
> technology is immature and must be fully open to scrutiny
> if science is to advance in an efficient way. Comparisons
> of MAS4, MAS5, Li and Wong's MBEI and RMA probe-level
> analyses indicate that the procedures yield different results.
> Users have a right to expect that results from different
> methodologies can be fully rationalized, and this can only
> occur with open implementations.
>
> These five points respond to Kulp's suggestion that we
> provide free binaries to the user community. The suggestion
> seems simple and positive, but it is not feasible at all.
>
> Kulp's second suggestion is to employ the MAGE-ML format.
> It does appear that this constitutes a public and open API,
> and one that we could program to. However, it appears
> that there will be significant information restrictions and
> performance costs if we are forced to go in this direction.
> We have one report of significant data bloat with the
> current embodiments of this technology. A 7 MB
> CEL file had a 30 MB XML representation, and a 21 MB
> CDF file had a 400 MB XML representation. Kulp suggests
> that XML bloat does not occur, and that may be due to
> his access to newer forms of the transformation. We
> believe that compliant MAGE-ML representations will be
> massive. Requiring Bioconductor to work from MAGE-ML
> will lead to additional burdens on users that will
> impede research progress.
>
> In summary, Bioconductor's commitment to open and public
> APIs is dictated by legal and scientific considerations.
> Affymetrix' transition to closed file formats is difficult
> to understand. No one questions the technical utility of
> a change to a binary format. Making it secret has no
> utility that we can discern. Bioconductor and its users
> have provided R&D to Affymetrix essentially free of charge.
> The upcoming Affymetrix GeneChip Microarray Low-Level Workshop
> ( http://eci-events.com/AffyGeneChip/ ) is proof that Affymetrix
> appreciates and is open to these contributions.
> Accommodating a non-public, non-open API for Affymetrix data
> would set a precedent that might influence the methods
> adopted by other companies in this field. We respectfully
> ask that Affymetrix set a rather different precedent:
> open the new file format to support and encourage research
> and development in the microarray analysis domain.
> An open format will clearly benefit both Affymetrix and
> the scientific community.
>
> Sincerely,
> The Bioconductor Core Team
>
> * Douglas Bates, University of Wisconsin, USA.
> * Vince Carey, Harvard Medical School, USA.
> * Marcel Dettling, Federal Inst. Technology, Switzerland.
> * Sandrine Dudoit, Division of Biostatistics, UC Berkeley, USA.
> * Byron Ellis, Harvard Department of Statistics, USA.
> * Laurent Gautier, Technical University of Denmark, Denmark.
> * Robert Gentleman, Harvard Medical School, USA.
> * Jeff Gentry, Dana-Farber Cancer Institute, USA.
> * Kurt Hornik, Technische Universität Wien, Austria.
> * Torsten Hothorn, Institut fuer Medizininformatik, Biometrie und Epidemiologie, Germany.
> * Wolfgang Huber, DKFZ Heidelberg, Molecular Genome Analysis, Germany.
> * Stefano Iacus, University of Milan, Italy
> * Rafael Irizarry, Department of Biostatistics (JHU), USA.
> * Friedrich Leisch, Technische Universität Wien, Austria.
> * Martin Maechler, Federal Inst. Technology, Switzerland.
> * Gordon Smyth, Walter and Eliza Hall Institute, Australia.
> * Anthony Rossini, University of Washington and the Fred Hutchinson Cancer Research Center, USA.
> * Gunther Sawitzki, Institut für Angewandte Mathematik, Germany.
> * Luke Tierney, University of Iowa, USA.
> * Jean Yee Hwa Yang, University of California, San Francisco, USA.
> * Jianhua (John) Zhang, Dana-Farber Cancer Institute, USA.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor