[Bioc-devel] File parsers/readers for Illumina's Product Files (*.bpm, *.egt, *.xml)?

Henrik Bengtsson hb at biostat.ucsf.edu
Mon Mar 28 02:02:09 CEST 2011


Illumina provides a set of so-called Product Files for each chip type.
For SNP arrays, they provide "Manifest (.bpm), Cluster File (.egt),
and Product Descriptor File (.xml)", where the first two are clearly
in a binary file format and the last is in an XML (text) file format.
For instance, the file HumanOmniExpress-12Multi_ProductFiles.zip from
http://icom.illumina.com/ (requires login) contains:

HumanOmniExp-12v1MultiUse_15014143_A.xml [3,654 bytes]
HumanOmniExpress-12v1-Multi_C.bpm [121,566,176 bytes]
HumanOmniExpress-12v1-Multi_C.egt [122,712,751 bytes]

Right now, I'm mainly interested in being able to access the cluster
information in the *.egt files.  Does anyone know of file parsers
for this (preferably in/via R), or of a description of these file
formats?  The hope is to be able to read these data without needing
access to Illumina's software, cf. what the affxparser package
provides for Affymetrix files.



