[BioC] How to identify corrupt Affy CEL file?

Henrik Bengtsson hb at stat.berkeley.edu
Fri Jun 1 10:56:45 CEST 2007


Hi,

You're not the first one.  A few months ago I transferred a large data
set via an external HDD and, like you, it took me a long time to notice
that some CEL files were corrupt.  Somehow the CEL files were still
valid and read just fine; it was just that some probe intensities had
ridiculously large values.  I used MD5 checksums on the files to
identify which ones were corrupted.

As Seth suggested, the digest() function in the 'digest' package can
be used for this.
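
A rough sketch of how that could look is below.  The helper names
(file_md5, write_md5_manifest, find_corrupt_files) and the manifest
file name "md5.txt" are made up for illustration and are not part of
any package.  It assumes you can compute checksums on the sending side
before the transfer, and it falls back to tools::md5sum() from base R
when 'digest' is not installed:

```r
# Sketch only: the helper names below are hypothetical, not functions
# from 'digest' or any other package.

# MD5 of each file's contents, as lower-case hex strings.
file_md5 <- function(files) {
  if (requireNamespace("digest", quietly = TRUE)) {
    vapply(files, function(f) digest::digest(f, algo = "md5", file = TRUE),
           character(1), USE.NAMES = FALSE)
  } else {
    unname(tools::md5sum(files))  # base-R fallback, same checksum
  }
}

# Sending side: record one checksum per CEL file in a tab-separated manifest.
write_md5_manifest <- function(dir, manifest = file.path(dir, "md5.txt")) {
  files <- list.files(dir, pattern = "[.]cel$", ignore.case = TRUE,
                      full.names = TRUE)
  tab <- data.frame(file = basename(files), md5 = file_md5(files),
                    stringsAsFactors = FALSE)
  write.table(tab, manifest, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(tab)
}

# Receiving side: return the names of files that are missing or whose
# checksum no longer matches the manifest.
find_corrupt_files <- function(dir, manifest = file.path(dir, "md5.txt")) {
  tab <- read.delim(manifest, stringsAsFactors = FALSE,
                    colClasses = "character")
  paths <- file.path(dir, tab$file)
  ok <- file.exists(paths)
  ok[ok] <- file_md5(paths[ok]) == tab$md5[ok]
  tab$file[!ok]
}
```

Run write_md5_manifest() on the directory before the transfer, copy
the manifest along with the CEL files, and run find_corrupt_files() on
the copy before calling ReadAffy().  An empty result means every file
listed in the manifest arrived byte-for-byte identical.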

FYI: In August I will release aroma.affymetrix for analyzing small to
very large Affymetrix data sets.  Since I was bitten by the above
problem, I added methods for generating and validating MD5 checksums
for sets of CEL files.

Cheers

Henrik

On 5/31/07, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
> Hi List,
>
> Does anyone know of a package/tool/script that allows checking the integrity of (Affymetrix CEL) files, e.g. by comparing MD5 checksums?
>
> I am asking because a CEL file unexpectedly became corrupt when we transferred a data set via FTP. Uploaded files are automatically analyzed in our pipeline, and it took us quite some time to find out that the problem was caused by one faulty file out of 16 (and not something else).
>
>
> > data <- ReadAffy()
> Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,  :
>         Is D:/Guido/A42_7_Int_ko_wy.CEL really a CEL file? tried reading as text, gzipped text and binary
> >
>
>
> This is the first time this has happened to us, but I now realize that it would be very useful if the integrity of the CEL files could be checked after a transfer, allowing corrupt files to be identified immediately.
>
> Thanks,
> Guido
>
> ------------------------------------------------
> Guido Hooiveld, PhD
> Nutrition, Metabolism & Genomics Group
> Division of Human Nutrition
> Wageningen University
> Biotechnion, Bomenweg 2
> NL-6703 HD Wageningen
> the Netherlands
>
> internet:   http://nutrigene.4t.com
> email:      guido.hooiveld at wur.nl
>
>


