[Bioc-devel] Dimension (rows & columns) in the writeCDF function seems to be swapped

Thu May 20 12:12:46 CEST 2010

Hello Karst,

You can download all required files via this link:
https://sendit.wur.nl/Download.aspx?id=9eed2afd-40a8-4770-b44b-5c0b6e55084c

Simply execute the commands in the Create_CDF.R script and everything you need will be created and installed. 

Note that a fixed "PdInfo2Cdf.R" file is also included. To create a CDF with the rows & columns NOT swapped (the one that gives the error during RMA), please edit the script at the commented-out lines. As it stands, the script creates a proper CDF with which the RMA normalization also runs properly.

Thank you for your help!

Regards,

Dr. Philip de Groot Ph.D.
Bioinformatics Researcher

Wageningen University / TIFN
Nutrigenomics Consortium
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
PO Box 8129, 6700 EV Wageningen
Visiting Address: Erfelijkheidsleer: De Valk, Building 304
Dreijenweg 2, 6703 HA  Wageningen
Room: 0052a
T: +31-317-485786
F: +31-317-483342
E-mail:   Philip.deGroot at wur.nl
Internet: http://www.nutrigenomicsconsortium.nl
             http://humannutrition.wur.nl/
             https://madmax.bioinformatics.nl/
________________________________________
From: kasperdanielhansen at gmail.com [kasperdanielhansen at gmail.com] On Behalf Of Kasper Daniel Hansen [khansen at stat.berkeley.edu]
Sent: 19 May 2010 18:34
To: Groot, Philip de
Cc: bioc-devel at stat.math.ethz.ch; Hooiveld, Guido
Subject: Re: [Bioc-devel] Dimension (rows & columns) in the writeCDF function   seems to be swapped

Hi Philip

I am trying to understand what you report here.

If I use the "wrong" cdf file and I read it using either readCdfHeader
in affxparser or read.cdffile.list (full output below) in affyio I get
  Cols = 990
  Rows = 1190
and I have looked at the binary file using hexdump and this is also
what is in the file according to Affymetrix file format
specifications.

This means that I can understand your message as either
1) the file pdmogene11stv1_wrong.cdf does not reflect the dimensions
in the PDinfo package.
2) Using this cdf file as input to makecdfenv yields wrong results.

Now, when I install the pd.mogene.1.1.st.v1 package I get
> pd.mogene.1.1.st.v1
Class........: AffyGenePDInfo
Manufacturer.: Affymetrix
Genome Build.: MM09
Chip Geometry: 1190 rows x  990 columns

which looks correct.  Note that the pdinfo2cdf script posted on the
aroma.affymetrix website has an error (the lines
  celHead <- readCelHeader(celfile)
  nrows <- celHead$rows
  ncols <- celHead$rows
), but you say in your email you have modified the script (the attachment
never made it through), and given that the CDF file corresponds to the
pdinfo object you have probably fixed this.
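
For reference, a minimal sketch of what the corrected lines would
presumably look like.  The list below is only a stand-in for what
readCelHeader() returns (just the rows and cols fields, with values
assumed from this chip's geometry); it is not the actual script.

```r
# Stand-in for the list returned by affxparser::readCelHeader(celfile);
# only the two fields used here, values assumed from the 1190 x 990 geometry.
celHead <- list(rows = 1190L, cols = 990L)

nrows <- celHead$rows
ncols <- celHead$cols   # the posted script reads celHead$rows here

# On a non-square chip the two values must differ.
stopifnot(nrows == 1190L, ncols == 990L)
```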

That leaves 2).  A casual inspection of the tarball
  mogene11stv1cdf_2.1.0_notSwitched.tar.gz
makes it look as if the dimensions here are correct as well.  However,
I suspect I am missing something.

In order to debug this further (and there could very well be an error
somewhere, where cols and rows are switched) we will need your scripts
for generating the normalized intensities and we probably also need
the CEL files.  Please modify your scripts to use the exact same
filenames as the packages you have posted on sendit.
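
To make concrete what such a switch would do: assuming cells are
numbered row by row (an assumption made for illustration only; the
helper xy2index below is hypothetical and is not a function from affy
or affxparser), the same (x, y) position maps to a different cell as
soon as the two dimensions are exchanged on a non-square chip.

```r
# Hypothetical helper: 1-based cell index for zero-based (x, y),
# assuming row-major numbering with `ncols` cells per row.
xy2index <- function(x, y, ncols) y * ncols + x + 1L

x <- 5L
y <- 2L
right   <- xy2index(x, y, ncols = 990L)   # dimensions as in the CDF header
swapped <- xy2index(x, y, ncols = 1190L)  # rows and cols exchanged

# The two indices differ by y * (1190 - 990) cells.
stopifnot(swapped - right == y * (1190L - 990L))
```

On a square chip the two indices would coincide, which is why this
kind of error can go unnoticed on many array types.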

And note that you cannot attach anything (or at least, your attachments
do not make it through to me).

Kasper

> tmp = read.cdffile.list("pdmogene11stv1_wrong.cdf")
> tmp$Header
$Dimensions
  MagicNumber VersionNumber          Cols          Rows     n.QCunits
           67             1           990          1190             0
      n.units     LenRefSeq
        35556             0

$ReseqRefSeq
[1] ""

> readCdfHeader("pdmogene11stv1_wrong.cdf")
$ncols
[1] 990

$nrows
[1] 1190

$nunits
[1] 35556

$nqcunits
[1] 0

$refseq
[1] ""

$chiptype
[1] "pdmogene11stv1_wrong"

$filename
[1] "./pdmogene11stv1_wrong.cdf"

$rows
[1] 1190

$cols
[1] 990

$probesets
[1] 35556

$qcprobesets
[1] 0

$reference
[1] ""

>
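
The comparison above can also be scripted.  A sketch in base R, with
the two sets of dimensions typed in from the output above rather than
read from the files; check_dims is a made-up helper name, not part of
any package.

```r
# Values copied from the readCdfHeader() output and the pdinfo geometry above.
cdfHead <- list(nrows = 1190L, ncols = 990L)
pdGeom  <- c(rows = 1190L, cols = 990L)

# Hypothetical helper: classify how the CDF dimensions relate to the expected ones.
check_dims <- function(cdf, expected) {
  if (cdf$nrows == expected[["rows"]] && cdf$ncols == expected[["cols"]]) {
    "match"
  } else if (cdf$nrows == expected[["cols"]] && cdf$ncols == expected[["rows"]]) {
    "swapped"
  } else {
    "mismatch"
  }
}

result <- check_dims(cdfHead, pdGeom)
print(result)
```

With real files, cdfHead could be replaced by the list returned from
readCdfHeader(), which carries the same nrows and ncols fields.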

On Wed, May 19, 2010 at 4:00 AM, Groot, Philip de <philip.degroot at wur.nl> wrote:
> Dear Kasper,
>
> I am contacting you because I am puzzled by an issue related to affxparser. I submitted this message to the bioc-devel list on April 29 (subject: "RE: [Bioc-devel] FW: problem in ReadAffy function (affy and affyio libraries)") but never received a reply. Interestingly, in the file writeCDF.R (affxparser source) the following comment is present:
>
> # 2006-09-11 /HB
> # o BUG FIX: nrows & ncols were swapped in the CDF header.
> The problem:
> I successfully created a "mogene11stv1cdf" library that can be used with the "affy" library. The benefit of this is that e.g. the affyPLM library can be used for creating informative plots. I applied rma using both affy and oligo and found exactly the same normalized intensities (see attached png-image).
>
> So far so good, BUT... in order to get rma working for affy, I needed to switch the number of rows and columns when creating the CDF-file. And this puzzles me a lot! I do not understand where this originates from.
>
> Let me explain. I use the oligo library and the "pd.mogene.1.1.st.v1" annotation file to create the CDF-file. I (together with Guido Hooiveld) adapted the original "PdInfo2Cdf.R" script for this purpose. Information on the original "PdInfo2Cdf.R" script is here: http://www.aroma-project.org/node/40. The adapted file is also in the attachment.
>
> The CDF-file is created as follows:
> source("PdInfo2Cdf.R")
> PdInfo2Cdf("pd.mogene.1.1.st.v1", <An appropriate .CEL-file>);
>
> library(makecdfenv)
> make.cdf.package(file="pdmogene11stv1.cdf", packagename = "mogene11stv1cdf", author="Philip de Groot", maintainer="Philip de Groot <Philip.deGroot at wur.nl>", version="2.1.0", species="Mus_musculus")
>
> Both CDF-files are available in a single zip-file via sendit:
> https://sendit.wur.nl/Download.aspx?id=9299e472-1d5a-4f83-af75-c10aa1faf94c
>
> Does anybody have an explanation for this? Is it a bug in writeCDF?
>
> Regards,
>
> Dr. Philip de Groot Ph.D.
>