[BioC] Two cDNA arrays whose samples are the same but probes are not

Wed Aug 20 03:53:10 CEST 2008

Dear all,

I am struggling with an unusual cDNA microarray dataset that was generated a long time ago by a lab next door.

Their platform was a 24K cDNA array. To enlarge genome coverage, they prepared two probe libraries and printed them onto two arrays. Let's call them array A and array B. They then hybridized both arrays with the same samples.

First, I mapped their GenBank probe IDs to UniGene IDs. It turned out that array series A has ~11,000 unique UniGene IDs, array series B has ~10,000 unique UniGene IDs, and their intersection is ~6,000. So, after normalizing array series A and B separately, I have the following data:
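For readers who want to reproduce the ID bookkeeping, here is a minimal sketch of the mapping-and-overlap step. The GenBank accessions, UniGene IDs, and the `genbank_to_unigene` table are all hypothetical placeholders; in practice the mapping would come from a UniGene annotation file. It collapses probes to unique UniGene IDs per array and splits them into shared and array-specific sets.

```python
# Hypothetical GenBank -> UniGene mapping (placeholder values, not real annotation).
genbank_to_unigene = {
    "AA000001": "Hs.1",
    "AA000002": "Hs.2",
    "AA000003": "Hs.2",  # two GenBank probes can map to the same UniGene cluster
    "AA000004": "Hs.3",
    "AA000005": "Hs.4",
}

# Hypothetical probe lists printed on each array.
probes_a = ["AA000001", "AA000002", "AA000003", "AA000004"]
probes_b = ["AA000002", "AA000004", "AA000005", "AA999999"]  # last one has no mapping


def unique_unigene(probes, mapping):
    """Return the set of UniGene IDs for the probes that have a mapping."""
    return {mapping[p] for p in probes if p in mapping}


ids_a = unique_unigene(probes_a, genbank_to_unigene)
ids_b = unique_unigene(probes_b, genbank_to_unigene)
shared = ids_a & ids_b   # IDs present on both arrays
only_a = ids_a - ids_b   # IDs present only on array A
only_b = ids_b - ids_a   # IDs present only on array B
print(len(ids_a), len(ids_b), len(shared))
```

Unmapped probes are simply dropped here; whether to keep them under their GenBank IDs instead is a separate choice.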

Array A series
ID  replicate1 replicate2 ... replicateN
A1   logRatioA1.1   logRatioA1.2   ... logRatioA1.N
A2   logRatioA2.1   logRatioA2.2   ... logRatioA2.N
....
A11,000  ....

Array B series
ID  replicate1 replicate2 ... replicateN
B1   logRatioB1.1   logRatioB1.2   ... logRatioB1.N
B2   logRatioB2.1   logRatioB2.2   ... logRatioB2.N
....
B10,000  ....

For probes that are present on only one of the two arrays, I think the analysis is simple. I can run the statistical test on each dataset separately and, for each such probe, take the result from the one dataset that contains it.
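A minimal sketch of this "test separately, then pick the single-array IDs" step is below. It assumes the normalized log ratios are held as hypothetical `ID -> [replicate values]` tables, and it uses a plain one-sample t statistic (H0: mean log ratio = 0) only as a stand-in; for microarray data a moderated statistic (for example from the limma package) would be more typical.

```python
import math
import statistics


def one_sample_t(values):
    """One-sample t statistic for H0: mean log ratio == 0 (needs >= 2 replicates)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


# Hypothetical normalized log ratios: UniGene ID -> replicates 1..N.
data_a = {
    "Hs.1": [1.2, 0.9, 1.1],
    "Hs.2": [0.1, -0.2, 0.05],
    "Hs.3": [-0.8, -1.0, -0.9],
}
data_b = {
    "Hs.2": [0.2, -0.1, 0.0],
    "Hs.3": [-0.7, -1.1, -0.95],
    "Hs.4": [0.5, 0.6, 0.4],
}

# Test each series on its own.
t_a = {gene: one_sample_t(v) for gene, v in data_a.items()}
t_b = {gene: one_sample_t(v) for gene, v in data_b.items()}

# For IDs on only one array, take the result from the array that has them.
single_array = {g: t_a[g] for g in data_a.keys() - data_b.keys()}
single_array.update({g: t_b[g] for g in data_b.keys() - data_a.keys()})
print(sorted(single_array))
```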

For probes that are present on both arrays, I am not sure how to proceed. Of the two separate test results, one might report a probe as significant whereas the other might not.

So I came up with this idea. First, I stack (row-bind) the two log-ratio matrices as follows:

ID  replicate1 replicate2 ... replicateN
A1   logRatioA1.1   logRatioA1.2   ... logRatioA1.N
A2   logRatioA2.1   logRatioA2.2   ... logRatioA2.N
....
A11,000  ....
B1   logRatioB1.1   logRatioB1.2   ... logRatioB1.N
B2   logRatioB2.1   logRatioB2.2   ... logRatioB2.N
....
B10,000  ....

Then, for an ID that occurs in both A and B, I take the mean of the two log-ratio values, replicate by replicate. For example, if A1 and B1 correspond to the same ID, its collapsed log ratio for replicate 1 will be (logRatioA1.1 + logRatioB1.1)/2, and likewise for replicates 2 to N.
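The stack-and-average idea can be sketched as follows. The tables and the helper `stack_and_collapse` are hypothetical; the sketch assumes replicate j on array A and replicate j on array B are the same hybridization sample, which is what makes column-wise averaging meaningful. IDs present on only one array pass through unchanged.

```python
from collections import defaultdict

# Hypothetical normalized log ratios: UniGene ID -> replicates 1..N,
# with the same sample order on both arrays.
data_a = {"Hs.1": [1.2, 0.9], "Hs.2": [0.1, -0.2]}
data_b = {"Hs.2": [0.3, 0.0], "Hs.4": [0.5, 0.6]}


def stack_and_collapse(*series):
    """Stack ID -> replicate-vector tables and average rows that share an ID,
    replicate by replicate (assumes every row has the same replicate count)."""
    grouped = defaultdict(list)
    for table in series:
        for gene, values in table.items():
            grouped[gene].append(values)
    collapsed = {}
    for gene, rows in grouped.items():
        # zip(*rows) lines up replicate j across all rows for this ID
        collapsed[gene] = [sum(col) / len(col) for col in zip(*rows)]
    return collapsed


merged = stack_and_collapse(data_a, data_b)
print(merged)
```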

My rationale is that, because the two arrays were hybridized with the same samples and both series were normalized, log-ratio values from series A and B should be comparable. They can therefore be averaged across the two series, just as we average duplicate probes within an array.

Is this approach valid? Or is it better to test the two matrices separately and report the two test results side by side?
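For comparison, the side-by-side alternative might look like the sketch below. It again uses hypothetical `ID -> [replicates]` tables and a plain one-sample t statistic as a stand-in test; each ID gets one row with the A result, the B result, or both, and "NA" where the ID is absent from an array.

```python
import math
import statistics


def one_sample_t(values):
    """One-sample t statistic for H0: mean log ratio == 0."""
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(len(values)))


def fmt(t):
    """Format a statistic, using NA for IDs missing from an array."""
    return "NA" if t is None else f"{t:.2f}"


# Hypothetical normalized log ratios: UniGene ID -> replicates 1..N.
data_a = {"Hs.1": [1.2, 0.9, 1.1], "Hs.2": [0.1, -0.2, 0.05], "Hs.3": [-0.8, -1.0, -0.9]}
data_b = {"Hs.2": [0.2, -0.1, 0.0], "Hs.3": [-0.7, -1.1, -0.95], "Hs.4": [0.5, 0.6, 0.4]}

rows = []
for gene in sorted(data_a.keys() | data_b.keys()):
    t_a = one_sample_t(data_a[gene]) if gene in data_a else None
    t_b = one_sample_t(data_b[gene]) if gene in data_b else None
    rows.append((gene, t_a, t_b))

print("ID\tt_A\tt_B")
for gene, t_a, t_b in rows:
    print(f"{gene}\t{fmt(t_a)}\t{fmt(t_b)}")
```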

Thanks a lot,

Seungwoo

------------------------------------
Seungwoo Hwang, Ph.D.
Senior Research Scientist
Korean Bioinformation Center