[BioC] Data sets conducted in different labs

Tue Oct 19 22:55:46 CEST 2004

Hi there,
I apologize if this question is off-topic for the BioC mailing list.

Has anyone encountered a situation where two labs carried out the same or
a similar experiment but obtained quite different results in terms of the
differentially expressed genes identified? Has anyone studied this
problem, and are there any references or observations?

The usual approach is to identify genes from each lab's data separately
and then compare the results. What about fitting a single model to the
combined data from both labs, with lab as a potential factor? In that
case, how should the pre-processing be done: normalize all the data
together, or each lab's data separately? Any recommendations?

What I observed: there is a clear systematic difference between the data
from the two labs. Even after I normalize all the data together (I used
rma), the origin of each array is still recognizable, and the model test
(limma) shows that the lab factor is significant for about 50% of the
genes. My question is: in this case (all data normalized together),
should I include lab as a factor? It seems the normalization procedure
cannot remove the lab effects.
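
To make the question concrete, here is a toy sketch of what including lab
as a factor does for a single gene. It is plain Python rather than limma,
the expression values are invented, and the helper names (solve, fit, rss)
are hypothetical. It only illustrates an additive model of the form
expression ~ treatment + lab on a balanced design, not a full analysis.

```python
# Invented log2 expression values for ONE gene: 2 labs x 2 conditions.
samples = [
    # (lab, treated, log2_expression)
    ("A", 0, 7.0), ("A", 0, 7.2), ("A", 1, 8.0), ("A", 1, 8.2),
    ("B", 0, 9.0), ("B", 0, 9.2), ("B", 1, 10.0), ("B", 1, 10.2),
]

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def rss(rows, y, beta):
    """Residual sum of squares of the fitted model."""
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, v in zip(rows, y))

y = [s[2] for s in samples]

# Model without lab: intercept + treatment.
x_no_lab = [[1.0, float(t)] for _, t, _ in samples]
# Model with lab: intercept + treatment + indicator for lab B.
x_lab = [[1.0, float(t), 1.0 if lab == "B" else 0.0] for lab, t, _ in samples]

beta_no_lab = fit(x_no_lab, y)
beta_lab = fit(x_lab, y)
rss_no_lab = rss(x_no_lab, y, beta_no_lab)
rss_lab = rss(x_lab, y, beta_lab)

print("treatment effect, lab ignored :", round(beta_no_lab[1], 3))
print("treatment effect, lab in model:", round(beta_lab[1], 3))
print("lab effect                    :", round(beta_lab[2], 3))
print("residual SS without / with lab:", round(rss_no_lab, 3), round(rss_lab, 3))
```

Because this toy design is balanced, the treatment estimate is the same in
both fits; what changes is the residual variation, which is much smaller
once the lab shift is absorbed by its own term. With an unbalanced design
the treatment estimate itself could also change.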

But if I normalize the two labs' data separately, they will have different
variation. Even with a lab factor, I cannot fit both labs' data in
one model.
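
The difference between the two normalization choices can be sketched with
quantile normalization, which is the normalization step inside RMA. This
is a simplified illustration only: the intensities are invented, the
function name quantile_normalize is hypothetical, ties are not handled,
and the background-correction and summarization steps of RMA are left out.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    The shared reference distribution is the mean of the sorted values
    across arrays; each value is replaced by the reference value of its rank.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, low to high
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result

# Invented log2 probe intensities: two arrays per lab, four probes each.
lab_a = [[5.0, 6.0, 7.0, 8.0], [5.2, 6.1, 7.3, 8.4]]
lab_b = [[7.0, 9.0, 11.0, 13.0], [7.4, 9.2, 11.1, 13.3]]

together = quantile_normalize(lab_a + lab_b)
separate = quantile_normalize(lab_a) + quantile_normalize(lab_b)

# Count how many distinct value distributions remain after each choice.
dists_together = {tuple(sorted(a)) for a in together}
dists_separate = {tuple(sorted(a)) for a in separate}

print("distinct distributions, normalized together  :", len(dists_together))
print("distinct distributions, normalized separately:", len(dists_separate))
```

Normalizing together forces all arrays onto one distribution, while
normalizing per lab leaves one distribution per lab, so location and
spread still differ between labs. Note that a shared distribution only
equalizes each array as a whole; it does not by itself remove lab
differences that are specific to individual genes.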

Any comments/suggestions will be appreciated.

Best,
Fangxin