[BioC] Group-wise gcrma normalization

Wed Feb 25 22:51:33 CET 2009

On Feb 25, 2009, at 14:57 , Jun Yin wrote:

> Hi, all,
>
> I have a problem with normalizing my Affymetrix microarray data. We  
> are using Affymetrix Zebrafish Genome Array. The experiment design  
> includes three treatments, namely A, C and G. We have three  
> biological replicates for each treatment, thus A1, A2, A3, C1, C2,  
> C3 and G1, G2, G3.
>
> A1, C1 and G1 were from the first batch (that microarray experiment
> was performed earlier). A2, A3, C2, C3 and G2, G3 were from the second
> batch. We have a very strong batch effect. If I use gcrma to normalize
> the data, the only effect I can see is the batch effect, e.g. in the
> hierarchical clustering, A1, C1 and G1 cluster together. Then, no
> matter which comparison I make, I cannot get any differentially
> expressed genes from the data. This is obviously because the batch
> effect (or background noise between batches) swamps everything else.
>
> By accident, I used gcrma to normalize the three replicates from
> each treatment separately, and the result changed dramatically. This
> is what I did:
>
> $data<-ReadAffy()
> $data1.gcrma<-gcrma(data[,1:3])  #A samples
> $data2.gcrma<-gcrma(data[,4:6])  #C samples
> $data3.gcrma<-gcrma(data[,7:9])  #G samples
> $data.gcrma.exprs<-cbind(exprs(data1.gcrma),exprs(data2.gcrma),exprs(data3.gcrma))
>
> Then all the batch effects were gone, and the variance within each
> group/treatment was dramatically reduced. But then I realized that
> gcrma/rma uses median polish to summarize each probe set, which
> iteratively subtracts row medians and column medians. The probe set
> signal is calculated by adding the global median to the column median,
> so it depends heavily on the original column median of the probe set.
> Normalizing the groups separately has therefore probably introduced
> an artifact.

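I would doubt the results from a group-wise normalization. The
quantile normalization and the median polish fit are then computed
within each group only, so you will artificially make the samples look
more homogeneous within each group and therefore get more
differentially expressed (DE) genes.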

You could try to model the batch effect in limma, but it is not clear  
that you can get rid of it. And even if you model the batch effect, it  
is not clear that you will get much differential expression.
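
As a rough sketch of what modelling the batch effect in limma could look like: the code below adds a batch term to the design matrix next to the three treatments. It is an illustration under assumptions, not a tested analysis of these arrays. The simulated matrix `expr` is only a stand-in for the expression values obtained by running gcrma on all nine arrays together, the batch assignment follows the description in the original post (replicate 1 in the first batch, replicates 2 and 3 in the second), and names such as `batch2`, `CvsA`, `GvsA` and `GvsC` are made up for the example.

```r
library(limma)  # Bioconductor package, assumed to be installed

# Stand-in for exprs(gcrma(data)) with all nine arrays normalized together:
# 1000 simulated probe sets x 9 arrays, purely illustrative values.
set.seed(1)
arrays <- c("A1", "A2", "A3", "C1", "C2", "C3", "G1", "G2", "G3")
expr <- matrix(rnorm(1000 * 9, mean = 8), nrow = 1000,
               dimnames = list(paste0("probeset", 1:1000), arrays))

# Treatment and batch labels, in the same order as the columns of expr.
treatment <- factor(rep(c("A", "C", "G"), each = 3))
batch     <- factor(rep(c("first", "second", "second"), times = 3))

# One coefficient per treatment plus one additive batch coefficient.
design <- model.matrix(~ 0 + treatment + batch)
colnames(design) <- c("A", "C", "G", "batch2")

# Fit the linear model per probe set, then test the treatment contrasts.
fit  <- lmFit(expr, design)
cont <- makeContrasts(CvsA = C - A, GvsA = G - A, GvsC = G - C,
                      levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

# Top-ranked probe sets for C versus A, adjusted for batch.
top <- topTable(fit2, coef = "CvsA", number = 10)
print(top)
```

With nine arrays and four coefficients this leaves only five residual degrees of freedom, and the first batch contains a single array per treatment, so the power to detect differences will be limited.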

If it is feasible, I would recommend redoing the experiment. That
decision of course depends on how many resources you would need to
spend on it.

Kasper

> The most interesting thing is that the genes we expected were in the
> gene list generated by the group-wise gcrma normalization. So I
> just wonder whether there is any reason this group-wise gcrma could
> be acceptable. I am kinda desperate about deciding whether to use the
> data or discard everything. Because the batch effect is so strong and
> the sample size is so small, no normalization has worked so far
> (gcrma, rma, mas5, loess/quantile/contrasts/scale normalization).
> Thanks in advance.
>
>
>
> Jun Yin
> Ph.D. student in U.C.D.
> 2009-02-25
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin