[BioC] problems with cDNA vs genomic DNA array normalization

Mon Nov 20 18:52:09 CET 2006

Thanks Jenny,

After reading your explanation, I still have two questions.
1. Previously, I had also applied normalizeWithinArrays() to this
dataset. Do you think this is correct, or even necessary, in my case?

2. You said, "For the statistical analysis, you use the R values
directly." But normalizeBetweenArrays() generated an MAList, which
contains the M and A values etc. but not the R values (red-channel
intensities). I then fitted my MAList to the linear model using:
    # ma.paq is the MAList returned by normalizeBetweenArrays()
    design <- modelMatrix(targets, ref = "gDNA")
    fit <- lmFit(ma.paq, design)
I think all of my subsequent analysis is based on the M values. Finally, I
used the eBayes() function to compute the summary statistics in order to
detect the most differentially expressed genes:
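    # contrast WTvsMU: the wt coefficient minus the mu coefficient
    cont.matrix <- makeContrasts(WTvsMU = wt - mu, levels = design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    # empirical Bayes moderation of the gene-wise standard errors
    fit2 <- eBayes(fit2)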
So I don't see how to use the R values directly. Was my code wrong?
I am not sure about my code or method, because in the end I got some
uninterpretable results that did not meet the biologists' expectations.
That is why I am now rechecking my code and methods. Thank you again,
and thanks also to Wolfgang for his kind help.
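
A minimal base-R sketch of what "using the R values directly" could look like, assuming the sample is in the red channel on every array and that `logR` (a hypothetical name) already holds the normalized log2 red-channel values, one row per gene and one column per array. The values below are simulated, and the least-squares step is written out by hand only to show what is being estimated; it is not limma's code.

```r
# Hypothetical stand-in for normalized log2 red-channel (sample) values:
# rows are genes, columns are arrays (simulated, not real data).
set.seed(42)
group <- factor(c("wt", "wt", "wt", "mu", "mu", "mu"), levels = c("wt", "mu"))
logR <- matrix(rnorm(100 * length(group), mean = 8), nrow = 100)

# Single-channel design: one coefficient (group mean) per genotype, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Least-squares fit for all genes at once: B = Y X (X'X)^-1
coefs <- logR %*% design %*% solve(crossprod(design))

# Same contrast as WTvsMU = wt - mu in the code above
contrast <- c(wt = 1, mu = -1)
wt_vs_mu <- drop(coefs %*% contrast)
head(wt_vs_mu)
```

In limma itself the analogous route would be to pass such a matrix to lmFit() together with a single-channel design like this one (lmFit() accepts a plain matrix of log-expression values), then use makeContrasts(), contrasts.fit() and eBayes() as before. With this group-means design the wt - mu estimate is simply the per-gene difference of the two group means; eBayes() moderates the gene-wise variances, not these estimates.
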

Kind regards,
Yanju

Jenny Drnevich wrote:

> Hi Yanju,
>
> I have just been working with a couple of data sets similar to yours,
> where a) one channel always holds the same reference and b) the
> assumption of few differences between sample and reference does not
> necessarily hold. In these cases I have been using the Rquantile or Gquantile
> methods of normalizeBetweenArrays() in limma. These methods will do a 
> quantile normalization on the R or G channel indicated so they have 
> the "same empirical distribution across arrays, leaving the M-values 
> (log-ratios) unchanged." Say your reference is in the green channel:
> doing a Gquantile normalization would force all the reference values
> to have the same distribution, and then adjust the R channel values
> accordingly. For the statistical analysis, you use the R values
> directly, because this normalization leaves the M values unchanged;
> if you use the M values, it would be as if you had never done the
> normalization. If the reference is not always in the same channel, I
> manipulate the RGList so that it is, but then I also include 'dye' as
> a batch effect in the model.
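
The Gquantile idea described above can be sketched in base R as follows, assuming log2-scale intensities with the common reference in the green channel. `quantile_normalize()` and `gquantile_sketch()` are hypothetical helpers written for illustration; this is not the code of limma's normalizeBetweenArrays(), ties are broken naively with `ties.method = "first"`, and the data are simulated.

```r
# Simplified quantile normalization: every column ends up with the same values
quantile_normalize <- function(x) {
  target <- rowMeans(apply(x, 2, sort))              # mean of the sorted columns
  ranks <- apply(x, 2, rank, ties.method = "first")  # position of each value in its column
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Gquantile-style step: normalize the reference (G) channel, keep M fixed
gquantile_sketch <- function(logR, logG) {
  M <- logR - logG                      # log-ratios, computed once and reused unchanged
  logG_norm <- quantile_normalize(logG)
  logR_norm <- logG_norm + M            # shift R along with the normalized G
  list(logR = logR_norm, logG = logG_norm, M = logR_norm - logG_norm)
}

set.seed(1)
logG <- matrix(rnorm(200, mean = 10), nrow = 50, ncol = 4)
logG[, 2] <- logG[, 2] + 0.7            # simulate an array-level shift in the reference
logR <- logG + matrix(rnorm(200), nrow = 50, ncol = 4)
res <- gquantile_sketch(logR, logG)
```

Because `res$M` equals `logR - logG`, a model fitted to the M values gives the same answer with or without this step; the normalized `res$logR` is what would go into a single-channel analysis.
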
>
> HTH,
> Jenny
>
> At 10:32 AM 11/20/2006, yanju wrote:
>
>> Dear all,
>>
>> I have a microarray dataset from a common reference design. The
>> common reference is genomic DNA. Standard normalization assumes that
>> a large fraction of genes is not differentially expressed, and then
>> adjusts the log-ratios to have a median (or mean) of 0. But in my
>> case, every spot would have the same observed signal in the genomic
>> channel, while the signals in the cDNA channel vary greatly.
>> Therefore, the strategies I just mentioned are not suitable. How
>> should this kind of data be normalized? Are there any existing
>> packages or functions for this? I look forward to your reply.
>>
>> Regards,
>> Yanju
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at uiuc.edu