[BioC] RMA vs gcRMA on 2 groups of samples

Robert Gentleman rgentlem at fhcrc.org
Fri Nov 2 20:53:33 CET 2007


Hi,
   If they were assayed at approximately the same time, using 
approximately the same protocols, then yes, one normalization is likely 
to be better than two. I think there may also be issues if the set 
of genes that are expressed is very different in the different tissue 
types, since their being the same is one of the basic assumptions of most 
normalization methods. But if a great deal is different, then it is better 
not to try to normalize the differences away, but rather to adjust for them after normalization.
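A minimal Python sketch of this caveat. It assumes quantile normalization (the normalization step used by RMA), applied directly to simulated log2 values for simplicity, whereas RMA works on probe-level intensities. The data, the 30% of genes "switched on" only in tissue Y, and the helper names `quantile_normalize` and `shift_of_unchanged` are all hypothetical. Because joint normalization forces every array onto one distribution, genes that are identical in both tissues end up looking down-regulated in Y.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize a genes x arrays matrix (ties broken by sort order)."""
    order = np.argsort(x, axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)  # mean value at each rank
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        # the k-th smallest value of array j is replaced by reference[k]
        out[order[:, j], j] = reference
    return out


rng = np.random.default_rng(42)
n_genes, n_arrays = 2000, 3

# Hypothetical log2 profile shared by both tissue types.
profile = rng.normal(8.0, 2.0, size=n_genes)

# Assumption: in tissue Y the lowest 30% of genes are switched on (+6 on the
# log2 scale); every other gene is identical in X and Y.
switched_on = profile < np.quantile(profile, 0.3)
unchanged = ~switched_on

x = profile[:, None] + rng.normal(0, 0.3, size=(n_genes, n_arrays))
y = (profile[:, None] + 6.0 * switched_on[:, None]
     + rng.normal(0, 0.3, size=(n_genes, n_arrays)))

raw = np.hstack([x, y])
joint = quantile_normalize(raw)  # all six arrays normalized together


def shift_of_unchanged(m):
    """Mean over unchanged genes of (mean of Y arrays - mean of X arrays)."""
    diff = m[:, n_arrays:].mean(axis=1) - m[:, :n_arrays].mean(axis=1)
    return float(diff[unchanged].mean())


print(f"raw data:         {shift_of_unchanged(raw):+.2f}")
print(f"joint normalized: {shift_of_unchanged(joint):+.2f}")
```

Under these assumptions the unchanged genes show essentially no shift in the raw data but a clearly negative one after joint normalization, which is the kind of distortion that adjusting after normalization is meant to avoid.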

best wishes
   Robert

James W. MacDonald wrote:
> Yes, but if I am not mistaken, the OP had a situation in which the 
> samples were simply different cell or tissue types, rather than 
> different batches. In this case I would favor normalizing all together 
> rather than doing things in batches.
> 
> Best,
> 
> Jim
> 
> 
> Robert Gentleman wrote:
>>
>> Naomi Altman wrote:
>>> Dear Bogdan,
>>> Any normalization method that uses a set of arrays reduces the 
>>> variability among those arrays.
>>>
>>> So, if you have 2 sets of arrays and normalize them separately, you will 
>>> find that the within-set variability is smaller than the between-set 
>>> variability, i.e. you induce significant differential expression 
>>> simply by the normalization.  To avoid this effect, when you are 
>>> doing differential expression analysis (or sample clustering) you 
>>> must either use methods that normalize each array separately (MAS) or 
>>> normalize all arrays together.
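A minimal Python sketch of the two options discussed in this thread (A: each set normalized on its own, B: all arrays together). It assumes quantile normalization as in RMA and simulated log2 data with no true differential expression. The log2(1.5) offset added to every array in set Y is an assumed technical effect, and `quantile_normalize` and `mean_set_shift` are hypothetical helpers. Option A makes the arrays within each set agree exactly while leaving the between-set offset in place, so every gene appears shifted; option B removes the offset.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize a genes x arrays matrix (ties broken by sort order)."""
    order = np.argsort(x, axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)  # mean value at each rank
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        # the k-th smallest value of array j is replaced by reference[k]
        out[order[:, j], j] = reference
    return out


rng = np.random.default_rng(0)
n_genes, n_per_set = 2000, 5

# Hypothetical data: no gene differs between sets X and Y; each array is the
# same log2 profile plus independent noise.
profile = rng.normal(8.0, 2.0, size=n_genes)
x = profile[:, None] + rng.normal(0, 0.3, size=(n_genes, n_per_set))
y = profile[:, None] + rng.normal(0, 0.3, size=(n_genes, n_per_set))

# Assumption: set Y carries a purely technical offset of log2(1.5)
# (e.g. a different processing run) on every array.
y = y + np.log2(1.5)

separate = np.hstack([quantile_normalize(x), quantile_normalize(y)])  # option A
joint = quantile_normalize(np.hstack([x, y]))                         # option B


def mean_set_shift(m):
    """Mean over genes of (mean of Y arrays - mean of X arrays)."""
    diff = m[:, n_per_set:].mean(axis=1) - m[:, :n_per_set].mean(axis=1)
    return float(diff.mean())


print(f"A, separate: {mean_set_shift(separate):.3f}")
print(f"B, joint:    {mean_set_shift(joint):.3f}")
```

Option B only helps if the offset really is technical; a genuine global difference between the sets would be removed just as readily, which is the assumption Robert raises at the top of the thread.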
>>
>>   An alternative (and the one that I prefer) is to do separate 
>> normalizations, and to then use some sort of batch effect term in the 
>> model used to assess differentially expressed genes.
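A minimal Python sketch of a batch term in the model, fitting ordinary least squares per gene with NumPy. In Bioconductor one common route would be limma's `lmFit` with a design built by `model.matrix(~ condition + batch)`; the array layout, effect sizes and gene count below are hypothetical. The last lines show why this needs batches that cut across the conditions: when each condition was run as its own batch, the batch column duplicates the condition column, the design loses a rank, and the two effects cannot be separated.

```python
import numpy as np

# Hypothetical layout: 8 arrays, two conditions, processed in two batches,
# with both conditions present in each batch.
condition = np.array([0, 0, 1, 1, 0, 0, 1, 1])
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Design matrix: intercept, condition effect, batch effect.
design = np.column_stack([np.ones(8), condition, batch])

# Simulated log2 values for 100 hypothetical genes: true condition effect 1.0,
# batch offset 0.8, noise sd 0.1.
rng = np.random.default_rng(1)
n_genes = 100
expr = (7.0 + 1.0 * condition + 0.8 * batch
        + rng.normal(0, 0.1, size=(n_genes, 8)))

# Fit every gene at once: lstsq solves design @ coef = expr.T column by column.
coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
condition_effect = coef[1]  # one estimate per gene
batch_effect = coef[2]

print(f"mean condition effect: {condition_effect.mean():.2f}")
print(f"mean batch effect:     {batch_effect.mean():.2f}")

# If each condition had been run (and normalized) as its own batch, the batch
# column would duplicate the condition column.
confounded = np.column_stack([np.ones(8), condition, condition])
print(np.linalg.matrix_rank(design), np.linalg.matrix_rank(confounded))  # 3 2
```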
>>
>>   Normalization is intended to clean up the relatively minor issues 
>> that arise due to slightly different conditions etc. for arrays that 
>> are essentially the same.  As far as I can see it is not intended to 
>> adjust for batch effects, and in my experience generally does a bad 
>> job of that.  Just because you can normalize (or fit any statistical 
>> model) does not mean that you should.
>>
>>    best wishes
>>      Robert
>>
>>
>>> --Naomi
>>>
>>> At 12:01 PM 11/2/2007, Bogdan Tanasa wrote:
>>>> Greetings Naomi,
>>>>
>>>> thanks for the reply. To generalize my question: when dealing with 2 
>>>> sets of samples, let's say X1, X2, ..., Xn and Y1, Y2, ..., Yn,
>>>> I could run the normalization in 2 ways: A. only X(1,n) and only 
>>>> Y(1,n), separately, or B. both X(1,n),Y(1,n) together. Are there any
>>>> a priori statistical criteria that favor one way or the other? If I
>>>> take into account biological criteria (the things I am interested in),
>>>> the results from A may sometimes look better than those from B, or
>>>> vice versa. Thanks!
>>>> Bogdan
>>>>
>>>>
>>>>
>>>> On 11/2/07, Naomi Altman <naomi at stat.psu.edu> wrote:
>>>>> Dear Bogdan,
>>>>> I do not have an opinion on gcRMA versus RMA.  But if you are doing
>>>>> differential expression analysis comparing the cell samples with the
>>>>> organ samples, you need to normalize
>>>>> all the samples together.
>>>>>
>>>>> --Naomi
>>>>>
>>>>> At 11:31 AM 11/1/2007, Bogdan Tanasa wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> I would like to ask for your opinions on the following:
>>>>>>
>>>>>> I have 60 expression profiles of 60 samples (cells and organs in
>>>>>> resting conditions).
>>>>>> I normalized these arrays in many ways, including RMA.
>>>>>>
>>>>>> Considering the biological arguments (cell samples vs organ
>>>>>> samples), I am planning to do the normalization separately, on the
>>>>>> group of cell samples and on the group of organ samples.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> - after RMA normalization on separate groups of samples (cells vs
>>>>>> organs), the results are different, but are they better? GO
>>>>>> analysis does not show major differences.
>>>>>>
>>>>>> - would gcRMA work better than RMA? The majority of opinions in
>>>>>> SoCal are pro-RMA.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Bogdan
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> Naomi S. Altman                                814-865-3791 (voice)
>>>>> Associate Professor
>>>>> Dept. of Statistics                              814-863-7114 (fax)
>>>>> Penn State University                         814-865-1348 (Statistics)
>>>>> University Park, PA 16802-2111
>>>>>
>>>>>
>>>
>>
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


