[BioC] Normalization of bi-modal expression data

Fri Feb 6 21:21:00 CET 2009

Dear Christian

The common normalisation methods assume that the normalisation between 
arrays involves one common transformation (linear in the case of median 
scaling, affine linear in the case of vsn with the default parameter 
'calib', local polynomial in the case of loess, and non-parametric and 
rank-based in the case of quantile normalisation). I have listed these 
in roughly increasing order of flexibility.
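To make the contrast between the least and the most flexible of these concrete, here is a minimal Python/NumPy sketch of median scaling and quantile normalisation. It is not the Bioconductor implementation (those are R packages). It assumes a features × arrays matrix of log2 intensities, and the function names are hypothetical. Ties in the quantile step are broken by order of appearance, which is a simplification.

```python
import numpy as np

def median_scale(log_x):
    """Median scaling on the log2 scale (hypothetical helper).

    Shifts each array (column) so that all arrays share the same median.
    On the raw intensity scale this is a per-array multiplicative, i.e.
    linear, rescaling; the shape of each distribution is unchanged.
    """
    medians = np.median(log_x, axis=0)
    return log_x - medians + medians.mean()

def quantile_normalize(log_x):
    """Rank-based quantile normalisation (hypothetical helper).

    Every value is replaced by the across-array mean of the values with
    the same rank, so all arrays end up with an identical distribution.
    Ties are broken by order of appearance (a simplification).
    """
    order = np.argsort(log_x, axis=0)                # row indices that sort each column
    ranks = np.argsort(order, axis=0)                # rank of every entry within its column
    reference = np.sort(log_x, axis=0).mean(axis=1)  # mean of the k-th smallest values
    return reference[ranks]

# Usage on simulated data: 1000 features, 3 arrays with different offsets.
rng = np.random.default_rng(0)
log_x = rng.normal(loc=[8.0, 9.0, 8.5], scale=1.0, size=(1000, 3))
scaled = median_scale(log_x)        # arrays now share a common median
qnorm = quantile_normalize(log_x)   # arrays now share an identical distribution
```

Median scaling only shifts each array on the log scale, so a second mode stays exactly where it was relative to the first; quantile normalisation forces every array onto the same reference distribution, which is why it is the most flexible (and the most intrusive) of the four.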

However, if you have two distinct populations of intensities, I would 
recommend first finding out where they come from: look for associations 
of these two populations with all sorts of feature parameters 
(annotation, spatial position on the array), and make sure there is no 
show-stopping quality problem. Once you have that, the proper 
normalisation (perhaps stratified) will follow.
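One possible form of stratified normalisation is to normalise each group of features separately. The Python/NumPy sketch below applies median scaling within each stratum. It assumes a hypothetical per-feature label vector `strata` (for instance derived from annotation or from spatial position on the array); the function name is also hypothetical. It illustrates the idea only. Which strata to use, and which transformation to apply within them, depends on what turns out to explain the two populations.

```python
import numpy as np

def stratified_median_scale(log_x, strata):
    """Median-scale log2 intensities separately within each feature stratum.

    log_x  : (n_features, n_arrays) matrix of log2 intensities
    strata : length-n_features array of labels (hypothetical grouping,
             e.g. from annotation or spatial position on the array)
    """
    log_x = np.asarray(log_x, dtype=float)
    strata = np.asarray(strata)
    out = np.empty_like(log_x)
    for label in np.unique(strata):
        rows = strata == label
        block = log_x[rows]
        medians = np.median(block, axis=0)             # per-array median in this stratum
        out[rows] = block - medians + medians.mean()   # align them to their common mean
    return out

# Toy example: a low- and a high-intensity population of features on 2 arrays.
log_x = np.array([[1.0, 2.0], [1.5, 2.5], [8.0, 10.0], [9.0, 11.0]])
strata = np.array(["low", "low", "high", "high"])
print(stratified_median_scale(log_x, strata))
```

In this toy case the two arrays differ by 1 unit in the low stratum but by 2 units in the high one, so no single per-array shift can align both populations; scaling each stratum separately does.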

Simply applying a single overall transformation might be OK, but it 
could lead to distortions and inefficiency.

  Best wishes
	Wolfgang

Christian Brière wrote:
> Yes, I removed the Agilent controls and filtered the data using the Agilent 
> flags (IsPosandSignificant and IsWellAboveBackground) before calculating 
> the intensity distribution.
> The array was designed using tobacco contigs derived from the available 
> tobacco ESTs.
> My question is: whatever the origin of this bi-modal distribution, is it 
> a problem for normalization, and what kind of normalization is the most 
> appropriate?
> 
> Christian
> 
> Sean Davis wrote:
>>
>> On Tue, Feb 3, 2009 at 8:03 AM, Naomi Altman <naomi at stat.psu.edu> wrote:
>>
>>     Did you remove the Agilent controls before looking at the
>>     intensity distribution?
>>
>>
>> And is this some unusual array design, such as a tiling array?
>>  
>>
>>
>>     At 06:01 AM 2/3/2009, Christian Brière wrote:
>>
>>         Hi!
>>
>>         I am new to microarray analysis and to Bioconductor. I need
>>         to analyse expression data from one-color Agilent
>>         microarrays (105K). To my surprise, for each array (controls
>>         as well as treated samples) the distribution of intensity
>>         data is bi-modal. Furthermore, it seems that more than 10% of
>>         the genes are differentially expressed between controls and
>>         treated samples. Therefore, I wonder what the best method for
>>         between-array normalization is in such a case. I was told
>>         that median or quantile normalization was not adequate. Would
>>         Invariant Set or VSN normalization be better, and which
>>         packages should I use for that?
>>         Thanks for your help
>>
>>         --
>>
>>         Christian Brière
>>         UMR CNRS-UPS 5546
>>         BP42617 Auzeville
>>         F-31326 Castanet-Tolosan (France)
>>         tel: +33(0)5 62 19 35 90
>>         Fax: +33(0)5 62 19 35 02
>>         E-mail: briere at scsv.ups-tlse.fr
>>
>>         http://www.scsv.ups-tlse.fr
>>         http://www.gdr2688.ups-tlse.fr
>>         http://www.ifr40.cnrs.fr
>>
>>
>>