[BioC] Normalization of bi-modal expression data

Wolfgang Huber huber at ebi.ac.uk
Tue Feb 10 21:20:41 CET 2009


Dear Christian,

well, if they give approximately the same results, then it doesn't 
really matter, does it?

And if and where they disagree (i.e. if you do find substantially 
different sets of differentially expressed genes), look at the 
difference and decide whether it looks like extra sensitivity or an 
artifact to you. Sorry to be so vague, but if the answer were simpler, 
there would be no need for all these different normalisation methods.
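
One way to look at where two normalisation methods disagree is to compare the resulting gene lists directly. The following is only a minimal sketch in Python, not part of any Bioconductor package: the function name `compare_gene_sets` and the gene identifiers are hypothetical, and it assumes you already have one list of differentially expressed genes per method.

```python
def compare_gene_sets(a, b):
    """Summarise agreement between two collections of gene identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return {
        "shared": sorted(a & b),   # called by both methods
        "only_a": sorted(a - b),   # called only after method A
        "only_b": sorted(b - a),   # called only after method B
        # Jaccard index: 1.0 means identical sets, 0.0 means disjoint sets.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Hypothetical hit lists, e.g. from quantile- and vsn-normalised data.
quantile_hits = ["g1", "g2", "g3", "g4"]
vsn_hits = ["g2", "g3", "g4", "g5", "g6"]

result = compare_gene_sets(quantile_hits, vsn_hits)
print(result["only_a"], result["only_b"], result["jaccard"])
```

The genes in `only_a` and `only_b` are the ones worth inspecting by eye to judge whether they reflect extra sensitivity or an artifact.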

  Thanks and best wishes
  Wolfgang


> Dear Wolfgang,
> 
> Thank you for your advice. I didn't find any spatial heterogeneity on 
> the array that could explain a bi-modal distribution of intensity. One 
> possibility is that we have two populations of probes corresponding to 
> the two genomes of tobacco, but this is difficult to check.
> I tried different methods of global normalization, which seem to give 
> approximately the same results. So I wonder what kind of criteria I 
> could use to select the "better" method?
> 
> Christian
> 
> 
> Wolfgang Huber wrote:
>> Dear Christian
>>
>> The common normalisation methods assume that the normalisation between 
>> arrays involves one common transformation (linear in the case of median 
>> scaling, affine linear in the case of vsn with default parameter 
>> 'calib', local polynomial in the case of loess, non-parametric 
>> rank-based in the case of quantile normalisation). I listed these in 
>> roughly increasing order of flexibility.
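
To make the two ends of that list concrete, here is a minimal pure-Python sketch of median scaling and quantile normalisation. It is an illustration under stated assumptions, not the code of any Bioconductor package: the function names are hypothetical, intensities are assumed to be positive and on the raw scale, and every array is assumed to have the same probes in the same order. vsn and loess are not covered here.

```python
from statistics import median


def median_scale(arrays):
    """Linear transformation: multiply each array by one constant so that
    all arrays share the same median (the mean of the original medians).
    Assumes positive intensities, so no median is zero."""
    medians = [median(a) for a in arrays]
    target = sum(medians) / len(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]


def quantile_normalise(arrays):
    """Rank-based transformation: the k-th smallest value of each array is
    replaced by the mean of the k-th smallest values across all arrays.
    Ties are broken by probe position (a simplification)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        normalised = [0.0] * n
        for rank, i in enumerate(order):
            normalised[i] = reference[rank]
        out.append(normalised)
    return out


# Two toy arrays with three probes each (made-up numbers).
arrays = [[2.0, 4.0, 6.0], [10.0, 30.0, 20.0]]
scaled = median_scale(arrays)
qn = quantile_normalise(arrays)
print(scaled)
print(qn)
```

Median scaling only matches one summary statistic per array, whereas quantile normalisation forces the whole intensity distribution to be identical across arrays, which is why it is the most flexible of the listed methods.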
>>
>> However, if you have two distinct populations of intensities, I would 
>> recommend first finding out their origin: look for associations of 
>> these two populations with all sorts of feature parameters 
>> (annotation, spatial position on the array) and make sure there is no 
>> show-stopper quality problem. Once you know the origin, the proper 
>> normalisation (perhaps stratified) will follow from it.
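
As an illustration of what a stratified normalisation could look like, the sketch below applies median scaling separately within each group of probes. This is a hypothetical example in pure Python, not a method taken from this thread or from a specific package: the function name, the stratum labels and the toy numbers are all assumptions, and it presumes the probes have already been assigned to strata (for example by annotation).

```python
from statistics import median


def stratified_median_scale(arrays, strata):
    """Median-scale each stratum of probes separately.

    arrays: list of per-array intensity lists, all of the same length.
    strata: one label per probe (hypothetical, e.g. an annotation class).
    Assumes positive intensities, so no stratum median is zero.
    """
    out = [list(a) for a in arrays]
    for label in set(strata):
        idx = [i for i, s in enumerate(strata) if s == label]
        # One median per array, computed only over this stratum's probes.
        medians = [median(a[i] for i in idx) for a in arrays]
        target = sum(medians) / len(medians)
        for j, m in enumerate(medians):
            for i in idx:
                out[j][i] = arrays[j][i] * target / m
    return out


# Toy data: stratum "A" differs by a factor of two between the arrays,
# stratum "B" is already comparable.
arrays = [
    [1.0, 2.0, 3.0, 100.0, 200.0, 300.0],
    [2.0, 4.0, 6.0, 100.0, 200.0, 300.0],
]
strata = ["A", "A", "A", "B", "B", "B"]
normalised = stratified_median_scale(arrays, strata)
print(normalised)
```

In this toy case a single overall scaling factor could not correct stratum "A" without distorting stratum "B", which is the kind of situation where treating the two populations separately may help.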
>>
>> Just applying one single overall transformation might be OK, but it 
>> could lead to distortions and inefficiency.
>>
>>  Best wishes
>>     Wolfgang
>>
>>
>>
>>
>> Christian Brière wrote:
>>> Yes, I removed Agilent controls and I filtered the data using Agilent 
>>> flags (IsPosandSignificant and IsWellAboveBackground) before 
>>> calculating intensity distribution.
>>> The array was designed using Tobacco contigs defined from available 
>>> tobacco ESTs.
>>> My question is: whatever the origin of this bi-modal distribution, is 
>>> it a problem for normalization, and what kind of normalization is 
>>> most appropriate?
>>>
>>> Christian
>>>
>>> Sean Davis wrote:
>>>>
>>>> On Tue, Feb 3, 2009 at 8:03 AM, Naomi Altman <naomi at stat.psu.edu> wrote:
>>>>
>>>>     Did you remove the Agilent controls before looking at the
>>>>     intensity distribution?
>>>>
>>>>
>>>> And are these some odd array design like tiling arrays?
>>>>  
>>>>
>>>>
>>>>     At 06:01 AM 2/3/2009, Christian Brière wrote:
>>>>
>>>>         Hi!
>>>>
>>>>         I am new to microarray analysis and to Bioconductor. I need to
>>>>         analyse expression data from one-color Agilent microarrays
>>>>         (105K). To my surprise, for each array (controls as well as
>>>>         treated samples) the distribution of intensity data is bi-modal.
>>>>         Furthermore, it seems that more than 10% of the genes are
>>>>         differentially expressed between controls and treated samples.
>>>>         Therefore, I wonder what the best method is for between-array
>>>>         normalization in such a case. I was told that median or quantile
>>>>         normalization was not adequate. Would Invariant Set or VSN
>>>>         normalization be better, and which packages should I use for
>>>>         that?
>>>>         Thanks for your help
>>>>
>>>>         --
>>>>
>>>>         Christian Brière
>>>>         UMR CNRS-UPS 5546
>>>>         BP42617 Auzeville
>>>>         F-31326 Castanet-Tolosan (France)
>>>>         tel: +33(0)5 62 19 35 90
>>>>         Fax: +33(0)5 62 19 35 02
>>>>         E-mail: briere at scsv.ups-tlse.fr
>>>>
>>>>         http://www.scsv.ups-tlse.fr
>>>>         http://www.gdr2688.ups-tlse.fr
>>>>         <http://www.gdr2688.ups-tlse.fr/index.php>
>>>>         http://www.ifr40.cnrs.fr
>>>>
>>>>
>>>>
>>
> 
> 
>


