[BioC] Normalization of bi-modal expression data
Wolfgang Huber
huber at ebi.ac.uk
Fri Feb 6 21:21:00 CET 2009
Dear Christian,
The common normalisation methods assume that the normalisation between
arrays involves one common transformation (linear in the case of median
scaling, affine-linear in the case of vsn with the default setting of the
parameter 'calib', local polynomial in the case of loess, non-parametric
and rank-based in the case of quantile normalisation). I have listed these
in order of increasing flexibility.
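To make the two ends of that list concrete, here is a minimal sketch in plain Python (not the Bioconductor implementations; the function names and the toy numbers are made up for illustration). It assumes arrays of equal length with no missing values, and breaks ties by position.

```python
from statistics import median


def median_scale(arrays):
    """Scale-only (linear) normalisation: rescale each array so that
    all arrays end up with the same median."""
    target = median(median(a) for a in arrays)
    return [[x * target / median(a) for x in a] for a in arrays]


def quantile_normalise(arrays):
    """Rank-based normalisation: force every array onto the same
    empirical distribution (the mean of the sorted arrays)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest values across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result


# Toy raw-scale intensities for two arrays (illustrative values only).
arrays = [[2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 10.0]]
scaled = median_scale(arrays)
qn = quantile_normalise(arrays)
```

Median scaling only matches one summary statistic per array, whereas quantile normalisation matches the whole distribution, which is why it is the most flexible (and the most aggressive) of the methods listed.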
However, if you have two distinct populations of intensities, I would
recommend first finding out where they come from: look for associations
between these two populations and all sorts of feature parameters
(annotation, spatial position on the array), and make sure that there is
no show-stopper quality problem. Once you know the origin, the proper
normalisation (perhaps stratified by population) will follow from it.
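As a rough sketch of both steps, assuming log2 intensities, a per-probe annotation label, and a hand-picked threshold between the two modes (all names and numbers below are hypothetical, and median-centring within strata is only one possible form of stratified normalisation):

```python
from collections import Counter, defaultdict
from statistics import median


def crosstab_mode_vs_feature(log_intensity, feature, threshold):
    """Count probes by (feature label, 'low'/'high' intensity mode)."""
    counts = Counter()
    for x, f in zip(log_intensity, feature):
        counts[(f, "high" if x > threshold else "low")] += 1
    return counts


def stratified_median_centre(arrays, strata):
    """Within each stratum, shift every array (on the log scale) so that
    its stratum median equals the median of the stratum medians."""
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    out = [list(a) for a in arrays]
    for idx in groups.values():
        meds = [median(a[i] for i in idx) for a in arrays]
        target = median(meds)
        for j, a in enumerate(arrays):
            for i in idx:
                out[j][i] = a[i] - meds[j] + target
    return out


# Toy data: hypothetical annotation "A"/"B"; values are log2 intensities.
feature = ["A", "A", "A", "B", "B", "B"]
array1 = [6.0, 6.5, 7.0, 11.0, 11.5, 12.0]
array2 = [6.4, 6.9, 7.4, 11.0, 11.5, 12.0]  # only the low population is shifted

table = crosstab_mode_vs_feature(array1, feature, threshold=9.0)
centred = stratified_median_centre([array1, array2], strata=feature)
```

In this toy case the cross-tabulation shows that the two modes coincide with the annotation, and only the low population differs between arrays, so a single overall shift could not align both populations at once.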
Just applying one single overall transformation might be OK, but it
could lead to distortions and inefficiency.
Best wishes
Wolfgang
Christian Brière wrote:
> Yes, I removed Agilent controls and I filtered the data using Agilent
> flags (IsPosandSignificant and IsWellAboveBackground) before calculating
> intensity distribution.
> The array was designed using Tobacco contigs defined from available
> tobacco ESTs.
> My question is: whatever the origin of this bi-modal distribution, is it
> a problem for normalization, and what kind of normalization is most
> appropriate?
>
> Christian
>
> Sean Davis wrote:
>>
>> On Tue, Feb 3, 2009 at 8:03 AM, Naomi Altman <naomi at stat.psu.edu>
>> wrote:
>>
>> Did you remove the Agilent controls before looking at the
>> intensity distribution?
>>
>>
>> And are these some odd array design like tiling arrays?
>>
>>
>>
>> At 06:01 AM 2/3/2009, Christian Brière wrote:
>>
>> Hi!
>>
>> I am new to microarray analysis and to Bioconductor. I need to
>> analyse expression data from one-color Agilent microarrays (105K).
>> To my surprise, for each array (controls as well as treated samples)
>> the distribution of the intensity data is bi-modal. Furthermore, it
>> seems that more than 10% of the genes are differentially expressed
>> between controls and treated samples. Therefore, I wonder which
>> method is best for between-array normalization in such a case. I was
>> told that median or quantile normalization is not adequate. Would
>> Invariant Set or VSN normalization be better, and which packages
>> should I use for that?
>> Thanks for your help
>>
>> --
>>
>> Christian Brière
>> UMR CNRS-UPS 5546
>> BP42617 Auzeville
>> F-31326 Castanet-Tolosan (France)
>> tel: +33(0)5 62 19 35 90
>> Fax: +33(0)5 62 19 35 02
>> E-mail: briere at scsv.ups-tlse.fr
>>
>> http://www.scsv.ups-tlse.fr
>> http://www.gdr2688.ups-tlse.fr
>> <http://www.gdr2688.ups-tlse.fr/index.php>
>> http://www.ifr40.cnrs.fr
>>
>>
>>