[BioC] effect of normalization on analysis of differential knockdown

Wolfgang Huber whuber at embl.de
Mon Jul 20 12:12:36 CEST 2009


Hi Naomi,

of course normalisation is useful. I just want to stress the importance 
of complementing it with quality assessment and control.

Just comparing different normalisation 'black boxes' on the basis of 
the hit lists they produce (which the original post seemed to hint at, 
and which has been done all too often with microarray data in this 
community) is less advisable.
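
As a rough illustration of the kind of check meant here, the following 
Python sketch summarises each plate by its median and MAD before 
normalising, and then shows what plate-median normalisation does when one 
plate contains knock-down but no up-regulation (the situation described in 
the quoted message below). The plate values and the helper names 
plate_summary and median_normalise are made up for illustration; this is 
not the B-score method and is not taken from any package.

```python
from statistics import median


def plate_summary(values):
    """Return (median, MAD) of one plate's well signals."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    return m, mad


def median_normalise(values):
    """Divide every well by the median of its own plate."""
    m = median(values)
    return [v / m for v in values]


# Hypothetical toy plates: 10 wells each, no measurement noise.
vehicle = [100.0] * 10
# Dosed plate: 6 wells knocked down to half the signal, 4 wells unaffected.
dosed = [50.0] * 6 + [100.0] * 4

# Quality assessment first: compare the raw plates before normalising.
for name, plate in (("vehicle", vehicle), ("dosed", dosed)):
    print(name, "median, MAD:", plate_summary(plate))

# Then inspect what the normalisation did to each plate.
vehicle_norm = median_normalise(vehicle)
dosed_norm = median_normalise(dosed)
print("vehicle normalised:", vehicle_norm)
print("dosed normalised:  ", dosed_norm)
```

In this toy case the raw plate medians already differ by a factor of two, 
which is a warning that the equal-median assumption does not hold. After 
normalisation the four unaffected wells look two-fold up-regulated, while 
the six knocked-down wells look unchanged relative to the vehicle plate.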
	
Best wishes
      Wolfgang

-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
-------------------------------------------------------




Naomi Altman wrote:
> Why would you bother to normalize if it did not affect the results of
> the analysis?  The purpose of normalization is to dampen some of the
> noise so that the signal (i.e. differential expression) is clearer.  The
> normalization method can have a huge effect, depending on how much noise
> there was in the experiment, and whether the assumptions underlying the
> normalization are met.
> 
> I am not familiar with B-score normalization.  Normalization to the 
> median of a particular treatment or control makes sense if you expect 
> the median of all the samples to be the same except for noise.  If not, 
> e.g. if there is down-regulation but no up-regulation, then you are 
> inducing signal by normalizing.
> 
> --Naomi
> 
> At 05:14 PM 7/18/2009, Wolfgang Huber wrote:
> 
>> Hi Rajarshi
>>
>> your t, p, q value computation seems reasonable to me. You may want to 
>> choose a regularised version of the t-test (like in limma's eBayes) 
>> since with only 4 samples, you may otherwise get an unnecessarily 
>> large fraction of false discoveries due to the sample variance being 
>> small (and t large) by chance.
>>
>> As for your question about the choice of normalisation method, one 
>> possible answer (perhaps not too constructive, but not ignorable) is 
>> that the technical or biological variability ("noise") in your data is 
>> stronger than the biological signal.
>>
>>         Best wishes
>>         Wolfgang
>>
>>
>> Rajarshi Guha wrote:
>>> Hi, I am analysing the results from a drug sensitization siRNA screen 
>>> and am trying to determine which genes are being differentially 
>>> knocked down (between a vehicle-only run and a dosed run).
>>>
>>> Each gene is targeted by 4 siRNAs and my initial strategy has been 
>>> to consider the signals from the 4 siRNAs to be individual samples 
>>> for that gene. Then I perform a paired t-test on the 4 signals for a 
>>> given gene across the two conditions. I then calculate Storey's 
>>> q-values based on the resultant p-values.
>>>
>>> The question: does/should the normalization of the plates have an 
>>> effect on the results of the above analysis? For example, I 
>>> considered two normalization schemes - 1) normalizing each plate to 
>>> the median of a separate negative control plate and 2) B-score 
>>> normalization.
>>>
>>> If I rank the genes based on their q-values I get 2 very different 
>>> rankings for the two normalization schemes. Furthermore, the q- and 
>>> p-values differ greatly. In the case of median normalization I get a 
>>> number of q-values < 0.05 but when using B-score I get a single gene 
>>> with a q-value < 0.05 (and the next closest value is 0.58).
>>>
>>> Thinking that this study is analogous to differential expression 
>>> studies in microarrays, I tried running my dataset through the SAM 
>>> method (via siggenes). Using this method, the B-score normalized data 
>>> leads to no hits (and a pi0 = 1) whereas the median normalization 
>>> method leads to lots of hits.
>>>
>>> I can see that B-score normalized data would differ in character from 
>>> median-normalized data (seeing that the actual signals are replaced 
>>> with scaled residuals) - but is it to be expected that normalization 
>>> schemes would lead to such different results in this type of analysis?
>>>
>>> Any pointers would be appreciated.
>>> Thanks,
>>> -------------------------------------------------------------------
>>> Rajarshi Guha  <rajarshi.guha at gmail.com>
>>> GPG Fingerprint: D070 5427 CC5B 7938 929C  DD13 66A1 922C 51E7 9E84
>>> -------------------------------------------------------------------
>>> Q:  What's polite and works for the phone company?
>>> A:  A deferential operator.
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
> 
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
> 

