[BioC] [limma] [Rfit] [samr] Gene expression distribution using lmFit and eBayes

Gordon K Smyth smyth at wehi.EDU.AU
Sat Nov 23 02:55:00 CET 2013

Dear Jerome,

The Shapiro test is only applicable to iid samples, so it is difficult to 
see how it could be used to test normality of expression values in a 
linear modelling context.  If you have applied the test to the normalized 
expression values for each gene, then I suspect that the test is actually 
picking up differential expression rather than non-normality.
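
As a rough illustration of this point, the sketch below simulates one hypothetical gene whose errors are exactly normal but whose two groups differ in mean, and then tests the raw values and the group-centred residuals for normality. Everything in it is an assumption made for illustration: the group sizes (deliberately large so the asymptotic test behaves), the 4 SD shift, and the use of a hand-written Jarque-Bera test in place of the Shapiro test so that the example needs only the Python standard library.

```python
import math
import random


def jarque_bera_pvalue(values):
    """Asymptotic Jarque-Bera normality test p-value (chi-square, 2 df)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # The survival function of a chi-square with 2 df is exp(-x / 2).
    return math.exp(-jb / 2.0)


rng = random.Random(1)
n_per_group = 200  # assumption: far more arrays than a typical experiment

# Hypothetical gene: normal errors, group B shifted upwards by 4 SDs.
group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
group_b = [rng.gauss(4.0, 1.0) for _ in range(n_per_group)]

# Raw values pooled across groups, ignoring the design.
pooled = group_a + group_b

# Residuals after removing each group's own mean (a one-way linear model).
mean_a = sum(group_a) / n_per_group
mean_b = sum(group_b) / n_per_group
residuals = [v - mean_a for v in group_a] + [v - mean_b for v in group_b]

p_pooled = jarque_bera_pvalue(pooled)
p_residuals = jarque_bera_pvalue(residuals)
print(f"raw values: p = {p_pooled:.3g}")
print(f"residuals:  p = {p_residuals:.3g}")
```

The pooled values are bimodal because of the group difference, so a normality test applied to them reacts to the differential expression even though the errors themselves are normal.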

The limma code is very robust against non-normality.  All the usual 
microarray platforms and standard preprocessing procedures produce data 
that is normally distributed to a good enough approximation.  Much effort 
has been devoted to developing good preprocessing and normalization procedures.

The concept of "robustness" in statistical analysis goes back to a 1953 paper 
by George Box in Biometrika.  In that paper, Box wrote of the "remarkable 
property of robustness to non-normality which [tests for comparing means] 
possess".  The tests done by limma inherit the robustness property that 
Box was referring to.  Box made the point that the robustness of the 
two-sample t-test was not improved by checking first for equal variances.  
He wrote:

"To make the preliminary test on variances is rather like putting to sea 
in a rowing boat to find out whether conditions are sufficiently calm for 
an ocean liner to leave port!"

I rather think that, if Box was still alive today, he might say something 
similar about a preliminary Shapiro test!
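
The robustness that Box described can be checked with a small simulation. The sketch below estimates the false positive rate of the ordinary pooled-variance two-sample t-test when both groups come from the same distribution, once for normal data and once for strongly skewed (exponential) data. It is not limma's moderated t-test, and the group size of 10, the exponential distribution, and the number of simulations are all assumptions chosen for illustration. The critical value 2.101 is the two-sided 5% point of Student's t with 18 degrees of freedom.

```python
import math
import random

T_CRIT_DF18 = 2.101  # two-sided 5% critical value, Student's t with 18 df


def pooled_t(x, y):
    """Ordinary two-sample t-statistic with pooled variance."""
    nx, ny = len(x), len(y)
    mx = sum(x) / nx
    my = sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled_var = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))


def false_positive_rate(draw, n_sims=20000, n=10, seed=1):
    """Fraction of null simulations rejected at the nominal 5% level.

    `draw` takes a random.Random instance and returns one observation.
    The critical value assumes n = 10 per group (18 df).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        if abs(pooled_t(x, y)) > T_CRIT_DF18:
            rejections += 1
    return rejections / n_sims


# Both groups share one distribution, so every rejection is a false positive.
rate_normal = false_positive_rate(lambda rng: rng.gauss(0.0, 1.0))
rate_skewed = false_positive_rate(lambda rng: rng.expovariate(1.0))
print(f"normal data:      {rate_normal:.3f}")
print(f"exponential data: {rate_skewed:.3f}")
```

If the test is robust in Box's sense, both rates should stay close to the nominal 0.05, even though the exponential distribution is far more skewed than typical normalized log-expression values.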

Best wishes

> Date: Thu, 21 Nov 2013 17:42:21 -0500
> From: Jerome Lane <jerome.lane at criucpq.ulaval.ca>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] [limma] [Rfit] [samr] Gene expression distribution
> 	using lmFit and eBayes
>   Hi,
>   Three quarters of my microarray gene expression values have a non-normal
>   distribution, with most Shapiro test p-values below 10^-5.
>   I tried rank-based linear regression with rfit from the Rfit package (no
>   normality assumption for the residuals) to adjust for covariates, followed
>   by SAM (nonparametric) from the samr package, but the results were not as
>   biologically relevant as those lmFit + eBayes could provide.
>   I know that the lmFit function can analyse gene expression values that are
>   not strictly normal, but what is the limit?
>   Is it statistically appropriate to use lmFit + eBayes on my data?
>   Best regards,
>   Jerome Lane
