[BioC] Plot signal distribution in miRNA arrays

Davis, Wade davisjwa at health.missouri.edu
Thu Aug 4 19:05:38 CEST 2011

Hi Andrea,
You are right about the robustness of the t-test. 

You had asked about the distribution across microRNAs on each array, but based on your questions, I think you should be asking about the distribution of each microRNA across arrays. There is a big difference between those distributions, as you can imagine. Plotting and examining each microRNA individually would be tedious. If your sample size is "large" (large is subjective, but I recall a paper with simulations showing good robustness with n>15 for unimodal data), then you can appeal to the central limit theorem: the sample mean is approximately normally distributed, regardless of the parent distribution.

If the sample size is small (n<5), any conclusion about the distribution based on a KS test or plots is very unreliable, in my opinion. A brief Google search turned up a paper (http://www.ukm.my/jsm/pdf_files/SM-PDF-40-6-2011/15%20NorAishah.pdf) with a nice little simulation study on the power of the 4 most common normality tests. At the smallest sample size they reported (n=39), the BEST power was 28%, where the alternative was a chi-square with 3 df....
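
As a rough illustration of the power issue, here is a minimal R sketch that estimates how often a normality test rejects when the data are really drawn from a chi-square distribution with 3 df. The choice of the Shapiro-Wilk test, the sample sizes, the number of simulations and the helper name estimate_power are all assumptions made for this example; they are not taken from the paper, so the numbers need not match its tables.

```r
# Hypothetical simulation, not the paper's setup: fraction of samples in which
# Shapiro-Wilk rejects normality when the truth is chi-square with 3 df.
set.seed(1)

estimate_power <- function(n, n_sim = 2000, alpha = 0.05) {
  # p-value of the test for each of n_sim simulated samples of size n
  p_values <- replicate(n_sim, shapiro.test(rchisq(n, df = 3))$p.value)
  mean(p_values < alpha)  # estimated power at significance level alpha
}

sample_sizes <- c(4, 10, 20, 40)  # illustrative sizes only
power <- sapply(sample_sizes, estimate_power)
names(power) <- paste0("n=", sample_sizes)
print(round(power, 2))
```

The same loop can be pointed at ks.test() or another alternative distribution to see how sensitive the conclusion is to those choices.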


From: andrea.grilli at ior.it [andrea.grilli at ior.it]
Sent: Thursday, August 04, 2011 4:10 AM
To: Davis, Wade
Cc: bioconductor at r-project.org
Subject: RE: [BioC] Plot signal distribution in miRNA arrays

Hi Davis,
thank you for your exhaustive reply.
You are right, the reason for this check is mainly statistical, because
after quantile normalization the data were analyzed with a t-test. I know
that this kind of test is "robust" to violations of the normality
assumption, but I wanted to check my data (I know it would have been
better to do these steps before the analysis...). The log2 transformation
improved the symmetry of my data, but we are still far from a normal
distribution.

I have a further question: do you think that using the mean of each
probe across all arrays could be a good summary of the general
distribution of the signals in my data? Or would this approach alter
the results?

Thank you so much for your help and for your files. I'm pretty new to
Bioconductor and this is the first time I have tried to perform this kind of

Quoting "Davis, Wade" <davisjwa at health.missouri.edu>:

> Hi Andrea,
> As far as plotting, you would plot it as you would any other microarray data.
> Since you are interested in distribution across genes in a single
> array, then some subsetting of your expression matrix like
> hist(mydata[,1]) would give you a histogram of the first sample
> (assuming samples are stored column-wise as is typical). Or you
> could do plot(density()) instead of hist().
> But my primary reason for responding to your question is to ask why
> you would assume the distribution would or should be normal across
> miRNAs in a GIVEN sample. It is hard for me to think why it should
> be (biologically), and there is no statistical requirement for it to
> be. (Quantile normalization won't impose normality, but the log2
> transform will make the distribution more symmetric.)
> I have attached some plots from an Affy miRNA (2.0) mouse experiment
> that I happened to have open in R right now. The QC pdf contains
> plots before normalizing, and the other is after normalizing. (Note
> that it is really not necessary to plot each density after quantile
> normalization; they must look the same by definition. I just did
> it since you asked about plotting each one separately.) Notice how
> the distribution is long-tailed.
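> BTW, here is sample code for the density plot:
> library(Biobase)  # provides exprs()
> par(mfrow=c(3,2),pty="s")  # 3 x 2 grid of square panels
> for(i in 1:6){
>   # density of sample i (column i of the expression matrix)
>   plot(density(exprs(mirna.norm_mouse)[,i]))
> }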
> Wade
> -----Original Message-----
> From: andrea.grilli at ior.it [mailto:andrea.grilli at ior.it]
> Sent: Wednesday, August 03, 2011 4:53 AM
> To: bioconductor at r-project.org
> Subject: [BioC] Plot signal distribution in miRNA arrays
> Hi to all,
> I would like to plot the signal distribution from a miRNA array
> experiment, with a single plot for all arrays. Any suggestions on how
> to do this? Is there an easy way to check how well these data fit a
> normal distribution (e.g. with a K-S test)?
> The data come from Agilent Human miRNA v.2 arrays, normalized with the
> quantile method and log2 transformed.
> Thanks in advance,
> Andrea
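
For the original question of drawing all arrays in a single plot, a minimal base R sketch follows. The matrix mydata here is simulated purely as a stand-in (rows are miRNAs, columns are arrays); with real data it would be replaced by the normalized log2 expression matrix. The K-S call at the end is only a rough check, since the mean and standard deviation are estimated from the same data, which makes the nominal p-value optimistic.

```r
# Simulated stand-in for a log2 expression matrix (assumption, not real data):
# rows = miRNAs, columns = arrays.
set.seed(42)
n_mirna <- 800
n_array <- 6
mydata <- matrix(log2(rexp(n_mirna * n_array, rate = 0.01) + 1),
                 nrow = n_mirna, ncol = n_array,
                 dimnames = list(paste0("miRNA", seq_len(n_mirna)),
                                 paste0("array", seq_len(n_array))))

# One kernel density estimate per array (column).
dens <- lapply(seq_len(ncol(mydata)), function(j) density(mydata[, j]))

# Common axis limits so every curve fits in the same panel.
x_range <- range(sapply(dens, function(d) range(d$x)))
y_max <- max(sapply(dens, function(d) max(d$y)))

# Draw the first array, then overlay the others in different colours.
plot(dens[[1]], xlim = x_range, ylim = c(0, y_max), col = 1,
     main = "Signal distribution, all arrays", xlab = "log2 signal")
for (j in seq_along(dens)[-1]) lines(dens[[j]], col = j)
legend("topright", legend = colnames(mydata),
       col = seq_along(dens), lty = 1, bty = "n")

# Rough K-S check of the first array against a standard normal after
# centring and scaling (parameters estimated from the data, see note above).
ks_first <- ks.test(as.numeric(scale(mydata[, 1])), "pnorm")
print(ks_first$p.value)
```

Taking a row instead of a column, for example mydata[1, ], gives the distribution of one miRNA across arrays, which is the quantity the t-test assumptions refer to.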

Dr. Andrea Grilli
andrea.grilli at ior.it
phone 051/63.66.756

Laboratory of Experimental Oncology
Rizzoli Orthopaedic Institute
Codivilla Putti Research Center
via di Barbiano 1/10
40136 - Bologna - Italy
