[R] WGCNA on heterogeneous RNA-seq

Thu May 15 08:33:39 CEST 2014

Hi Pan,

On Wed, May 14, 2014 at 9:14 PM, Panos Bolan <panbolan at hotmail.com> wrote:
> Dear list,
>
> Apologies for posting this to both Bioconductor and here. I recently read a
> Bioconductor post where the developer of WGCNA suggested using the
> package for RNA-seq data analysis after applying a variance-stabilizing
> transformation to the raw counts. I have read the tutorials and
> run the example dataset at
> http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html.
> I would like to apply WGCNA to my RNA-seq data consisting of 1000
> transcripts whose expression is measured for 50 triplicated cell types
> (approximately 150 samples) and derive networks.
>
> I would like to ask if WGCNA can be used successfully in this kind of
> heterogeneous dataset

My standard answer to this question is that the success of WGCNA (or
any other analysis, for that matter) depends on the design of your
experiment and on the question you want to answer. Will the
differences between cell types help you answer that question, or will
they confound it?
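
To make that question concrete, here is a small simulated sketch
(Python, standard library only). It is not based on your data: the
effect sizes, the seed and the helper names pearson() and
center_by_group() are assumptions made up for illustration. Two
transcripts share a cell-type-specific mean but are otherwise
independent. Across all samples they look strongly co-expressed; once
the per-cell-type mean is removed, the correlation is close to zero.
Which of the two numbers is the relevant one depends on what you are
asking.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_by_group(values, groups):
    """Subtract each group's mean from the values in that group."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(1)   # fixed seed so the run is repeatable
n_types, n_reps = 50, 3  # mirrors the 50 triplicated cell types
cell_type, gene_a, gene_b = [], [], []
for t in range(n_types):
    shift = rng.gauss(0, 3)  # assumed cell-type effect shared by both genes
    for _ in range(n_reps):
        cell_type.append(t)
        gene_a.append(shift + rng.gauss(0, 1))  # independent replicate noise
        gene_b.append(shift + rng.gauss(0, 1))

r_all = pearson(gene_a, gene_b)
r_within = pearson(center_by_group(gene_a, cell_type),
                   center_by_group(gene_b, cell_type))
print(f"correlation across all samples: {r_all:.2f}")
print(f"correlation within cell types:  {r_within:.2f}")
```

If the biology of interest is what distinguishes the cell types, the
first correlation is signal; if it is co-regulation within a cell
type, the first correlation is mostly confounding.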

> where, for most of the transcripts, the expression patterns of the various
> cell types might differ substantially (so that a variance-stabilizing
> transformation will not give me an approximately normal distribution for
> each transcript; it would rather be a mixture of normal distributions).

Normality is not a prerequisite for WGCNA or, indeed, for any other
linear model-based analysis (correlation can be thought of as one of
the statistics arising from a linear model). Linear models do not
assume that the variables themselves are normally distributed, only
that the residuals are normally distributed and have the same
variance; equalizing the variance is where VST comes in.
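
The distinction between the variable and its residuals can be checked
on simulated numbers. The sketch below (Python, standard library
only) is an illustration under assumed values, not an analysis of
real counts: the "on"/"off" expression levels, the seed and the
helper names are invented. A transcript is "on" in half of the cell
types and "off" in the rest, so its values form a mixture of normals.
Subtracting each cell type's mean, which is what a one-way linear
model with cell type as a factor does, leaves residuals that look
normal. Excess kurtosis is used as a crude check: it is about 0 for a
normal distribution and strongly negative for two well-separated
modes.

```python
import random


def excess_kurtosis(x):
    """Moment-based excess kurtosis (about 0 for normal data)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


def group_residuals(values, groups):
    """Residuals of a one-way model: value minus its group mean."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(2)   # fixed seed so the run is repeatable
n_types, n_reps = 50, 3  # mirrors the 50 triplicated cell types
cell_type, expr = [], []
for t in range(n_types):
    level = 8.0 if t % 2 == 0 else 0.0  # assumed "on"/"off" expression level
    for _ in range(n_reps):
        cell_type.append(t)
        expr.append(level + rng.gauss(0, 1))  # same noise variance everywhere

raw_kurt = excess_kurtosis(expr)
resid_kurt = excess_kurtosis(group_residuals(expr, cell_type))
print(f"excess kurtosis, raw values: {raw_kurt:.2f}")
print(f"excess kurtosis, residuals:  {resid_kurt:.2f}")
```

The raw values are clearly bimodal, yet the residuals give no reason
to doubt the model assumptions.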

HTH,

Peter