[BioC] How to use DESeq to normalize and estimate variance in an RNA-seq time-course analysis
marie.semon at ens-lyon.fr
Fri May 11 14:52:50 CEST 2012
We first wanted to determine the set of genes whose expression level
differs significantly between at least one time point and the controls,
because we need to estimate the whole set of genes regulated at one
point or another by the treatment. This is why we compared
Ctr/T1, Ctr/T2, Ctr/T3, etc. sequentially and then took the union of
these five lists. We performed this kind of analysis because we thought
that in DESeq it is not possible to test whether a gene is deregulated
over the whole time-series experiment. But perhaps we are wrong here?
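For concreteness, the "union of five lists" step described above amounts to the following minimal Python sketch. It assumes each Ctr/Tk comparison has already produced a list of gene IDs called as differentially expressed; the gene IDs and lists are hypothetical placeholders, not real results. It also counts in how many comparisons each gene was called, which is cheap to keep alongside the union.

```python
# Minimal sketch: combine per-time-point DE calls (each time point vs. control)
# into the set of genes called at one or more time points.
# Gene IDs and lists below are hypothetical placeholders, not real results.
from collections import Counter


def union_of_de_lists(de_lists):
    """Return (union, counts): the genes called DE in at least one
    comparison, and how many comparisons each gene was called in."""
    counts = Counter()
    for genes in de_lists.values():
        counts.update(set(genes))  # set() so a gene counts once per comparison
    return set(counts), counts


# Hypothetical DE lists for the five comparisons Ctr/T1 ... Ctr/T5.
de_lists = {
    "Ctr/T1": ["geneA", "geneB"],
    "Ctr/T2": ["geneB", "geneC"],
    "Ctr/T3": ["geneC"],
    "Ctr/T4": [],
    "Ctr/T5": ["geneD", "geneB"],
}

union, counts = union_of_de_lists(de_lists)
print(sorted(union))    # ['geneA', 'geneB', 'geneC', 'geneD']
print(counts["geneB"])  # 3
```

Note that each list comes from a separate, unreplicated comparison, so the union inherits the error rate of all five tests combined.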
> While each time point does not have a replicate, if the biological
> signal that you are interested in appears and disappears at rates lower
> than the sampling time interval, you can still get an idea about some
> of the variability in the data, e.g. by fitting a trend and looking at
> the residuals.
I'm sorry, but I did not understand your suggestion here.
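One way to read the quoted suggestion about fitting a trend and looking at the residuals, sketched in Python under stated assumptions: for each gene, fit a smooth curve (here a low-degree polynomial) through the unreplicated time points on a log or variance-stabilised scale, and use the scatter of the points around that curve as a rough noise estimate. This only works if the true signal changes slowly relative to the sampling interval. The polynomial trend, treating the control as time 0, and the example values are all assumptions for illustration, not DESeq's method.

```python
# Minimal sketch (an assumption, not DESeq's method): estimate noise in an
# unreplicated time course from the residuals around a smooth trend.
# Assumes values are already on a log / variance-stabilised scale and that the
# true signal changes slowly relative to the sampling interval.


def polyfit(x, y, degree):
    """Least-squares polynomial fit via the normal equations; returns
    coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x**2 + ..."""
    m = degree + 1
    # Normal equations A c = b: A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    A = [[sum(xk ** (i + j) for xk in x) for j in range(m)] for i in range(m)]
    b = [sum(yk * xk ** i for xk, yk in zip(x, y)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef


def residual_variance(x, y, degree=2):
    """Residual variance around a polynomial trend, with n - (degree + 1) d.f."""
    coef = polyfit(x, y, degree)
    fitted = [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return rss / (len(x) - (degree + 1))


times = [0, 1, 2, 3, 4, 5]                # control treated as time 0 (hypothetical)
values = [5.0, 5.8, 6.9, 7.1, 6.6, 5.9]   # hypothetical log-scale values, one gene
print(residual_variance(times, values, degree=2))
```

With six samples and a quadratic trend only three degrees of freedom remain per gene, so in practice such per-gene estimates would be pooled across genes rather than trusted individually.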
However, we performed the clustering you suggested (as described in the
DESeq vignette), and we reassuringly recovered the grouping of the
samples according to our time points (controls grouped together, then
time point 1, time point 2, time point 3, etc.). We also obtained clusters
of coexpressed genes that, somewhat reassuringly, separate genes known
to be regulated early from those regulated later after treatment. I guess
that p-values could be obtained from this clustering to statistically
assess these clusters of genes with similar expression profiles
(maybe via a bootstrap analysis?). Is that what you meant by "getting
p-values from that"?
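The bootstrap idea raised above could look roughly like this minimal Python sketch. It resamples the samples (columns) with replacement, re-runs an average-linkage clustering of the genes on correlation distance, and reports the fraction of replicates in which a chosen group of genes ends up in a single cluster. This is a plain bootstrap proportion, a simpler stand-in for the multiscale bootstrap that the R package pvclust implements, and with only six samples the resampling is crude. The function names, the choice of k and the profiles are hypothetical.

```python
# Minimal sketch (assumption): bootstrap support for a group of co-clustered
# genes, resampling samples (columns) with replacement. A plain bootstrap
# proportion, not pvclust's multiscale approximately-unbiased p-value.
import random
from itertools import combinations
from statistics import mean, pstdev


def corr_distance(u, v):
    """1 - Pearson correlation; 1.0 if either profile is constant."""
    su, sv = pstdev(u), pstdev(v)
    if su == 0 or sv == 0:
        return 1.0
    mu, mv = mean(u), mean(v)
    cov = mean((a - mu) * (b - mv) for a, b in zip(u, v))
    return 1.0 - cov / (su * sv)


def average_linkage(profiles, k):
    """Naive average-linkage agglomerative clustering into k clusters.
    profiles: dict gene -> list of values. Returns a list of sets of genes."""
    genes = list(profiles)
    d = {}
    for a, b in combinations(genes, 2):
        d[(a, b)] = d[(b, a)] = corr_distance(profiles[a], profiles[b])

    def linkage(ci, cj):
        return mean(d[(a, b)] for a in ci for b in cj)

    clusters = [{g} for g in genes]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i] |= merged
    return clusters


def bootstrap_support(profiles, group, k, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which all genes of `group`
    fall into one of the k clusters."""
    rng = random.Random(seed)
    n_cols = len(next(iter(profiles.values())))
    hits = 0
    for _ in range(n_boot):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = {g: [v[c] for c in cols] for g, v in profiles.items()}
        if any(group <= c for c in average_linkage(resampled, k)):
            hits += 1
    return hits / n_boot


# Hypothetical transformed profiles over control + 5 time points.
profiles = {
    "up1": [0, 1, 2, 3, 4, 5],
    "up2": [1, 3, 5, 7, 9, 11],
    "up3": [2, 3, 4, 5, 6, 7],
    "down1": [5, 4, 3, 2, 1, 0],
    "down2": [10, 8, 6, 4, 2, 0],
}
print(bootstrap_support(profiles, {"up1", "up2", "up3"}, k=2))
```

A high proportion says the grouping is stable under resampling of the samples; it is not a p-value for differential expression.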
Thanks a lot again for your suggestions,
On 10/05/12 21:04, Wolfgang Huber wrote:
> Hi Marie
> Simon and you raised the point that comparing each of the five time
> points (unreplicated) against control, and then presumably comparing
> these lists (for what? overlap?) is likely suboptimal.
> While each time point does not have a replicate, if the biological
> signal that you are interested in appears and disappears at rates
> lower than the sampling time interval, you can still get an idea about
> some of the variability in the data, e.g. by fitting a trend and
> looking at the residuals. The first thing I would do here, in fact, is
> to transform the data on a variance stabilised scale (with DESeq, as
> described in the vignette), filter out all genes that show too small
> variability overall, and then cluster the patterns. You don't directly
> get p-values from that (though with some imagination that can be
> done), but it might be a lot more informative than 5 lists.
> In any case, having a replicate of the time course seems essential for
> reliable inference.
> Best wishes
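The preprocessing Wolfgang describes (normalise, transform, filter out low-variability genes, then cluster) can be sketched in Python as below. The size factors follow the median-of-ratios idea used by DESeq. DESeq's variance-stabilising transformation is not reproduced here: log2 of normalised counts plus a pseudocount is a swapped-in stand-in, and the variance threshold and counts are hypothetical. The filtered profiles would then go into a clustering step.

```python
# Minimal sketch (assumptions labelled): normalise, transform, and filter
# genes before clustering. log2(x + pseudocount) is a stand-in for DESeq's
# variance-stabilising transformation, NOT the same transform.
import math
from statistics import median, pvariance


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.
    counts: dict gene -> list of raw counts (one per sample). Genes with a
    zero count in any sample are skipped when computing the factors."""
    n = len(next(iter(counts.values())))
    log_geo = {g: sum(math.log(c) for c in v) / n
               for g, v in counts.items() if all(c > 0 for c in v)}
    return [math.exp(median(math.log(counts[g][j]) - lg
                            for g, lg in log_geo.items()))
            for j in range(n)]


def log_normalized(counts, sf, pseudocount=1.0):
    """log2 of size-factor-normalised counts plus a pseudocount."""
    return {g: [math.log2(c / s + pseudocount) for c, s in zip(v, sf)]
            for g, v in counts.items()}


def filter_by_variance(profiles, min_var):
    """Keep genes whose variance across samples is at least min_var."""
    return {g: v for g, v in profiles.items() if pvariance(v) >= min_var}


# Hypothetical counts: control + 5 time points; every second sample was
# sequenced twice as deeply. "late" responds from time point 3 onwards.
counts = {
    "flat1": [100, 200, 100, 200, 100, 200],
    "flat2": [50, 100, 50, 100, 50, 100],
    "flat3": [10, 20, 10, 20, 10, 20],
    "late":  [100, 200, 100, 400, 400, 800],
}
sf = size_factors(counts)
profiles = log_normalized(counts, sf)
kept = filter_by_variance(profiles, min_var=0.1)  # hypothetical threshold
print(sorted(kept))  # ['late']
```

Without normalisation, the flat genes would look variable purely because of sequencing depth; after it, only the responding gene survives the filter.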