[BioC] How to use DESeq to normalize and estimate variance in a RNAseq timecourse analysis

Wolfgang Huber whuber at embl.de
Thu May 10 21:04:57 CEST 2012


Hi Marie

Simon and you raised the point that comparing each of the five 
(unreplicated) time points against control, and then presumably 
comparing the resulting gene lists (for what? overlap?), is likely 
suboptimal.

While each time point does not have a replicate, if the biological 
signal that you are interested in changes slowly relative to the 
sampling interval, you can still get an idea of some of the variability 
in the data, e.g. by fitting a trend and looking at the residuals. In 
fact, the first thing I would do here is to transform the data to a 
variance-stabilised scale (with DESeq, as described in the vignette), 
filter out all genes that show too little variability overall, and then 
cluster the expression patterns. You don't directly get p-values from 
that (though with some imagination that can be done), but it might be a 
lot more informative than five lists.
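
To make the trend-and-residuals idea concrete, here is a minimal sketch in 
plain Python (not R/DESeq). It assumes the values have already been put on 
a variance-stabilised scale elsewhere, and it uses a straight-line trend 
purely for illustration; the gene names, numbers, threshold and helper 
functions are all hypothetical, and a smoother trend may suit real data 
better. Clustering of the retained profiles is not shown.

```python
import math
from statistics import mean, pstdev

def linear_fit(x, y):
    """Ordinary least-squares line y ~ a + b*x; returns (a, b)."""
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ym - b * xm, b

def residual_sd(x, y):
    """Standard deviation of the residuals around the fitted line."""
    a, b = linear_fit(x, y)
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Two parameters (intercept, slope) were fitted, so divide by n - 2.
    return math.sqrt(sum(r * r for r in res) / (len(x) - 2))

def filter_variable(profiles, min_sd):
    """Keep only genes whose overall SD across time points is >= min_sd."""
    return {g: v for g, v in profiles.items() if pstdev(v) >= min_sd}

# Hypothetical, already-transformed expression values at 5 time points.
times = [1, 2, 3, 4, 5]
profiles = {
    "geneA": [5.0, 6.1, 6.9, 8.0, 9.1],   # steady increase
    "geneB": [7.0, 7.1, 6.9, 7.0, 7.05],  # essentially flat
}

# Drop genes with too little overall variability (threshold is arbitrary).
kept = filter_variable(profiles, min_sd=0.5)

# Residual scatter around the trend, as a rough per-gene noise estimate.
noise = {g: residual_sd(times, v) for g, v in kept.items()}
```

The residual SD is only a meaningful noise estimate if the true signal 
really is smooth between neighbouring time points; a fast transient 
response would be counted as noise.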

In any case, having a replicate of the time course seems essential for 
reliable inference.

	Best wishes
	Wolfgang






On May/9/12 10:03 PM, Marie Sémon wrote:
> Dear all,
>
> We are using DESeq to analyse differential expression in a RNAseq
> timecourse analysis (5 time points after treatment + control).
> The dataset contains 3 replicates for the control, and single measures
> for each time point. For each timepoint, we aim to extract differentially
> expressed genes relative to control.
>
> We are wondering what is the best procedure to prepare this dataset for
> this analysis (steps of normalization + variance estimation):
> 1) is it better to start with normalizing + estimating dispersion on the
> whole dataset (5 points + 3 controls), and then to test for differential
> expression in the two-by-two comparisons just mentioned,
> 2) or is it better to normalize + estimate dispersion on restricted
> datasets composed of 1 time-point + 3 controls, and then test for
> differential expression between this time point and the controls.
>
> It seems to us that the first procedure is better, because it may be
> less sensitive to outliers. But we would be grateful to have your
> enlightened input.
>
> Thank you very much in advance,
>
> Cheers,
>
> Marie


-- 
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber