[BioC] How to use DESeq to normalize and estimate variance in a RNAseq timecourse analysis

Wolfgang Huber whuber at embl.de
Sat May 12 12:27:11 CEST 2012


Dear Marie

On 5/11/12 2:52 PM, Marie Sémon wrote:
> Hi Wolfgang
>
> We wanted first to determine a set of genes for which expression level
> differs statistically between at least one time point and the controls,
> because we need to estimate the whole set of genes regulated at some
> point or the other by the treatment. This is why we compared
> sequentially Ctr/T1 , Ctr/T2, Ctr/T3 etc... and then took the union of
> these five lists. We performed this kind of analysis because we thought
> that in DESeq it is not possible to test wether a gene is deregulated
> over the whole time series experiment. But perhaps are we wrong here?
>
>  >While each time point does not have a replicate, if the biological
> signal that you are interested in appears and disappears at rates lower
> than the sampling time >interval, you can still get an idea about some
> of the variability in the data, e.g. by fitting a trend and looking at
> the residuals.
> I'm sorry but I have not understood your suggestion here...

In an ANOVA-like setting, you estimate the variance of your data by 
looking at how the replicates for one condition vary around the mean. If 
you can fit a smooth trend line, you estimate variance by looking at how 
much the data vary around the line. The text by Catherine Loader, Local 
Regression and Likelihood, is a good place to check if you want to 
pursue this. (I am not aware of a canned solution for RNA-Seq here, this 
is perhaps a little research project.)

> However, we performed the clustering you suggested (as described in
> DESeq vignette), and we reassuringly recovered the grouping of the
> samples according to our time points (controls grouped together, then
> point 1, point 2, point 3 etc). We also obtained clusters of genes
> corresponding to coexpressed genes that separate, somewhat reassuringly,
> genes known to be regulated early or later after treatment. I guess that
> p-values could be obtained from this clustering to assess statistically
> these clusters of genes with similar expression profiles (maybe via a
> boostrap analysis?). Is that what you meant by "getting p-values from
> that"?

I admit that was a vague comment. For a p-value you need a null 
hypothesis, which you aim to reject. In the clustering context, you are 
probably more interested in cluster stability (CRAN package 'clue'), or 
model selection (how many clusters are there, and by which parameters 
are they characterised?). The flexmix or mclust packages on CRAN might 
be start points for that.

Hope this helps
	Wolfgang

>
>
>
>
> Le 10/05/12 21:04, Wolfgang Huber a écrit :
>>
>> Hi Marie
>>
>> Simon and you raised the point that comparing each of the five time
>> points (unreplicated) against control, and then presumably comparing
>> these lists (for what? overlap?) is likely suboptimal.
>>
>> While each time point does not have a replicate, if the biological
>> signal that you are interested in appears and disappears at rates
>> lower than the sampling time interval, you can still get an idea about
>> some of the variability in the data, e.g. by fitting a trend and
>> looking at the residuals. The first thing I would do here, in fact, is
>> to transform the data on a variance stabilised scale (with DESeq, as
>> described in the vignette), filter out all genes that show too small
>> variability overall, and then cluster the patterns. You don't directly
>> get p-values from that (though with some imagination that can be
>> done), but it might be a lot more informative than 5 lists.
>>
>> In any case, having a replicate of the time course seems essential for
>> reliable inference.
>>
>> Best wishes
>> Wolfgang
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Best wishes
	Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber



More information about the Bioconductor mailing list