[BioC] DESeq2 Regularised Log for Clustering of Genes
Simon Anders
anders at embl.de
Fri May 9 14:19:37 CEST 2014
Hi Dario
On Wed, May 7, 2014 at 11:00 PM, Dario Strbenac wrote:
>> As section 5.3 of the vignette explains, the transformed data can
>> be used for applications like clustering of samples. I was
>> considering the best way to use it instead for clustering genes of
>> a time-series experiment. I would have to account for gene length
>> to make different genes comparable.
Actually, no. I don't think accounting for gene length is necessary.
It depends on your distance metric: Do you want to consider two genes as
similar (and hence would want them to cluster together) if they have
similar absolute expression strength, or rather if they have a similar
profile of _changes_ during the time course?
I would expect that the latter is more helpful for analysing time-course
data, and that you will hence get biologically more meaningful clusters
if you normalize each gene's expression by its expression strength at
time 0. At the natural scale, this means division by, and at the log
scale, subtraction of the time-0 (or: control) value. In either case,
gene length cancels out.
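
To make the cancellation concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that a gene's expected read count is proportional to gene length times expression strength. The gene lengths, baseline levels, and the `log2_profile_vs_time0` helper are all made up for this example and are not part of DESeq2.

```python
import math

def log2_profile_vs_time0(values):
    """Return log2(value_t / value_0) for every time point t."""
    baseline = values[0]
    return [math.log2(v / baseline) for v in values]

# Hypothetical fold-change profile over four time points, shared by both genes
fold_profile = [1.0, 2.0, 4.0, 0.5]

# Assumption: expected count = gene length * baseline expression * fold change
short_gene = [500 * 3.0 * f for f in fold_profile]    # length 500, baseline 3.0
long_gene = [4000 * 0.25 * f for f in fold_profile]   # length 4000, baseline 0.25

profile_short = log2_profile_vs_time0(short_gene)
profile_long = log2_profile_vs_time0(long_gene)

# Length and baseline are constant factors within each gene, so they drop out
# of the ratio to time 0 and both genes end up with the same profile.
print(profile_short)
print(profile_long)
```

The two genes differ in length and in absolute expression, yet their time-0-normalized profiles coincide, so a distance computed on these profiles would place them together.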
This also means that, in case of a design with replicates or with
factors besides time point, it might be preferable to not use DESeq2's
rlog transform, but rather use DESeq2's normal workflow to estimate
shrunken log fold changes for contrasts of all later time points against
zero time and then perform clustering on these values. (Thinking about
it, we should maybe consider adding a section in the vignette to
demonstrate this approach.)
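
As a rough illustration of clustering on changes against time zero, the sketch below uses plain, unshrunken differences of replicate means on the log2 scale as a stand-in for DESeq2's shrunken log fold changes. It does not call DESeq2, and the gene names, values, and helper functions are hypothetical. In a real analysis the per-gene vectors would instead be filled with the estimates from the DESeq2 contrasts.

```python
import math

# Hypothetical log2-scale expression: gene -> time point -> replicate values
log_expr = {
    "geneA": {0: [5.0, 5.2], 1: [6.0, 6.2], 2: [7.1, 6.9]},
    "geneB": {0: [9.9, 10.1], 1: [11.0, 11.0], 2: [12.0, 12.0]},
    "geneC": {0: [5.1, 4.9], 1: [4.0, 4.0], 2: [3.1, 2.9]},
}

def mean(xs):
    return sum(xs) / len(xs)

def lfc_vs_time0(per_time):
    """Log2 fold change of each later time point against time 0."""
    base = mean(per_time[0])
    return [mean(per_time[t]) - base for t in sorted(per_time) if t != 0]

# One vector of log fold changes per gene (stand-in for shrunken estimates)
lfc = {gene: lfc_vs_time0(values) for gene, values in log_expr.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Pairwise distances of the kind a clustering method would work from
dist_ab = euclidean(lfc["geneA"], lfc["geneB"])
dist_ac = euclidean(lfc["geneA"], lfc["geneC"])
print(dist_ab, dist_ac)
```

In this toy data geneA and geneC start at nearly the same absolute level but move in opposite directions, whereas geneA and geneB differ strongly in level but change alike. On the log fold changes, geneA therefore sits much closer to geneB than to geneC.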
Simon