[BioC] cluster genes based on expression pattern
Stijn van Dongen
stijn at ebi.ac.uk
Fri Jun 17 10:30:01 CEST 2011
On Fri, Jun 17, 2011 at 10:49:08AM +1000, Moshe Olshansky wrote:
> Hi Rabe,
>
> You can check the timecourse package (Bioconductor).
> Sean's suggestion to filter genes is always a good idea.
> My naive approach would be to define a sensible distance between two genes
> and use this distance for clustering (one possibility is hclust).
> To define a distance, suppose that you have two genes, A and B, and n+1
> time points: 0,1,...,n. Let Ai and Bi be the expression levels of genes A
> and B at time i (i=0,1,...,n). One possibility is just the Lp distance (for
> a suitable p). Another possibility is to say that we do not care about the
> absolute abundance but only about how it evolves in time, and then we can
> look at AAi = Ai/A0 and BBi = Bi/B0, i=1,2,...,n, and take some Lp (or
> other) distance between AA and BB.
> These are just some suggestions. You may think of another reasonable
> distance.
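A minimal sketch of the two distances described in the quoted text, assuming each gene's profile is a plain list of expression levels at time points 0..n and that the value at time 0 is the nonzero baseline used for normalization. The function names (lp_distance, baseline_ratios, ratio_distance) and the toy profiles are hypothetical, chosen for illustration only.

```python
def lp_distance(a, b, p=2.0):
    """Lp (Minkowski) distance between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of time points")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def baseline_ratios(profile):
    """Divide time points 1..n by the level at time point 0 (assumed nonzero)."""
    base = profile[0]
    if base == 0:
        raise ValueError("baseline expression must be nonzero")
    return [x / base for x in profile[1:]]


def ratio_distance(a, b, p=2.0):
    """Lp distance between baseline-normalized profiles: shape, not abundance."""
    return lp_distance(baseline_ratios(a), baseline_ratios(b), p)


# Hypothetical toy profiles at time points 0, 1, 2: same shape, 10x abundance.
gene_a = [1.0, 2.0, 4.0]
gene_b = [10.0, 20.0, 40.0]
print(lp_distance(gene_a, gene_b))     # large, because absolute levels differ
print(ratio_distance(gene_a, gene_b))  # 0.0, because the shapes are identical
```

The contrast between the two printed values is the point of the ratio variant: two genes that rise in lockstep at very different abundances are far apart in plain Lp distance but identical after normalization.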
Another good choice is a distance based on the Pearson correlation r,
such as 1 - r, or on its absolute value, 1 - |r| (in the latter case,
anti-correlated genes will cluster with correlated genes).
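A small sketch of turning Pearson correlation into a distance, assuming the common conversions 1 - r and 1 - |r|; the helper names pearson and correlation_distance are hypothetical. Constant profiles are rejected because their correlation is undefined.

```python
import math


def pearson(a, b):
    """Pearson correlation between two expression profiles (time courses)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(ss_a * ss_b)


def correlation_distance(a, b, absolute=False):
    """1 - r, or 1 - |r| so that anti-correlated genes also count as close."""
    r = pearson(a, b)
    return 1.0 - (abs(r) if absolute else r)


up = [1.0, 2.0, 3.0, 4.0]
down = [4.0, 3.0, 2.0, 1.0]
print(correlation_distance(up, down))                 # 2.0: maximally far apart
print(correlation_distance(up, down, absolute=True))  # 0.0: treated as identical
```

The same pair of perfectly anti-correlated genes ends up at opposite extremes depending on whether the absolute value is taken, which is exactly the trade-off described above.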
In our lab we have had good experiences with a network-based approach.
In this case one chooses a threshold and retains only those node pairs
(genes) whose (absolute) Pearson correlation exceeds that threshold; these
pairs become the edges of a graph. It is possible, and advisable, to vary
the threshold and look at graph statistics such as average node degree and
number of singletons to get an idea of an appropriate threshold.
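A minimal sketch of the threshold sweep, assuming profiles are stored in a dict mapping gene names to lists of expression values and that an edge is kept when |r| is strictly above the threshold. All names (threshold_graph, graph_stats, the genes g1..g5) are hypothetical and the data are a toy example, not real measurements.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(ss_a * ss_b)


def threshold_graph(profiles, threshold):
    """Keep gene pairs whose |Pearson r| exceeds threshold; return adjacency sets."""
    adj = {gene: set() for gene in profiles}
    for g, h in combinations(profiles, 2):
        if abs(pearson(profiles[g], profiles[h])) > threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def graph_stats(adj):
    """Edge count, average node degree and number of singletons."""
    degrees = [len(neighbours) for neighbours in adj.values()]
    n = len(degrees)
    return {
        "edges": sum(degrees) // 2,
        "avg_degree": sum(degrees) / n if n else 0.0,
        "singletons": sum(1 for d in degrees if d == 0),
    }


# Hypothetical toy time courses (4 time points per gene).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
    "g5": [1.0, 2.0, 2.0, 1.0],
}

# Sweep the threshold and watch how the graph thins out.
for t in (0.5, 0.9):
    print(t, graph_stats(threshold_graph(profiles, t)))
```

In this toy set, g4 correlates with g1, g2 and g3 at |r| = 0.8, so raising the threshold from 0.5 to 0.9 drops its three edges: the edge count falls from 6 to 3 and the singletons rise from 1 (g5) to 2 (g4 and g5). On real data one would sweep many thresholds and look for a range where these statistics behave sensibly.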
From there on, any graph clustering algorithm can be used. We use MCL
(developed in our lab, so naturally). With MCL it pays to transform the
data further, but I will not elaborate here. Cei Abreu-Goodger and I have
written a book chapter on this subject, which is available to anyone
interested.
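For readers unfamiliar with MCL, here is a bare-bones, textbook-style sketch of its core loop (add self-loops, then alternate expansion by matrix squaring with inflation by an elementwise power followed by column renormalization). This is an illustration only: it is not the mcl program from micans.org, which uses sparse matrices and pruning, and it does not include the further data transformations mentioned above. The inflation value 2.0, the convergence tolerance and all function names are illustrative assumptions.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product (fine for toy sizes only)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Bare-bones Markov clustering of a symmetric (weighted) adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then convert weights to column-stochastic probabilities.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two steps of the random walk
        # inflation: raise entries to a power, then renormalize the columns
        inflated = normalize_columns([[x ** inflation for x in row]
                                      for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; the columns they hold form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical graph: two triangles (genes 0-2 and 3-5), no edges between them.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
n_genes = 6
adjacency = [[0.0] * n_genes for _ in range(n_genes)]
for i, j in edges:
    adjacency[i][j] = adjacency[j][i] = 1.0
print(mcl(adjacency))  # [[0, 1, 2], [3, 4, 5]]
```

In practice the adjacency matrix would come from the thresholded correlation graph, possibly with the correlations themselves as edge weights; dense pure-Python matrices like these are only workable for toy examples.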
regards,
Stijn
--
Stijn van Dongen >8< -o) O< forename pronunciation: [Stan]
EMBL-EBI /\\ Tel: +44-(0)1223-492675
Hinxton, Cambridge, CB10 1SD, UK _\_/ http://micans.org/stijn