[BioC] HEATMAP on LARGE DATA

Antoine Lucas antoinelucas at libertysurf.fr
Wed Mar 15 09:48:54 CET 2006


Hi,

For large data sets, hcluster (package amap) requires about half as much memory as hclust.
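
To make the memory issue concrete, here is a rough sketch, not run on the real data. The first part counts the bytes needed for the pairwise distances between 12626 genes stored as doubles; the result appears to match the allocation size in the error message quoted below. The second part calls hcluster on a toy random matrix `x`, which is only a stand-in for SAMPLES_log. The average linkage is an arbitrary choice for illustration, and the fallback to hclust is there only so that the sketch also runs without amap installed.

```r
# Memory needed for the lower-triangular distances between n genes,
# stored as 8-byte doubles.
n <- 12626
dist_bytes <- n * (n - 1) / 2 * 8
dist_kb <- dist_bytes / 1024
print(floor(dist_kb))

# Toy stand-in for the expression matrix (assumption: 200 genes x 20 samples).
set.seed(1)
x <- matrix(rnorm(200 * 20), nrow = 200)

if (requireNamespace("amap", quietly = TRUE)) {
  # hcluster computes the distances and the tree in a single call.
  hc <- amap::hcluster(x, method = "euclidean", link = "average")
} else {
  # Fallback with base R only, so that the sketch runs without amap.
  hc <- hclust(dist(x), method = "average")
}
```

Either branch returns an object of class hclust, so the result can be passed to plot() or as.dendrogram() as usual.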

For even larger data sets, you can use the xcluster program from Gavin Sherlock:
http://genetics.stanford.edu/~sherlock/cluster.html

The ctc package has all the tools needed to exchange data with this [free] software.
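
As an illustration, a minimal sketch of exporting a clustering from R to the file formats read by TreeView could look like the following. It assumes that ctc is installed and that its r2gtr, r2atr and r2cdt functions accept the arguments shown (please check the package help pages). The toy matrix `x` and the output path are made up for the example.

```r
# Toy expression matrix with row and column names (assumption: 50 genes x 6 samples).
set.seed(1)
x <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(paste0("gene", 1:50), paste0("sample", 1:6)))

hr <- hclust(dist(x))      # tree on the rows (genes)
hc <- hclust(dist(t(x)))   # tree on the columns (samples)

# Hypothetical output location: a "cluster" basename in the session temp directory.
out <- file.path(tempdir(), "cluster")

if (requireNamespace("ctc", quietly = TRUE)) {
  ctc::r2gtr(hr, file = paste0(out, ".gtr"))          # gene tree
  ctc::r2atr(hc, file = paste0(out, ".atr"))          # array (sample) tree
  ctc::r2cdt(hr, hc, x, file = paste0(out, ".cdt"))   # data reordered by both trees
}
```

The three files share the same basename so that the viewer can find the two tree files next to the .cdt file.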

And for visualization, I recommend TreeView or Freeview
http://magix.fri.uni-lj.si/freeview

But a very large tree should be interpreted carefully, because the two branches under any node can be swapped without changing the clustering, like this:

--- A  == --- A
 +- B      +- C
  + C       + B
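
The same point can be checked in R with a small sketch using base functions only. The three-row matrix `x` is made up for the example. Reversing the dendrogram changes the leaf order, but the cophenetic distances, which describe the tree itself, stay the same.

```r
# Toy data: three items A, B, C described by five random values each.
set.seed(1)
x <- matrix(rnorm(3 * 5), nrow = 3, dimnames = list(c("A", "B", "C"), NULL))

hc <- hclust(dist(x))
d1 <- as.dendrogram(hc)
d2 <- rev(d1)   # flip the branches at every node

# The leaves are drawn in a different order ...
print(labels(d1))
print(labels(d2))

# ... but the tree is the same: cophenetic distances are identical
# once rows and columns are put in the same order.
ids <- rownames(x)
c1 <- as.matrix(cophenetic(d1))[ids, ids]
c2 <- as.matrix(cophenetic(d2))[ids, ids]
same_tree <- isTRUE(all.equal(c1, c2))
print(same_tree)
```

So two genes drawn next to each other in a heatmap are not necessarily closer than two genes drawn further apart; only the heights at which branches join carry that information.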

Regards,

	Antoine Lucas.



On Mon, 13 Mar 2006 22:22:53 -0500,
Sean Davis <sdavis2 at mail.nih.gov> wrote:

> 
> 
> 
> On 3/13/06 21:08, "mark salsburg" <mark.salsburg at gmail.com> wrote:
> 
> > I am having trouble getting the function heatmap() to work on the following
> > gene expression
> > 
> >> dim(SAMPLES_log)
> > [1] 12626    20
> > 
> > 
> >            sample1 sample2...................sample20
> > gen1
> > gen2
> > gen3
> > ....
> > gen12626
> > 
> > 
> > 
> > I have converted SAMPLES_log to a numeric matrix using:
> > 
> > as.matrix(SAMPLES_log)
> > 
> > when I use the following command:
> > 
> > heatmap(SAMPLES_log)
> > 
> > Error: cannot allocate vector of size 622668 Kb
> > In addition: Warning messages:
> > 1: Reached total allocation of 1022Mb: see help(memory.size)
> > 2: Reached total allocation of 1022Mb: see help(memory.size)
> 
> Mark,
> 
> In order to do a heatmap on 12000 genes, a triangular matrix of size
> 12000x12000/2 needs to be calculated.  This is large and will often result
> in the out-of-memory error that you see.  I don't often find that clustering
> that many genes is meaningful in any major way, particularly since you will
> be including a large number of genes that do not vary in the samples.  If
> you really need to do this, I would suggest that you use an external program
> like cluster/treeview, as they may be somewhat less memory-hungry than R
> (but I haven't tested that directly).
> 
> > Is there some library in BioConductor that will allow me to output a
> > heatmap. I want to compare the expression of the first 10 samples with the
> > last 10 samples.
> 
> If you want to do an unsupervised clustering of samples, use just hclust.
> 
> If you want to do an unsupervised clustering of samples AND genes, I would
> suggest reducing the number of genes using a filter for genes that show
> variability (by using, say, the top 500 genes when sorted by coefficient of
> variation, for example).  In other words, there is no need to include a gene
> in a heatmap that is the same for all samples.
> 
> Ultimately, though, if you want to compare gene expression in two groups of
> samples, you are asking a question that is best answered using a supervised
> method, like a t-test.  There are numerous ways to do a t-test between two
> groups including the limma, siggenes, and multtest packages.
> 
> Hope that helps.
> 
> Sean
> 


-- 
Antoine Lucas
Centre de génétique Moléculaire, CNRS
91198 Gif sur Yvette Cedex
Tel: (33)1 69 82 38 89
Fax: (33)1 69 82 38 77


