[R] hclust() memory issue

Thomas Lumley tlumley at uw.edu
Mon Mar 14 23:35:52 CET 2011


The nnclust package will compute the minimum spanning tree, from which
you can extract hierarchical single-linkage clustering.

For N randomly ordered observations it uses only O(N log N) memory. It
takes O(N^2) time in high dimensions (30 counts as high) but only
O(N log N) in low dimensions.
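
To illustrate how single-linkage clusters fall out of a minimum spanning tree, here is a minimal base-R sketch. It is not the nnclust API: mst_prim() and single_linkage_cut() are hypothetical helper names, and the sketch uses plain Prim's algorithm with Euclidean distance, which takes O(N^2) time but never stores the N x N distance matrix. Note that this gives single linkage, not the average linkage used in the original hclust() call.

```r
# Prim's algorithm, computing distances on the fly instead of calling dist().
# x: numeric matrix, one observation per row.
# Returns an (N-1) x 3 matrix of MST edges: from, to, dist.
mst_prim <- function(x) {
  n <- nrow(x)
  tx <- t(x)                       # p x n, so each column is one observation
  in_tree <- logical(n)
  in_tree[1] <- TRUE
  # best[j]: distance from observation j to its nearest vertex in the tree
  best <- sqrt(colSums((tx - x[1, ])^2))
  parent <- rep(1L, n)
  edges <- matrix(NA_real_, n - 1, 3,
                  dimnames = list(NULL, c("from", "to", "dist")))
  for (k in seq_len(n - 1)) {
    best[in_tree] <- Inf           # never pick a vertex already in the tree
    j <- which.min(best)
    edges[k, ] <- c(parent[j], j, best[j])
    in_tree[j] <- TRUE
    d <- sqrt(colSums((tx - x[j, ])^2))
    closer <- !in_tree & d < best
    best[closer] <- d[closer]
    parent[closer] <- j
  }
  edges
}

# Single-linkage clusters at height h: keep only MST edges of length <= h
# and label the connected components with a simple union-find.
single_linkage_cut <- function(edges, n, h) {
  root <- seq_len(n)
  find <- function(i) {
    while (root[i] != i) i <- root[i]
    i
  }
  keep <- edges[edges[, "dist"] <= h, , drop = FALSE]
  for (k in seq_len(nrow(keep))) {
    a <- find(as.integer(keep[k, "from"]))
    b <- find(as.integer(keep[k, "to"]))
    if (a != b) root[b] <- a
  }
  labels <- vapply(seq_len(n), find, integer(1))
  match(labels, unique(labels))    # renumber clusters 1, 2, ... by first appearance
}

# Simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200)
edges <- mst_prim(x)
cl <- single_linkage_cut(edges, nrow(x), h = 1.5)
```

On data small enough for dist(), the result can be compared with cutree(hclust(dist(x), method = "single"), h = 1.5), and the sorted MST edge lengths with the sorted merge heights.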

     -thomas

On Tue, Mar 15, 2011 at 11:11 AM, array chip <arrayprofile at yahoo.com> wrote:
> Scott, thanks for the suggestion. I have already filtered the genes down from
> more than 30000. I should probably filter more. I will take a look at the
> genefilter package.
>
> John
>
>
>
>
> ________________________________
> From: "Ochsner, Scott A" <sochsner at bcm.edu>
>
> <r-help at r-project.org>
> Sent: Mon, March 14, 2011 2:19:57 PM
> Subject: RE: [R] hclust() memory issue
>
> John,
>
> First, why are you trying to cluster so many rows?  Presumably, if this is a
> gene expression array dataset, most of the array features are not going to
> change across treatments/conditions and will be relatively uninformative.  Try
> using a filter which does not use treatment/condition information to decrease
> the number of array features you are attempting to cluster.  There are numerous
> examples in the affycoretools and genefilter packages from Bioconductor
> http://www.bioconductor.org/.
>
> HTH,
>
> Scott
>
>
> Scott A. Ochsner, PhD
> One Baylor Plaza BCM130, Houston, TX 77030
> Voice: (713) 798-6227  Fax: (713) 790-1275
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of array chip
> Sent: Monday, March 14, 2011 4:03 PM
> To: r-help at r-project.org
> Subject: [R] hclust() memory issue
>
> Hi, I have a microarray dataset of dimension 25000x30 and am trying to cluster
> it using hclust(). But the clustering on the rows failed due to the size:
>
>> y<-hclust(dist(data),method='average')
> Error: cannot allocate vector of size 1.9 Gb
>
> I tried to increase the memory using memory.limit(size=3000), but still got
> the same error.
>
> I also tried agnes() from cluster package and pvclust() from pvclust package
> without success.
>
> My computer has 2 GB of memory. Are there more memory-efficient clustering
> packages available?
>
> Thanks
>
> John
>
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] pvclust_1.2-1     cluster_1.13.1    rat2302cdf_2.6.0  simpleaffy_2.24.0
> [5] gcrma_2.20.0      genefilter_1.30.0 affy_1.26.1       Biobase_2.8.0
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.16.0         annotate_1.26.1       AnnotationDbi_1.10.2
>  [4] Biostrings_2.16.9     DBI_0.2-5             IRanges_1.6.16
>  [7] preprocessCore_1.10.0 RSQLite_0.9-2         splines_2.11.1
> [10] survival_2.35-8       tools_2.11.1          xtable_1.5-6
>
>    [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
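
The allocation failure quoted above follows from the size of the dist object: dist() on N rows stores N(N-1)/2 doubles at 8 bytes each, which for N = 25000 is about 2.3 GiB, more than a 2 GB machine can hold. The base-R sketch below estimates that size, works out how many rows fit in a given budget, and applies a nonspecific (treatment-blind) variance filter of the kind suggested in the thread. The helper names dist_bytes(), max_rows_for() and filter_by_variance() are hypothetical, the 1 GiB budget is arbitrary, and the data are simulated; the genefilter package provides its own filtering functions, which are not used here.

```r
# Bytes needed to store the dist object for n observations
dist_bytes <- function(n) 8 * n * (n - 1) / 2

# Largest n whose dist object fits in `bytes`:
# solve 4 * n * (n - 1) <= bytes for n
max_rows_for <- function(bytes) floor((1 + sqrt(1 + bytes)) / 2)

# Keep the `keep` rows with the largest variance across columns,
# without using any treatment/condition labels
filter_by_variance <- function(x, keep) {
  v <- apply(x, 1, var)
  idx <- sort(head(order(v, decreasing = TRUE), keep))  # preserve row order
  x[idx, , drop = FALSE]
}

dist_bytes(25000) / 2^30          # size in GiB for 25000 rows
n_max <- max_rows_for(1 * 2^30)   # rows that fit in an assumed 1 GiB budget

# Simulated stand-in for an expression matrix (rows = features)
set.seed(1)
expr <- matrix(rnorm(2000 * 30), nrow = 2000)
small <- filter_by_variance(expr, 500)
hc <- hclust(dist(small), method = "average")
```

hclust() needs working space beyond the dist object itself, so the row count from max_rows_for() is best treated as an upper bound rather than a safe target.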



-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


