[R-sig-eco] Fwd: vegdist Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy grande

Jari Oksanen jari.oksanen at oulu.fi
Tue Feb 12 09:58:56 CET 2013


Dear Carolina Bello,

You asked this same thing on the general R mailing list, and Brian Ripley answered you on Saturday. The essential things he told you were that you cannot do that with 32 GB of RAM, and that you should rethink your problem. All we can do here is repeat his message, and Tom Philippi already did so.

With N = 138037 the result holds N(N-1)/2 distances, about 9.5 billion doubles, so you need 71 GB to store it, and 32 GB of RAM is too little. I don't know how much further you would get with vegan::vegdist in R 3.0.0, but at least the error message will change to a Spanish version of "Error: cannot allocate vector of size 71.0 Gb".
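
The arithmetic behind that figure can be checked in a few lines of R. This is only a sketch of the size calculation; the variable names are made up for illustration, and the remark about the pre-3.0.0 length limit is my understanding of why the error appears "almost immediately" rather than something stated in the error message itself.

```r
# Size of the lower-triangle distance vector that has to be allocated.
n_sites <- 138037                        # rows (pixels) reported in the question
n_dist  <- n_sites * (n_sites - 1) / 2   # number of pairwise distances
bytes   <- n_dist * 8                    # one double takes 8 bytes
gib     <- bytes / 1024^3                # R reports this unit as "Gb"

cat(sprintf("%.0f distances, %.1f Gb\n", n_dist, gib))

# Assumption: before R 3.0.0 a vector could hold at most 2^31 - 1 elements,
# so a request this long is refused on length alone, whatever the RAM.
exceeds_old_limit <- n_dist > .Machine$integer.max
```

No setting of memory.limit() changes the first number: the result itself is about 71 Gb before any clustering starts.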

You really should rethink your problem. You need to use methods that can handle large data sets like that, or you need to thin your data. Are your data modelled? At least I find it difficult to believe that you really have observations on 89 species in 138037 grid cells in rugged terrain like the Andes.
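
One possible way to thin presence/absence data, offered only as a sketch and not as the method anyone in this thread used: pixels with exactly the same species composition are at Jaccard distance 0 from each other, so they can be collapsed to one row each before computing distances. The toy matrix `pa` and all variable names below are hypothetical. The example uses base R's dist(method = "binary"), which for 0/1 data should match the Jaccard dissimilarity, and passes the group sizes to hclust() through its `members` argument. Whether the number of unique compositions in the real 138037 x 89 matrix is small enough is something you would have to check.

```r
set.seed(1)
# Hypothetical toy data: 2000 pixels x 6 species, presence (1) / absence (0).
pa <- matrix(rbinom(2000 * 6, 1, 0.4), ncol = 6,
             dimnames = list(NULL, paste0("sp", 1:6)))

# One text key per pixel; identical keys mean identical compositions.
key   <- apply(pa, 1, paste, collapse = "")
first <- !duplicated(key)
ukey  <- key[first]
uniq  <- pa[first, , drop = FALSE]        # one row per unique composition
weights <- as.vector(table(key)[ukey])    # number of pixels behind each row

# Distances only among unique compositions, a much smaller problem.
d  <- dist(uniq, method = "binary")
hc <- hclust(d, method = "average", members = weights)

# Cut the tree and map the groups back to every original pixel.
groups_uniq  <- cutree(hc, k = 4)
groups_pixel <- groups_uniq[match(key, ukey)]
```

With 6 species there can be at most 64 unique rows, so the distance vector here has at most 2016 entries instead of nearly two million for the 2000 toy pixels.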

Cheers, Jari Oksanen

On 12/02/2013, at 00:15 AM, Carolina Bello wrote:

> Hi
> I have some problems with the vegdist function. I want to do a hierarchical
> cluster analysis of 138037 pixels of 1 km^2 from a study area in the Colombian Andes.
> I have distribution models for 89 species, so I have a matrix with the
> pixels in the rows and species in the columns, filled with
> absence (0)/presence (1) of each species in each pixel. I think the bigger
> problem is that the agglomeration method in hierarchical clustering needs
> the whole matrix, so I can't divide it.
> 
> For this I want to calculate a
> distance matrix with Jaccard. I have binary data.
> 
> The problem is that I have a matrix of 138037 rows (sites) and 89 columns
> (species). My script is:
> 
>    rm(list=ls(all=T))
> 
>    gc() ## to clear anything left hidden in memory
> 
>    memory.limit(size = 100000) # it gives 1 Tera from HDD in case ram memory is over
> 
>    DF=as.data.frame(MODELOS)
> 
>    DF=na.omit(DF)
> 
>    DISTAN=vegdist(DF[,2:ncol(DF)],"jaccard")
> 
> Almost immediately it produces the error: Error en double(N * (N - 1)/2) :
> tamaño del vector especificado es muy grande
> (in English: vector size specified is too large)
> 
> I think this is a memory error, but I don't know why, since I have a PC with 32 GB
> of RAM and 1 Tera of HDD.
> 
> I also tried to compute a distance matrix with the function dist from package proxy. I
> did:
> 
>  library(proxy)
> 
>    vector=dist(DF, method = "Jaccard")
> 
> It starts to run, but when it gets to 10 GB of RAM, a window announces that R
> committed an error and will close, so it closes and starts a new session.
> 
> I really don't know what is going on, and even less how to solve it. Can anybody
> help me?
> 
> Thanks
> 
> Carolina Bello IAVH-COLOMBIA
> 

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksanen at oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa


