[R] Memory management in R

Lorenzo Isella lorenzo.isella at gmail.com
Sat Oct 9 00:42:53 CEST 2010


Thanks for lending a helping hand.
I put together a self-contained example. It relies on two functions: 
entropy_lz() does the work for a single position, and total_entropy_lz() 
simply applies it at every position of the series.
I am trying to implement the so-called Lempel-Ziv entropy estimator. The 
idea is to choose a position i along a string x (standing for a time 
series) and find the length of the shortest string starting from i which 
has never occurred before i.
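
As a small illustration of that definition, here is a sketch on a made-up toy sequence. The helper name match_length is hypothetical, and it compares vector elements directly instead of building strings, so it is not the code from this post:

```r
# Hypothetical helper: length of the shortest block starting at position i
# that does not occur as a contiguous block inside x[1:(i-1)].
match_length <- function(x, i) {
  n <- length(x)
  past <- x[seq_len(i - 1)]          # elements strictly before position i
  len <- 1
  while (i + len - 1 <= n) {
    block <- x[i:(i + len - 1)]
    # candidate start positions of a block of this length inside the past
    starts <- seq_len(max(length(past) - len + 1, 0))
    seen <- any(vapply(starts, function(s) {
      all(past[s:(s + len - 1)] == block)
    }, logical(1)))
    if (!seen) return(len)
    len <- len + 1
  }
  len  # ran off the end of x: every block up to the end was seen before
}

toy <- c("a", "b", "a", "b", "c")
match_length(toy, 1)  # 1: nothing precedes position 1
match_length(toy, 3)  # 3: "a" and "a b" occur before position 3, "a b c" does not
```
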
Please find below the R snippet which requires an input file (a simple 
text file) you can download from

http://dl.dropbox.com/u/5685598/time_series25_.dat

What puzzles me is that the list is not really long (fewer than 2000 
entries), and I have not run into the same problem even with longer lists.
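
One thing that may matter more than the overall length of the list is how repetitive it is: the pattern in the quoted error message consists of a single token repeated many times, and with this algorithm the search string keeps growing for as long as a repetition continues. A minimal sketch for checking that with base R's rle(), using a made-up vector in place of the real data (only the token "12653a6" is taken from the error message; everything else is invented):

```r
# Made-up stand-in for the series read by scan(); "12653a6" is the token
# visible in the error message, the other tokens are invented.
x_demo <- c("a1", rep("12653a6", 300), "b2", "a1")

runs <- rle(x_demo)            # runs of identical consecutive values
longest <- which.max(runs$lengths)
runs$values[longest]           # token forming the longest run
runs$lengths[longest]          # number of consecutive repetitions

# Rough size of a search string spanning that whole run (tokens joined by "#")
run_chars <- nchar(paste(rep(runs$values[longest], runs$lengths[longest]),
                         collapse = "#"))
run_chars
```

Running the same three lines on the real x would show whether a long run of one symbol is what makes the pattern so large.
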
Many thanks

Lorenzo

######################################

# Lempel-Ziv entropy estimate of the whole series x
total_entropy_lz <- function(x){
  if (length(x)==1){
    print("sequence too short")
    return("error")
  } else{
    n <- length(x)
    prefactor <- 1/(n*log(n)/log(2))
    n_seq <- seq(n)
    entropy_list <- n_seq
    # match length at every position of the series
    for (i in n_seq){
      entropy_list[i] <- entropy_lz(x,i)
    }
  }
  total_entropy <- 1/(prefactor*sum(entropy_list))
  return(total_entropy)
}

# Length of the shortest sequence starting at i that does not occur before i
entropy_lz <- function(x,i){
  # elements before position i (1:i-1 is (1:i)-1; the 0 index is dropped)
  past <- x[1:i-1]
  go_on <- 1
  count_len <- 0
  past_string <- paste(past, collapse="#")
  while (go_on>0){
    # indexing past the end of x yields NA, which ends the search
    new_seq <- x[i:(i+count_len)]
    fut_string <- paste(new_seq, collapse="#")
    count_len <- count_len+1
    # fut_string is interpreted as a regular expression here
    if (!grepl(fut_string,past_string)){
      go_on <- -1
    }
  }
  return(count_len)
}

x <- scan("time_series25_.dat", what="")

S <- total_entropy_lz(x)
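
A possible way around the error, offered as a sketch rather than a fix tested on this data set: by default grepl() treats its first argument as a regular expression, so every candidate substring has to be compiled as a pattern (the regcomp named in the warning). Passing fixed = TRUE asks for a literal substring search instead. The variant below (the name entropy_lz_fixed is made up here) also wraps both strings in the "#" delimiter, so that a token cannot match part of a longer token, and stops at the end of the series instead of indexing past it. Both of those changes are assumptions about the intended behaviour.

```r
# Sketch of a literal-matching variant; entropy_lz_fixed is a hypothetical name.
entropy_lz_fixed <- function(x, i) {
  n <- length(x)
  # x[seq_len(i - 1)] is the part strictly before i; note that 1:i-1 parses
  # as (1:i) - 1, which only works because index 0 is silently dropped.
  past_string <- paste0("#", paste(x[seq_len(i - 1)], collapse = "#"), "#")
  count_len <- 0
  repeat {
    count_len <- count_len + 1
    last <- i + count_len - 1
    if (last > n) break          # ran off the end of the series
    fut_string <- paste0("#", paste(x[i:last], collapse = "#"), "#")
    # fixed = TRUE: literal substring search, no regular expression is compiled
    if (!grepl(fut_string, past_string, fixed = TRUE)) break
  }
  count_len
}

# Toy check on a made-up sequence
toy <- c("a", "b", "a", "b", "c")
lens <- sapply(seq_along(toy), function(i) entropy_lz_fixed(toy, i))
lens  # 1 1 3 2 1
```

To try it in the script above, the call inside the loop of total_entropy_lz would become entropy_lz_fixed(x, i).
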






On 10/08/2010 07:30 PM, jim holtman wrote:
> More specifics, please: how long is the string, and what is the pattern
> you are matching against?  It sounds like you might have a complex
> pattern that, in trying to match the string, is doing a lot of
> backtracking and such.  There is an O'Reilly book, Mastering Regular
> Expressions, that might help you understand what might be happening.
> So if you can provide a better example than just the error message, it
> would be helpful.
>
> On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella<lorenzo.isella at gmail.com>  wrote:
>> Dear All,
>> I am experiencing some problems with a script of mine.
>> It crashes with this message:
>>
>> Error in grepl(fut_string, past_string) :
>>   invalid regular expression
>> '12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12
>> Calls: entropy_estimate_hash ->  total_entropy_lz ->  entropy_lz ->  grepl
>> In addition: Warning message:
>> In grepl(fut_string, past_string) : regcomp error:  'Out of memory'
>> Execution halted
>>
>> To make a long story short, I use some functions which eventually call grepl
>> on very long strings to check whether a certain substring is part of a
>> longer string.
>> Now, the script technically works (it never crashes when I run it on a
>> smaller dataset) and the problem does not seem to be RAM (I have
>> several GB of RAM on my machine and its consumption never shoots up, so my
>> machine never resorts to swap).
>> So (though I am not an expert) it looks like the problem is some limitation
>> of grepl or of R's memory management.
>> Any idea how I could tackle this problem, or how I can profile my code
>> to fix it? It really seems to me that I have to find a way to allow R
>> to process longer strings.
>> Any suggestion is appreciated.
>> Cheers
>>
>> Lorenzo
