[R] Memory management in R
Lorenzo Isella
lorenzo.isella at gmail.com
Sat Oct 9 00:42:53 CEST 2010
Thanks for lending a helping hand.
I put together a self-contained example. It relies on two functions,
one of which simply applies the other at every position of the
sequence.
I am trying to implement the so-called Lempel-Ziv entropy estimator. The
idea is to choose a position i along a string x (standing for a time
series) and find the length of the shortest string starting from i which
has never occurred before i.
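
To make the definition concrete, here is a minimal sketch on a toy vector. The helper name `shortest_new_length` is hypothetical and is not part of my script; it compares blocks element by element instead of building strings, and it assumes that a block counts as "seen" only if it lies entirely before position i.

```r
# Hypothetical helper: length of the shortest block starting at position i
# that does not occur as a contiguous block entirely inside x[1:(i-1)].
shortest_new_length <- function(x, i) {
  n <- length(x)
  len <- 1
  while (i + len - 1 <= n) {
    block <- x[i:(i + len - 1)]
    # Start positions of blocks of this length that end before position i.
    starts <- seq_len(max(i - len, 0))
    seen <- any(vapply(starts,
                       function(s) all(x[s:(s + len - 1)] == block),
                       logical(1)))
    if (!seen) return(len)
    len <- len + 1
  }
  # Convention chosen here: every block up to the end of the series was
  # already seen, so report one more than the remaining length.
  len
}

toy <- c("a", "b", "a", "b", "c")
sapply(seq_along(toy), function(i) shortest_new_length(toy, i))
# 1 1 3 2 1
```

At position 3, for instance, "a" and "a b" both occur earlier, while "a b c" does not, so the length is 3.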
Please find below the R snippet, which requires an input file (a simple
text file) that you can download from
http://dl.dropbox.com/u/5685598/time_series25_.dat
What puzzles me is that the list is not really long (fewer than 2000
entries) and I have not experienced the same problem even with longer lists.
Many thanks
Lorenzo
######################################
total_entropy_lz <- function(x){
  if (length(x)==1){
    print("sequence too short")
    return("error")
  } else{
    n <- length(x)
    prefactor <- 1/(n*log(n)/log(2))
    n_seq <- seq(n)
    entropy_list <- n_seq
    for (i in n_seq){
      entropy_list[i] <- entropy_lz(x,i)
    }
  }
  total_entropy <- 1/(prefactor*sum(entropy_list))
  return(total_entropy)
}

entropy_lz <- function(x,i){
  past <- x[1:i-1]
  n <- length(x)
  lp <- length(past)
  future <- x[i:n]
  go_on <- 1
  count_len <- 0
  past_string <- paste(past, collapse="#")
  while (go_on>0){
    new_seq <- x[i:(i+count_len)]
    fut_string <- paste(new_seq, collapse="#")
    count_len <- count_len+1
    if (grepl(fut_string,past_string)!=1){
      go_on <- -1
    }
  }
  return(count_len)
}

x <- scan("time_series25_.dat", what="")
S <- total_entropy_lz(x)
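
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `grepl` treats its first argument as a regular expression by default, so a search string thousands of characters long has to be compiled as a pattern, which may be where the 'Out of memory' from regcomp comes from. Below is a sketch of a variant that asks for literal matching via `fixed = TRUE`. The name `entropy_lz_fixed`, the surrounding "#" separators (so that only whole entries can match) and the end-of-series guard are my own choices and not part of the script above.

```r
# Sketch: same idea as entropy_lz, but with literal (non-regex) matching.
entropy_lz_fixed <- function(x, i) {
  n <- length(x)
  # Wrap the past in separators so that a match must cover whole entries
  # (assumption: no entry contains "#" and no entry is empty).
  past_string <- paste0("#", paste(x[seq_len(i - 1)], collapse = "#"), "#")
  count_len <- 0
  repeat {
    # Guard against indexing past the end of the series. Returning one
    # extra step is a convention that mirrors the NA padding which
    # x[i:(i+count_len)] would otherwise produce.
    if (i + count_len > n) return(count_len + 1)
    new_seq <- x[i:(i + count_len)]
    fut_string <- paste0("#", paste(new_seq, collapse = "#"), "#")
    count_len <- count_len + 1
    # fixed = TRUE: fut_string is searched for as a plain string.
    if (!grepl(fut_string, past_string, fixed = TRUE)) break
  }
  count_len
}

# A long run of identical entries, similar to the one in the error message.
long_x <- rep("12653a6", 600)
entropy_lz_fixed(long_x, 300)
```

With literal matching, entries that happen to contain regex metacharacters are also compared as plain text.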
On 10/08/2010 07:30 PM, jim holtman wrote:
> More specificity: how long is the string, what is the pattern you are
> matching against? It sounds like you might have a complex pattern
> that in trying to match the string might be doing a lot of back
> tracking and such. There is an O'Reilly book, Mastering Regular
> Expressions, that might help you understand what might be happening. So
> if you can provide a better example than just the error message, it
> would be helpful.
>
> On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella<lorenzo.isella at gmail.com> wrote:
>> Dear All,
>> I am experiencing some problems with a script of mine.
>> It crashes with this message
>>
>> Error in grepl(fut_string, past_string) :
>> invalid regular expression
>> '12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12
>> Calls: entropy_estimate_hash -> total_entropy_lz -> entropy_lz -> grepl
>> In addition: Warning message:
>> In grepl(fut_string, past_string) : regcomp error: 'Out of memory'
>> Execution halted
>>
>> To make a long story short, I use some functions which eventually call grepl
>> on very long strings to check whether a certain substring is part of a
>> longer string.
>> Now, the script technically works (it never crashes when I run it on a
>> smaller dataset) and the problem does not seem to be RAM (I have
>> several GB of RAM on my machine and its consumption never shoots up, so my
>> machine never resorts to swap).
>> So (though I am not an expert) it looks like the problem is some limitation
>> of grepl or of R's memory management.
>> Any idea how I could tackle this problem, or how I can profile my code
>> to fix it? (It really seems to me that I have to find a way to allow R
>> to process longer strings.)
>> Any suggestion is appreciated.
>> Cheers
>>
>> Lorenzo
>>