[R] Memory management in R

David Winsemius dwinsemius at comcast.net
Sat Oct 9 03:39:12 CEST 2010


On Oct 8, 2010, at 9:19 PM, Mike Marchywka wrote:
> ----------------------------------------
>> From: dwinsemius at comcast.net
>> To: lorenzo.isella at gmail.com
>> Date: Fri, 8 Oct 2010 19:30:45 -0400
>> CC: r-help at r-project.org
>> Subject: Re: [R] Memory management in R
>>
>>
>> On Oct 8, 2010, at 6:42 PM, Lorenzo Isella wrote:
>>
>
>>> Please find below the R snippet which requires an input file (a
>>> simple text file) you can download from
>>>
>>> http://dl.dropbox.com/u/5685598/time_series25_.dat
>>>
>>> What puzzles me is that the list is not really long (less than 2000
>>> entries) and I have not experienced the same problem even with
>>> longer lists.
>>
>> But maybe your loop terminated earlier in them. Somewhere between
>> 11*225 and 11*240 characters the grepping machine gives up:
>>
>>> eprs <- paste(rep("aaaaaaaaaa", 225), collapse="#")
>>> grepl(eprs, eprs)
>> [1] TRUE
>>
>>> eprs <- paste(rep("aaaaaaaaaa", 240), collapse="#")
>>> grepl(eprs, eprs)
>> Error in grepl(eprs, eprs) :
>> invalid regular expression
>> 'aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaa
>> In addition: Warning message:
>> In grepl(eprs, eprs) : regcomp error: 'Out of memory'
>>
>> The complexity of the problem may depend on the distribution of
>> values. You have a very skewed distribution, with the vast majority
>> taking the same value that appeared in your error message:
>>
>
>>
>> HTH (although I think it means you need to construct a different
>> implementation strategy);
>
> You really need to look at the question posed by your regex and
> consider the complexity of what you are asking and what likely
> implementations would do with your regex.

The R regex machine (at least on a Mac with R 2.11.1) breaks when the
length of the pattern argument exceeds 2559 characters. There is
no complexity for the regex parser here: no metacharacters were in
the string.
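
A minimal sketch for checking where the limit falls on another
installation. The helper names pattern_ok and pattern_len are
hypothetical, and the result depends on the R version and platform, so
a newer R may accept every length tried here.

```r
# Hypothetical helper: TRUE if grepl() accepts a literal pattern built
# from n_blocks copies of "aaaaaaaaaa" joined by "#", FALSE if it errors.
pattern_ok <- function(n_blocks) {
  eprs <- paste(rep("aaaaaaaaaa", n_blocks), collapse = "#")
  tryCatch(
    suppressWarnings(grepl(eprs, eprs)),
    error = function(e) FALSE
  )
}

# Pattern length in characters: 10 per block plus (n_blocks - 1) separators.
pattern_len <- function(n_blocks) 11 * n_blocks - 1

# Scan the range in which the failure was reported.
for (n in c(225, 230, 235, 240)) {
  cat(n, pattern_len(n), pattern_ok(n), "\n")
}
```

If the failure depends only on the length, the first FALSE should
appear at the same pattern_len() whatever characters the blocks hold.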

> Something like this probably needs to be implemented
> in dedicated code to handle the more general case, or you need to
> determine if the input data is pathological given your regex.

There is a Biostrings package in BioC that may provide more robust  
treatment of long strings.
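
Within base R, another option may be literal matching. This is only a
sketch, and it assumes the patterns really contain no metacharacters
and that plain substring matching is what is wanted. With fixed = TRUE
the pattern is not compiled as a regular expression. The variable
longer is made up for illustration.

```r
# Same construction as the failing example earlier in the thread.
eprs <- paste(rep("aaaaaaaaaa", 240), collapse = "#")

# fixed = TRUE treats the pattern as a literal string rather than a
# regular expression, so no regex compilation takes place.
grepl(eprs, eprs, fixed = TRUE)

# Literal matching also locates the long pattern inside a longer string.
longer <- paste0("xyz#", eprs, "#xyz")
grepl(eprs, longer, fixed = TRUE)
regexpr(eprs, longer, fixed = TRUE)[1]  # start position of the match
```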

-- 
David.


> Being able to write something
> concisely doesn't mean the execution of that something is simple.
> Even if it does manage to return a result, it likely will get very
> slow. In the past I have had to write my own simple regex compilers
> to handle a limited class of expressions to make the speed
> reasonable. In this case, depending on your objectives, dedicated
> code may even be helpful to you in understanding the algorithm.
>
>>
>> David.
>>
>>
>>> Many thanks
>>>
>>> Lorenzo
>>>
>

David Winsemius, MD
West Hartford, CT


