[R] Memory management in R
David Winsemius
dwinsemius at comcast.net
Sat Oct 9 03:39:12 CEST 2010
On Oct 8, 2010, at 9:19 PM, Mike Marchywka wrote:
> ----------------------------------------
>> From: dwinsemius at comcast.net
>> To: lorenzo.isella at gmail.com
>> Date: Fri, 8 Oct 2010 19:30:45 -0400
>> CC: r-help at r-project.org
>> Subject: Re: [R] Memory management in R
>>
>>
>> On Oct 8, 2010, at 6:42 PM, Lorenzo Isella wrote:
>>
>
>>> Please find below the R snippet which requires an input file (a
>>> simple text file) you can download from
>>>
>>> http://dl.dropbox.com/u/5685598/time_series25_.dat
>>>
>>> What puzzles me is that the list is not really long (fewer than 2000
>>> entries) and I have not experienced the same problem even with
>>> longer lists.
>>
>> But maybe your loop terminated earlier in those cases. Somewhere between
>> 11*225 and 11*240 characters, the grepping machine gives up:
>>
>>> eprs <- paste(rep("aaaaaaaaaa", 225), collapse="#")
>>> grepl(eprs, eprs)
>> [1] TRUE
>>
>>> eprs <- paste(rep("aaaaaaaaaa", 240), collapse="#")
>>> grepl(eprs, eprs)
>> Error in grepl(eprs, eprs) :
>> invalid regular expression
>> 'aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaaaaaaa#aaaaa
>> In addition: Warning message:
>> In grepl(eprs, eprs) : regcomp error: 'Out of memory'
>>
>> The complexity of the problem may depend on the distribution of
>> values. You have a very skewed distribution, with the vast majority
>> being the same value that appeared in your error message:
>>
>
>>
>> HTH (although I think it means you need to construct a different
>> implementation strategy).
>
> You really need to look at the question posed by your regex and
> consider the complexity of what you are asking and what likely
> implementations would do with your regex.
The R regex machine (at least on a Mac with R 2.11.1) breaks when the
length of the pattern argument exceeds 2559 characters. The pattern
poses no complexity for the regex parser: there were no metacharacters
in the string.
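Because the pattern here is a plain literal, one workaround worth trying is `fixed = TRUE`, which tells `grepl()` to match the string as-is instead of compiling it as a regular expression. A minimal sketch reusing the 240-block string from the example above; it assumes the limit lives in regex compilation (as the "regcomp error" warning suggests), and the exact threshold may differ in other R versions or platforms.

```r
# Rebuild the 240-block string: 240 * 10 letters + 239 "#" separators
eprs <- paste(rep("aaaaaaaaaa", 240), collapse = "#")
nchar(eprs)  # 2639, above the 2559-character limit reported above

# fixed = TRUE matches the pattern literally, so no regular expression
# is compiled and the regcomp() failure is sidestepped
grepl(eprs, eprs, fixed = TRUE)  # TRUE
```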
> Something like this probably needs to be implemented in dedicated
> code to handle the more general case, or you need to determine if
> input data is pathological given your regex.
There is a Biostrings package in BioC (Bioconductor) that may provide
more robust treatment of long strings.
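The idea of dedicated code for a limited class of expressions, combined with the observation that the failing pattern has no metacharacters, can be sketched as a small dispatcher. `grepl_auto` and `has_metachar` are hypothetical names for illustration, not functions in base R or Biostrings. The metacharacter list is an assumption covering the common POSIX extended regex set; adjust it if you rely on `perl = TRUE` syntax.

```r
# Hypothetical helpers, not part of base R: use literal matching when the
# pattern has no regex metacharacters, so no regular expression is compiled.
regex_metachars <- c(".", "\\", "|", "(", ")", "[", "]", "{", "}",
                     "^", "$", "*", "+", "?")

has_metachar <- function(pattern) {
  # Check each metacharacter literally (fixed = TRUE) to avoid escaping issues
  any(vapply(regex_metachars,
             function(m) grepl(m, pattern, fixed = TRUE),
             logical(1)))
}

grepl_auto <- function(pattern, x) {
  if (has_metachar(pattern)) {
    grepl(pattern, x)                # a real regex: let the engine handle it
  } else {
    grepl(pattern, x, fixed = TRUE)  # a plain literal: skip regex compilation
  }
}

eprs <- paste(rep("aaaaaaaaaa", 240), collapse = "#")
grepl_auto(eprs, eprs)     # literal path, no regcomp() call
grepl_auto("a+#", "aaa#")  # regex path, since "+" is a metacharacter
```

Long patterns that genuinely use regex syntax still go through the regex engine, so this only helps when the long pattern is literal.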
--
David.
> Being able to write something concisely doesn't mean the execution
> of that something is simple. Even if it does manage to return a
> result, it likely will get very slow. In the past I have had to write
> my own simple regex compilers to handle a limited class of
> expressions to make the speed reasonable. In this case, depending on
> your objectives, dedicated code may even be helpful to you in
> understanding the algorithm.
>
>>
>> David.
>>
>>
>>> Many thanks
>>>
>>> Lorenzo
>>>
>
>
David Winsemius, MD
West Hartford, CT