[R] R 2.10.0: Error in gsub/calloc

Richard R. Liu richard.liu at pueo-owl.ch
Tue Nov 3 20:31:30 CET 2009


I apologize for not being clear.  d is a character vector of length  
158908.  Each element in the vector has been designated by sentDetect  
(package: openNLP) as a sentence.  Some of these are really  
sentences.  Others are merely groups of meaningless characters  
separated by white space.  strapply is a function in the package
gsubfn.  It applies the regular expression (second argument) to each
element of the first argument.  Every match is then passed to the
designated function (third argument; missing in my case, hence the
identity function).  Thus, with strapply I am simply performing a
white-space tokenization of each sentence.  I am doing this in the
hope of being able to distinguish true sentences from false ones on  
the basis of mean length of token, maximum length of token, or similar.
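
A minimal sketch of the token-length idea described above, on a toy
vector rather than the real 158,908-element d.  It uses base R
(gregexpr and regmatches) in place of strapply, which is a swapped-in
technique and not the call from the original post; whether it avoids
the allocation error on the full data is untested.  Note that "\\w+"
matches runs of word characters, so punctuation is dropped along with
white space.  The names d_toy, tokens, mean_len and max_len are
illustrative only.

```r
# Toy stand-in for d: one real sentence, one run of junk, one empty string.
d_toy <- c("This is a real sentence.", "x q z 7 k", "")

# Extract every run of word characters from each element (base R only).
tokens <- regmatches(d_toy, gregexpr("\\w+", d_toy, perl = TRUE))

# Mean token length per element; NA where no token was found.
mean_len <- vapply(
  tokens,
  function(tok) if (length(tok)) mean(nchar(tok)) else NA_real_,
  numeric(1)
)

# Maximum token length per element; NA where no token was found.
max_len <- vapply(
  tokens,
  function(tok) if (length(tok)) max(nchar(tok)) else NA_integer_,
  integer(1)
)

# One row per candidate sentence, ready for thresholding or inspection.
stats <- data.frame(text = d_toy, mean_len = mean_len, max_len = max_len)
print(stats)
```

Any cut-off on mean_len or max_len for separating true sentences from
false ones would have to be chosen by inspecting the real data.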

Richard R. Liu
Dittingerstr. 33
CH-4053 Basel
Switzerland

Tel.:  +41 61 331 10 47
Email:  richard.liu at pueo-owl.ch


On Nov 3, 2009, at 18:30 , Uwe Ligges wrote:

>
>
> richard.liu at pueo-owl.ch wrote:
>> I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think
>> this is a Mac-specific problem.
>> I have a very large (158,908 possible sentences, ca. 58 MB) plain
>> text document d which I am trying to tokenize:
>> t <- strapply(d, "\\w+", perl = T).  I am encountering the
>> following error:
>
>
> What is strapply() and what is d?
>
> Uwe Ligges
>
>
>
>
>> Error in base::gsub(pattern, rs, x, ...) :
>>  Calloc could not allocate (-1398215180 of 1) memory
>> This happens regardless of whether I run in 32- or 64-bit mode.  The
>> machine has 8 GB of RAM, so I can hardly believe that RAM is a
>> problem.
>> Thanks,
>> Richard
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


