[R] ideas about how to reduce RAM & improve speed in trying to use lapply(strsplit())

Matthew Keller mckellercran at gmail.com
Mon May 30 08:08:00 CEST 2011


God, this listserv is awesome. Thanks to everyone for their ideas.
I'll speed & memory test tomorrow and change the code. Thanks again!

Matt

On Sun, May 29, 2011 at 6:44 PM, Ian Gow <iandgow at gmail.com> wrote:
> Not a new approach, but some benchmark data (adding perl=TRUE speeds up
> Jim's suggestion):
>
>> x <- c('18x.6','12x.9','302x.3')
>> y <- rep(x,100000)
>> system.time(temp <- unlist(lapply(strsplit(y,".",fixed=TRUE),function(x) x[1])))
>   user  system elapsed
>  1.203   0.018   1.222
>> system.time(temp2 <- gsub("^(.*?)\\..*$","\\1",y, perl=TRUE))
>   user  system elapsed
>  0.176   0.001   0.176
>> identical(temp2, temp)
> [1] TRUE
>> system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y))
>   user  system elapsed
>  0.292   0.001   0.291
>> identical(temp3, temp)
> [1] TRUE
>> system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y, perl=TRUE))
>   user  system elapsed
>  0.160   0.001   0.161
>
> On 5/29/11 7:40 PM, "jim holtman" <jholtman at gmail.com> wrote:
>
>>Try this approach:
>>
>>> x <- c('18x.6','12x.9','302x.3')
>>> gsub("^(.*)\\..*", '\\1', x)
>>[1] "18x"  "12x"  "302x"
>>
>>
>>On Sun, May 29, 2011 at 8:10 PM, Matthew Keller <mckellercran at gmail.com>
>>wrote:
>>> hi all,
>>>
>>> I'm full of questions today :). Thanks in advance for your help!
>>>
>>> Here's the problem:
>>> x <- c('18x.6','12x.9','302x.3')
>>>
>>> I want to get a vector that is c('18x','12x','302x')
>>>
>>> This is easily done using this code:
>>>
>>> unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))
>>>
>>> So far so good. The problem is that x is a vector of length 132e6.
>>> When I run the above code, it runs for more than 30 minutes and
>>> takes more than 23 GB of RAM (no kidding!).
>>>
>>> Does anyone have ideas about how to speed up the code above and (more
>>> importantly) reduce the RAM footprint? I'd prefer not to change the
>>> file on disk using, e.g., awk, but I will do that as a last resort.
>>>
>>> Best
>>>
>>> Matt
>>>
>>> --
>>> Matthew C Keller
>>> Asst. Professor of Psychology
>>> University of Colorado at Boulder
>>> www.matthewckeller.com
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>>--
>>Jim Holtman
>>Data Munger Guru
>>
>>What is the problem that you are trying to solve?
>>
>
>
>
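
A note on the patterns benchmarked above, offered as a sketch rather than a tested recommendation for the 132e6-element case. The greedy pattern "^(.*)\\..*" keeps everything up to the LAST dot, while the strsplit() code keeps everything up to the FIRST dot; the two agree on the sample data only because each string has exactly one dot. The function names and the multi-dot value below are hypothetical and introduced only for illustration. The regexpr()/substr() variant avoids building a list with one element per string, which is a likely contributor to the memory use described in the original question, though that has not been measured here.

```r
# Sample vector from the thread, plus one hypothetical multi-dot value
# (not from the original data) to expose the greedy/first-dot difference.
x <- c("18x.6", "12x.9", "302x.3")
multi <- "1a.2.3"

# Baseline from the thread: split on a literal dot, keep the first piece.
first_piece <- function(v) {
  vapply(strsplit(v, ".", fixed = TRUE), function(p) p[1], character(1))
}

# Remove everything from the first literal dot onward.
strip_after_dot <- function(v) sub("\\..*$", "", v)

# Fixed-string variant: find the first dot, then take the prefix.
# Elements without a dot are returned unchanged.
prefix_fixed <- function(v) {
  # as.integer() drops the match.length attributes that regexpr() attaches
  pos <- as.integer(regexpr(".", v, fixed = TRUE))
  stop_at <- pos - 1L
  no_dot <- which(pos < 0L)            # regexpr() gives -1 for "no match"
  stop_at[no_dot] <- nchar(v[no_dot])  # keep the whole string
  substr(v, 1L, stop_at)
}

# All three agree on the single-dot sample data.
identical(strip_after_dot(x), first_piece(x))
identical(prefix_fixed(x), first_piece(x))

# With more than one dot, the greedy pattern from the thread differs.
gsub("^(.*)\\..*", "\\1", multi)   # "1a.2" (up to the last dot)
strip_after_dot(multi)             # "1a"   (up to the first dot)
```

If the real data can contain more than one dot per string, the first-dot versions match the behaviour of the original strsplit() code; timing and memory use on the full vector would still need to be checked with system.time() and gc().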



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com


