[R] ideas about how to reduce RAM & improve speed in trying to use lapply(strsplit())

Matthew Keller mckellercran at gmail.com
Mon May 30 02:10:14 CEST 2011


hi all,

I'm full of questions today :). Thanks in advance for your help!

Here's the problem:
x <- c('18x.6','12x.9','302x.3')

I want to get a vector that is c('18x','12x','302x')

This is easily done using this code:

unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))

So far so good. The problem is that x is a vector of length 132e6.
When I run the above code, it runs for > 30 minutes, and it takes > 23
Gb RAM (no kidding!).

Does anyone have ideas about how to speed up the code above and (more
importantly) reduce the RAM footprint? I'd prefer not to change the
file on disk using, e.g., awk, but I will do that as a last resort.

Best

Matt

-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com



More information about the R-help mailing list