[R] bottlenecks in R script

Joe Calderon calderon.joe at gmail.com
Tue Mar 16 17:51:21 CET 2010


Hello *, I'm running into two major bottlenecks in an R script.

1. Going through a 40 MB file and reading it in via readLines() one line at
a time is almost an order of magnitude slower than the equivalent in
Python. I'm wondering if there are alternatives to readLines(); reading
more lines at a time helps a bit.

2. Generating date sequences takes a long time. I'm basically calling
something like seq.Date(Sys.Date(), length.out = 300, by = 'day') a lot.
While digging into it, I strace'd the running process, and it seems the
bulk of the system-call time is spent in stat() calls checking /etc/localtime:

stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0


strace -cp 2964
Process 2964 attached - interrupt to quit
^CProcess 2964 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.61    0.006387           0     55872           stat
  2.58    0.000174           0       568           read
  1.42    0.000096           0       285           write
  1.39    0.000094           1       137           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.006751                 56862           total
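For point 1, here is a minimal sketch of reading the file in large chunks rather than one line per call. It assumes the lines can be processed independently. The function name process_file_in_chunks, the chunk size of 10000 lines, and the handle_lines callback are all hypothetical. The key detail is that the connection is opened once, so each readLines(con, n = ...) call continues where the previous one stopped.

```r
# Hypothetical helper: read a text file in chunks of chunk_size lines and
# pass each chunk to handle_lines(). Returns the total number of lines read.
process_file_in_chunks <- function(path, chunk_size = 10000L, handle_lines) {
  con <- file(path, open = "r")   # open once so reads resume where they left off
  on.exit(close(con))
  n_total <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break  # end of file reached
    handle_lines(lines)             # vectorised work on the whole chunk
    n_total <- n_total + length(lines)
  }
  n_total
}

# Usage on a small generated file (a stand-in for the real 40 MB input).
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("record %d", 1:25000), tmp)
total_chars <- 0
n <- process_file_in_chunks(tmp, handle_lines = function(x) {
  total_chars <<- total_chars + sum(nchar(x))
})
unlink(tmp)
```

For a file of this size, it may be simpler to read everything with a single readLines(path) call and work on the resulting character vector. Which approach is faster in practice has not been measured here.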
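For point 2, here is a sketch based on an assumption: that the repeated Sys.Date() calls, which convert the current time to local time, trigger the stat() calls, and that seq.Date() itself does not. The sketch looks up the date once and builds the 300-day sequence by adding integer day offsets to a Date, which R supports directly. The variable names are illustrative.

```r
# Assumption: the /etc/localtime lookups come from Sys.Date(), so call it once.
today <- Sys.Date()

# Date + integer adds whole days, giving the same 300 dates as seq.Date().
offsets <- 0:299                  # build once, reuse for every sequence
dates_fast <- today + offsets
dates_seq  <- seq.Date(today, length.out = 300, by = "day")
```

If local-time lookups cannot be avoided, people sometimes suggest setting the TZ environment variable before starting R (for example TZ=":/etc/localtime"). This is meant to stop glibc from re-checking /etc/localtime on each conversion. Whether it helps here is untested.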



Has anybody run into similar problems?


