[R] bottlenecks in R script

Gabor Grothendieck ggrothendieck at gmail.com
Tue Mar 16 18:28:37 CET 2010


Check out read.csv.sql in the sqldf package.  It reads the file directly
into an SQLite database, bypassing R, and then loads the result from
SQLite into R.  It sets up the database and the table layout for you and
also destroys the database when finished, so reading is just a matter of
one line of R code.  It can also read just the portion of the file
selected by an SQL statement.  See the examples on the home page:
http://sqldf.googlecode.com
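
A minimal sketch of that one-line read, assuming sqldf is installed. The
sample file, its column names (id, value) and the WHERE clause are made up
for illustration; only read.csv.sql itself and the table name "file" come
from the package.

```r
library(sqldf)  # provides read.csv.sql

# Hypothetical sample file, written here only so the example is self-contained.
# Fields are left unquoted to keep the file as simple as possible.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 1.5),
          csv_path, row.names = FALSE, quote = FALSE)

# Inside the SQL statement the input is referred to as the table "file".
# Only the rows matching the WHERE clause are returned to R.
subset_df <- read.csv.sql(csv_path, sql = "select * from file where id > 7")

print(subset_df)
```

Leaving the sql argument at its default, "select * from file", reads the
whole file instead of a subset.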

On Tue, Mar 16, 2010 at 12:51 PM, Joe Calderon <calderon.joe at gmail.com> wrote:
> hello *, im running into two major bottlenecks in an R script.
>
> 1. going through a 40mb file and reading in via readLines() 1 line at
> a time is almost an order of magnitude slower than the equivalent in
> python, im wondering if there are alternatives to readLines(), doing
> more lines at a time helps a bit
>
> 2. generating date sequences takes a long time, im basically doing
> something like seq.Date(Sys.Date(), length.out = 300, by = 'day') a lot.
> while digging into it, i strace'd the running process and it seems the
> bulk of the time is spent checking for /etc/localtime
>
> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>
>
> strace -cp 2964
> Process 2964 attached - interrupt to quit
> ^CProcess 2964 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  94.61    0.006387           0     55872           stat
>   2.58    0.000174           0       568           read
>   1.42    0.000096           0       285           write
>   1.39    0.000094           1       137           brk
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.006751                 56862           total
>
>
>
> has anybody run into similar problems?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
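
On the second point in the quoted message, one possible workaround, offered
as a sketch rather than a measured fix: a Date is stored as a count of days,
so a daily sequence can be built with plain addition instead of calling
seq.Date each time. The variable names below are illustrative, and whether
this removes the stat() calls seen in the strace output is an untested
assumption.

```r
# A Date is a day count, so adding integer offsets yields consecutive dates.
start <- Sys.Date()

# The form used in the quoted message.
by_seq <- seq(start, length.out = 300, by = "day")

# The same 300 dates built by arithmetic on the starting date.
by_plus <- start + 0:299

print(head(by_plus))
```

If the same sequence is needed repeatedly, computing it once and reusing it
avoids the call altogether. Separately, if the stat() calls come from the C
library re-checking /etc/localtime because TZ is unset, setting TZ, for
example with Sys.setenv(TZ = "UTC") or whichever zone applies, may reduce
them; that too is an assumption here.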


