[R] bottlenecks in R script
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Mar 16 18:28:37 CET 2010
Check out read.csv.sql in the sqldf package. It reads a file directly
into SQLite without going through R, and then from there into R. It
sets up the database and the file layout in the database for you and also
destroys the database when finished, so reading a file is just a matter of
one line of R code. It can also read any portion of
the file that can be specified in SQL. See the examples on the home page:
http://sqldf.googlecode.com
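
As a rough illustration of the one-line read described above, here is a minimal sketch. It assumes the sqldf package is installed; the temporary CSV file, its column names (id, value), and the WHERE clause are made up for the example and are not from the original question.

```r
# Sketch only: requires the sqldf package. The data and the filter
# below are hypothetical, chosen just to keep the example self-contained.
library(sqldf)

# Write a small CSV to a temporary file so there is something to read.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 1.5),
          csv_path, row.names = FALSE, quote = FALSE)

# Inside the SQL statement the table backed by the file is called "file".
# The file is loaded into a temporary SQLite database, and only the rows
# selected by the query are brought into R.
subset_df <- read.csv.sql(csv_path, sql = "select * from file where id > 7")

# Remove the temporary CSV file.
unlink(csv_path)
```

For a large input the same call applies with the real file path in place of csv_path; whether it is faster than readLines() for a given file would need to be measured.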
On Tue, Mar 16, 2010 at 12:51 PM, Joe Calderon <calderon.joe at gmail.com> wrote:
> hello *, I'm running into two major bottlenecks in an R script.
>
> 1. going through a 40 MB file and reading it in via readLines() 1 line at
> a time is almost an order of magnitude slower than the equivalent in
> Python. I'm wondering if there are alternatives to readLines(); doing
> more lines at a time helps a bit
>
> 2. generating date sequences takes a long time. I'm basically doing
> something like seq.Date(Sys.Date(), length.out = 300, by = 'day') a lot.
> While digging into it, I strace'd the running process and it seems the
> bulk of the time is spent checking for /etc/localtime
>
> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>
>
> strace -cp 2964
> Process 2964 attached - interrupt to quit
> ^CProcess 2964 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  94.61    0.006387           0     55872           stat
>   2.58    0.000174           0       568           read
>   1.42    0.000096           0       285           write
>   1.39    0.000094           1       137           brk
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.006751                 56862           total
>
>
>
> has anybody run into similar problems?