[R] naive question

Tony Plate tplate at blackmesacapital.com
Wed Jun 30 17:34:25 CEST 2004


As far as I know, read.table() in S-plus is about as fast as read.table() 
in R, so I wouldn't expect switching to give you much of an improvement 
there.

I frequently read large tables in S-plus, and with considerable work I was 
able to speed things up significantly, mainly by using scan() with 
appropriate arguments.  It's possible that some of the add-on modules for 
S-plus (e.g., the data-mining module) have faster I/O, but I haven't 
investigated those.  I get the best read performance out of S-plus by using 
a homegrown binary file format in which each column is stored as one 
contiguous block and the metadata (i.e., column types and dimensions) is 
stored at the start of the file.  The S-plus read function reads the 
columns one at a time using readRaw().  One could do something similar in 
R.  If you have to read from a text file, then, as others have suggested, 
writing a C program wouldn't be that hard, as long as you keep the format 
inflexible.
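
A rough sketch of both ideas in R, to make them concrete.  This is not the 
S-plus code described above: it uses R's readBin()/writeBin() in place of 
readRaw(), and the function names (read_text_fast, write_columns, 
read_columns), the file layout, and the restriction to integer and double 
columns are all assumptions made for illustration.

```r
# Hypothetical helpers; names and file layout are illustrative assumptions.

# Text input: scan() with an explicit `what` list, so column types are
# declared up front instead of being guessed as read.table() does.
read_text_fast <- function(path, n_rows) {
  cols <- scan(path,
               what = list(id = integer(0), x = numeric(0)),
               sep = ",", skip = 1, nmax = n_rows, quiet = TRUE)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Binary output: metadata first, then each column as one contiguous block.
write_columns <- function(df, path) {
  types <- vapply(df, typeof, character(1))
  stopifnot(all(types %in% c("integer", "double")))  # sketch-only limit
  con <- file(path, "wb")
  on.exit(close(con))
  writeBin(c(nrow(df), ncol(df)), con)  # dimensions (two integers)
  writeBin(names(df), con)              # column names
  writeBin(unname(types), con)          # column types
  for (col in df) writeBin(col, con)    # data, one column at a time
  invisible(path)
}

# Binary input: read the metadata, then read the columns one at a time.
read_columns <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  dims  <- readBin(con, "integer", n = 2)
  nms   <- readBin(con, "character", n = dims[2])
  types <- readBin(con, "character", n = dims[2])
  cols  <- lapply(types, function(t) readBin(con, what = t, n = dims[1]))
  names(cols) <- nms
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Small round trip: text file -> scan() -> binary file -> data frame.
txt_path <- tempfile(fileext = ".csv")
bin_path <- tempfile(fileext = ".bin")
write.table(data.frame(id = 1:5, x = c(0.5, 1.5, 2.5, 3.5, 4.5)),
            txt_path, sep = ",", row.names = FALSE, quote = FALSE)

from_text <- read_text_fast(txt_path, n_rows = 5)
write_columns(from_text, bin_path)
back <- read_columns(bin_path)
```

The idea is to pay the text-parsing cost once and reread the binary file 
afterwards.  A real version would also need to handle character columns, 
missing values, and byte order.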

-- Tony Plate

At Tuesday 06:19 PM 6/29/2004, Igor Rivin wrote:

>I was not particularly annoyed, just disappointed, since R seems like
>a much better thing than SAS in general, and doing everything with a
>combination of hand-rolled tools is too much work. However, I do need
>to work with very large data sets, and if it takes 20 minutes to read
>them in, I have to explore other options (one of which might be S-PLUS,
>which claims scalability as a major, er, PLUS over R).



