[Rd] R's IO speed

Roger D. Peng rpeng at jhsph.edu
Fri Dec 31 23:57:14 CET 2004

On a ~1.45 million row x 122 column data frame (one "character", one "factor",
and the rest "numeric" columns), I can read it into R 2.0.1 using read.csv() in
about 150 seconds; memory usage is ~1.5 GB.  This is with the `nrows',
`comment.char = ""', and `colClasses' arguments supplied.  On R-devel
(2004-12-31), it takes about 120 seconds; memory usage is the same.  Not too
shabby!
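A "careful" read along those lines might look like the sketch below; the tiny
example file and its column layout are stand-ins for illustration, not the
actual data set from the post.

```r
# Minimal sketch of a careful read.csv() call: the small temp file stands
# in for the large data set, and the column layout is an assumption.
tf <- tempfile(fileext = ".csv")
writeLines(c("id,grp,x1,x2",
             "a,low,1.5,2.5",
             "b,high,3.0,4.0"), tf)

df <- read.csv(tf,
               nrows = 2,           # known row count avoids over-allocation
               comment.char = "",   # disable comment-character scanning
               colClasses = c("character", "factor",
                              "numeric", "numeric"))  # skip type detection
str(df)
```

Supplying `colClasses' is usually the biggest win, since it lets scan() skip
the per-column type detection pass entirely.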


Prof Brian Ripley wrote:
> R-devel now has some improved versions of read.table and write.table.
> For a million-row data frame containing one number, one factor with few 
> levels and one logical column (a 56Mb object):
> generating it takes 4.5 secs.
> calling summary() on it takes 2.2 secs.
> writing it takes 8 secs and an additional 10Mb.
> saving it in .rda format takes 4 secs.
> reading it naively takes 28 secs and an additional 240Mb.
> reading it carefully (using nrows, colClasses and comment.char) takes 16 
> secs and an additional 150Mb (56Mb of which is for the object read in).
> (The overhead of read.table over scan was about 2 secs, mainly in the 
> conversion back to a factor.)
> loading from .rda format takes 3.4 secs.
> [R 2.0.1 read in 23 secs using an additional 210Mb, and wrote in 50 secs 
> using an additional 450Mb.]
> Will Frank Harrell or someone else please explain to me a real 
> application in which this is not fast enough?
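The quoted benchmark can be approximated with a sketch like the following; the
exact column contents are assumptions (the post only specifies one number, one
low-cardinality factor, and one logical column).

```r
# Sketch of the quoted benchmark: build a million-row data frame with one
# numeric, one few-level factor, and one logical column, then time a
# "careful" read back.  Column details are illustrative assumptions.
n  <- 1e6
df <- data.frame(x = rnorm(n),
                 f = factor(sample(letters[1:4], n, replace = TRUE)),
                 b = sample(c(TRUE, FALSE), n, replace = TRUE))

tf <- tempfile(fileext = ".txt")
system.time(write.table(df, tf))

# write.table() emits row names by default, so the first colClasses entry
# covers the row-name column.
system.time(df2 <- read.table(tf,
                              nrows = n,
                              comment.char = "",
                              colClasses = c("character", "numeric",
                                             "factor", "logical")))
```

Timings will of course differ across machines and R versions; the point is the
relative gap between the naive and careful calls.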
