[Rd] R's IO speed
Roger D. Peng
rpeng at jhsph.edu
Fri Dec 31 23:57:14 CET 2004
I have a ~1.45 million row x 122 column data frame (one "character" column, one
"factor" column and the rest "numeric"). Using read.csv() with the `nrows',
`comment.char = ""' and `colClasses' arguments, I can read it into R 2.0.1 in
about 150 seconds, with memory usage of ~1.5 GB. On R-devel (2004-12-31) the
same read takes about 120 seconds, with the same memory usage. Not too shabby!
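A minimal sketch of the kind of read.csv() call described above. The post does not say where each column sits, so this assumes (hypothetically) that the character column comes first, the factor second and the 120 numeric columns last; the file is a small generated stand-in, not the original data. Note that read.csv() in current versions of R already defaults to comment.char = "", so it is spelled out here only to mirror the post.

```r
# Assumed column layout (not stated in the post): 1 character, 1 factor,
# then 120 numeric columns, 122 in total.
col_classes <- c("character", "factor", rep("numeric", 120))

# Build a small stand-in CSV so the sketch runs on its own.
n <- 1000
set.seed(1)
df <- data.frame(id  = sprintf("id%04d", seq_len(n)),
                 grp = sample(c("a", "b", "c"), n, replace = TRUE),
                 matrix(rnorm(n * 120), nrow = n),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# The "careful" read: nrows lets R size its buffers up front,
# comment.char = "" disables comment scanning, and colClasses skips
# the per-column type guessing.
dat <- read.csv(path, nrows = n, comment.char = "", colClasses = col_classes)
str(dat[, 1:4])
```

With the real file, nrows only needs to be at least the true row count; a mild overestimate still avoids repeated reallocation.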
Prof Brian Ripley wrote:
> R-devel now has some improved versions of read.table and write.table.
> For a million-row data frame containing one numeric, one factor (with few
> levels) and one logical column, a 56Mb object:
> generating it takes 4.5 secs.
> calling summary() on it takes 2.2 secs.
> writing it takes 8 secs and an additional 10Mb.
> saving it in .rda format takes 4 secs.
> reading it naively takes 28 secs and an additional 240Mb
> reading it carefully (using nrows, colClasses and comment.char) takes 16
> secs and an additional 150Mb (56Mb of which is for the object read in).
> (The overhead of read.table over scan was about 2 secs, mainly in the
> conversion back to a factor.)
> loading from .rda format takes 3.4 secs.
> [R 2.0.1 read in 23 secs using an additional 210Mb, and wrote in 50 secs
> using an additional 450Mb.]
> Will Frank Harrell or someone else please explain to me a real
> application in which this is not fast enough?
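Ripley's comparison can be approximated along these lines. This is a hypothetical reconstruction, not his script: the generator, column names, factor levels and the choice of row.names = FALSE are all assumptions, and n is reduced from 1e6 so the sketch runs quickly. Timings will of course differ from the 2004 figures.

```r
# Hypothetical benchmark data: one numeric, one factor with few levels and
# one logical column. Set n <- 1e6 to match the size in the post.
n <- 1e5
set.seed(42)
big <- data.frame(x = runif(n),
                  f = factor(sample(c("lo", "mid", "hi"), n, replace = TRUE)),
                  b = sample(c(TRUE, FALSE), n, replace = TRUE))

txt <- tempfile(fileext = ".txt")
rda <- tempfile(fileext = ".rda")

t_write <- system.time(write.table(big, txt, row.names = FALSE))
t_save  <- system.time(save(big, file = rda))

# Naive read: read.table() has to guess each column's type.
t_naive <- system.time(naive <- read.table(txt, header = TRUE))

# Careful read: row count, comment handling and column classes given up front.
t_careful <- system.time(
  careful <- read.table(txt, header = TRUE, nrows = n, comment.char = "",
                        colClasses = c("numeric", "factor", "logical"))
)

# Load into a fresh environment so the original `big` is left untouched.
e <- new.env()
t_load <- system.time(load(rda, envir = e))

# Elapsed (wall-clock) seconds for each step.
timings <- sapply(list(write = t_write, save = t_save, naive = t_naive,
                       careful = t_careful, load = t_load),
                  function(t) t[["elapsed"]])
print(round(timings, 2))
```

Comparing the naive and careful rows of `timings` isolates the cost of type guessing and buffer growth, while the save/load rows show the binary .rda route for comparison.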