[R] Reading large files quickly
Jakson Alves de Aquino
jaksonaquino at gmail.com
Sun May 10 03:19:33 CEST 2009
Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
> The unix command wc by contrast processes the same file in three
> minutes. Is there a faster way to read files in R?
I use statist to convert the fixed-width data file into a CSV file,
because read.table() is considerably faster than read.fwf(). For example:
system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header = TRUE, as.is = TRUE)
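As an aside (not part of the original post), read.table() itself can often be
sped up on large files by declaring the column types in advance via its
colClasses argument, so R skips the type-guessing pass; this behavior is
documented in ?read.table. A minimal, self-contained sketch with a small demo
file standing in for the real multi-gigabyte input:

```r
# Write a tiny demo file so the sketch runs on its own; for a real
# multi-gigabyte file you would point read.table() at that file instead.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id value", "1 3.5", "2 4.2"), tmp)

# Declaring colClasses lets read.table() skip its type-inference pass,
# which saves a large fraction of the time on big inputs.
bigdf <- read.table(tmp, header = TRUE,
                    colClasses = c("integer", "numeric"))
str(bigdf)
```

The column classes here ("integer", "numeric") are of course specific to this
hypothetical two-column file and would need to match the real data.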
The file collist is a text file each of whose lines has the form

    variable begin end

where "variable" is the column name, and "begin" and "end" are integers
indicating where in big.txt the column begins and ends.
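To illustrate (a hypothetical layout, not taken from the original post), a
collist describing three columns of big.txt might look like:

```
id      1   6
age     7   9
income 10  18
```

Each line maps one output column name to the character positions that column
occupies in the fixed-width file.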
Statist can be downloaded from: http://statist.wald.intevation.org/
Social Sciences Department
Federal University of Ceará, Brazil