[Rd] read.csv reads more rows than indicated by wc -l
Matthew Dowle
mdowle at mdowle.plus.com
Fri Dec 21 00:46:39 CET 2012
Ben,
> Somewhere on my wish/TO DO list is for someone to rewrite read.table for
> better robustness *and* efficiency ...
Wish granted. New in data.table 1.8.7 :
=====
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* integers>2^31 are detected and read natively as bit64::integer64.
* accepts filenames, URLs and "A,B\n1,2\n3,4" directly
* new implementation entirely in C
* with a 50MB .csv, 1 million rows x 6 columns :
    read.csv("test.csv")                                        # 30-60 sec
    read.table("test.csv",<all known tricks and known nrows>)   # 10 sec
    fread("test.csv")                                           # 3 sec
* airline data: 658MB csv (7 million rows x 29 columns)
    read.table("2008.csv",<all known tricks and known nrows>)   # 360 sec
    fread("2008.csv")                                           # 50 sec
See ?fread. Many thanks to Chris Neff and Garrett See for ideas, discussions
and beta testing.
=====
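A minimal sketch of the auto-detection described in the NEWS item, assuming data.table (1.8.7 or later) is installed, plus bit64 for the last step. The column names and values are made up for illustration; the literal-string form means no file is needed to try it.

```r
library(data.table)  # provides fread()

# A string containing "\n" is treated as the data itself, as in the
# "A,B\n1,2\n3,4" example from the NEWS item. No header, sep or
# colClasses arguments are supplied.
dt <- fread("A,B\n1,2\n3,4")
print(dt)
print(sapply(dt, class))

# A value too large for R's 32-bit integer type. Per the NEWS item this
# column should be read as bit64::integer64 (assumes bit64 is installed).
big <- fread("id,x\n3000000000,1\n3000000001,2")
print(sapply(big, class))
```

The same call shape applies to a filename or URL in place of the literal string.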
The help page ?fread is fairly well developed :
https://r-forge.r-project.org/scm/viewvc.php/pkg/man/fread.Rd?view=markup&root=datatable
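For anyone wanting to try a comparison like the one timed above on their own machine, here is a rough sketch. The data is randomly generated and much smaller than the files in the NEWS item, and the read.table() arguments are only my reading of "all known tricks" (known colClasses, known nrows, comment.char=""), so the timings will not match the figures quoted.

```r
library(data.table)

# Hypothetical test data: 100,000 rows x 6 columns written to a temp file.
set.seed(1)
n <- 1e5
df <- data.frame(a = sample.int(1000L, n, replace = TRUE),
                 b = sample.int(1000L, n, replace = TRUE),
                 c = runif(n),
                 d = runif(n),
                 e = sample(letters, n, replace = TRUE),
                 f = sample(letters, n, replace = TRUE),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# read.table with commonly recommended speed-ups: column classes and
# row count given up front, comment scanning switched off.
t_base <- system.time(
  base_df <- read.table(path, header = TRUE, sep = ",", quote = "\"",
                        colClasses = c("integer", "integer", "numeric",
                                       "numeric", "character", "character"),
                        nrows = n, comment.char = "",
                        stringsAsFactors = FALSE)
)

# fread with everything left to auto-detection.
t_fread <- system.time(dt <- fread(path))

# Elapsed seconds for each reader.
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))
```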
Comments, feedback and bug reports very welcome.
Matthew
http://datatable.r-forge.r-project.org/