[R] read in large data file (tsv) with inline filter?
debeaudette at ucdavis.edu
Mon Mar 23 23:09:36 CET 2009
On Monday 23 March 2009, David Reiss wrote:
> I have a very large tab-delimited file, too big to store in memory via
> readLines() or read.delim(). Turns out I only need a few hundred of those
> lines to be read in. If it were not so large, I could read the entire file
> in and "grep" the lines I need. For such a large file, making many calls to
> read.delim() with incrementing "skip" and "nrows" parameters, followed by
> grep() calls, is very slow. I am aware of possibilities via SQLite; I would
> prefer to not use that in this case.
> My question is: is there a function for efficiently reading in a file
> along the lines of read.delim(), which allows me to specify a filter (via
> grep or something else) that tells the function to only read in certain
> lines that match?
> If not, I would *love* to see a "filter" parameter added as an option to
> read.delim() and/or readLines().
> thanks for any pointers.
How about pre-filtering before loading the data into R:
grep -E 'your pattern here' your_file_here > your_filtered_file
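One wrinkle with a plain grep is that the header line is usually dropped, since it rarely matches the pattern. A minimal sketch of a pre-filter that keeps it follows. The file names, the sample rows, and the pattern 'abc' are made up for illustration and stand in for the real data:

```shell
#!/bin/sh
# Hypothetical sample data standing in for the large tab-delimited file.
tmpdir=$(mktemp -d)
printf 'id\tgene\tscore\n1\tabc1\t0.5\n2\txyz2\t0.9\n3\tabc3\t0.1\n' > "$tmpdir/big.tsv"

# Copy the header line (line 1) unconditionally ...
head -n 1 "$tmpdir/big.tsv" > "$tmpdir/filtered.tsv"
# ... then append only the data rows that match the (assumed) pattern.
tail -n +2 "$tmpdir/big.tsv" | grep -E 'abc' >> "$tmpdir/filtered.tsv"

cat "$tmpdir/filtered.tsv"
```

The resulting file is small enough to load with read.delim() in the usual way.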
Alternatively, if you need to match on specific fields, see 'awk' and 'cut';
if you need to delete characters, see 'tr'.
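As a rough sketch of the field-based approach, awk can test a single column and cut can then keep only the columns of interest. The column positions, the sample rows, and the regular expression below are assumptions for illustration only:

```shell
#!/bin/sh
# Hypothetical sample data standing in for the large tab-delimited file.
tmpdir=$(mktemp -d)
printf 'id\tgene\tscore\n1\tabc1\t0.5\n2\txyz2\t0.9\n3\tabc3\t0.1\n' > "$tmpdir/big.tsv"

# awk: keep the header (NR == 1) plus rows whose 2nd tab-separated
# field starts with "abc".
awk -F'\t' 'NR == 1 || $2 ~ /^abc/' "$tmpdir/big.tsv" > "$tmpdir/by_field.tsv"

# cut: keep only columns 1 and 3 of the filtered rows (tab is cut's
# default delimiter).
cut -f1,3 "$tmpdir/by_field.tsv" > "$tmpdir/two_cols.tsv"

cat "$tmpdir/two_cols.tsv"
```

Unlike a plain grep, the awk test cannot be triggered by a match in some other column.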
These tools come with any Unix-like OS, and you can probably get them on
Windows without much effort.
Soil Resource Laboratory
University of California at Davis