[R] R tools for large files
Richard A. O'Keefe
ok at cs.otago.ac.nz
Mon Aug 25 08:09:26 CEST 2003
Murray Jorgensen <maj at stats.waikato.ac.nz> wrote:

> I'm wondering if anyone has written some functions or code for
> handling very large files in R. I am working with a data file of 41
> variables by who knows how many observations, making up 27MB
> altogether.
Does that really count as "very large"?
I tried making a file where each line was
"1 2 3 .... 39 40 41"
With 240,000 lines it came to 27.36 million bytes.
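(Each such line is nine one-digit and thirty-two two-digit numbers plus
40 separating spaces and a newline, i.e. 114 bytes, and
114 * 240,000 = 27,360,000.)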
You can *hold* that amount of data in R quite easily.
The problem is the time it takes to read it using scan() or read.table().
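A couple of the usual speed-ups help there (a rough sketch, assuming a
whitespace-delimited file of 41 numeric columns; the file name big.dat
and the row count are made up):

    ## Telling scan() the record layout up front avoids type guessing:
    x <- scan("big.dat", what = rep(list(0), 41), quiet = TRUE)
    ## read.table() likewise goes much faster when the column classes
    ## and a bound on the number of rows are declared in advance:
    d <- read.table("big.dat", colClasses = rep("numeric", 41),
                    nrows = 250000)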
> The sort of thing that I am thinking of having R do is
>
>  - count the number of lines in a file
>  - form a data frame by selecting all cases whose line numbers are in
>    a supplied vector (which could be used to extract random subfiles
>    of particular sizes)
>
> Does anyone know of a package that might be useful for this?
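Failing a package, both of those can be sketched in a few lines of base
R (rough and untested; it reads in chunks so the whole file never has
to sit in memory, and the chunk size and names are made up):

    ## Count the lines of a file, a chunk at a time.
    count.lines <- function(file, chunk = 10000) {
        con <- file(file, open = "r")
        on.exit(close(con))
        n <- 0
        repeat {
            got <- length(readLines(con, n = chunk))
            n <- n + got
            if (got < chunk) break
        }
        n
    }

    ## Read just the cases whose line numbers are in 'rows'
    ## (assumed sorted increasing) and parse them as a data frame.
    read.rows <- function(file, rows, chunk = 10000) {
        con <- file(file, open = "r")
        on.exit(close(con))
        kept <- character(0)
        seen <- 0
        repeat {
            lines <- readLines(con, n = chunk)
            if (length(lines) == 0) break
            keep <- rows[rows > seen & rows <= seen + length(lines)] - seen
            kept <- c(kept, lines[keep])
            seen <- seen + length(lines)
        }
        tc <- textConnection(kept)
        on.exit(close(tc), add = TRUE)
        read.table(tc)
    }

A random subfile of n cases is then
read.rows(file, sort(sample(count.lines(file), n))).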
There's a Unix program I posted to comp.sources years ago called
"sample":

    sample -(how many) <(where from)

It selects the given number of lines without replacement from its
standard input and writes them in random order to its standard output.
Hook it up to a decent random number generator and you're pretty much
done: read.table() and scan() can read from a pipe.
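From R that hookup is just a pipe() call (the subsample size 1000 and
the file name big.dat are made up; assumes sample is on your PATH):

    ## Draw 1000 lines at random from big.dat and parse them directly;
    ## read.table() opens and closes the pipe connection itself.
    d <- read.table(pipe("sample -1000 < big.dat"))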