[R] R tools for large files

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Aug 25 08:09:26 CEST 2003


Murray Jorgensen <maj at stats.waikato.ac.nz> wrote:
	I'm wondering if anyone has written some functions or code for handling
	very large files in R. I am working with a data file that is 41
	variables by who knows how many observations, making up 27MB altogether.
	
Does that really count as "very large"?
I tried making a file where each line was "1 2 3 ... 39 40 41".
With 240,000 lines it came to 27.36 million bytes.
You can *hold* that amount of data in R quite easily.
The problem is the time it takes to read it using scan() or read.table().
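For files of this shape, the usual speed-up is to tell scan() the column
types in advance rather than letting read.table() guess them.  A minimal
sketch ("big.dat" is a hypothetical file laid out as above):

    ## Slow: read.table() must work out the type of every field.
    system.time(d1 <- read.table("big.dat"))

    ## Faster: scan() is told everything is numeric, and the resulting
    ## vector is reshaped into a 41-column data frame.
    system.time({
        v  <- scan("big.dat", what = double())
        d2 <- as.data.frame(matrix(v, ncol = 41, byrow = TRUE))
    })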

	The sort of thing that I am thinking of having R do is
	
	- count the number of lines in a file
	
	- form a data frame by selecting all cases whose line numbers are in a 
	supplied vector (which could be used to extract random subfiles of 
	particular sizes)
	
	Does anyone know of a package that might be useful for this?
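Neither task strictly needs a package; base R connections will do.  A
rough sketch (the file name "big.dat", the chunk size, and the helper
names are all placeholders, not an existing package):

    ## Count lines without holding the whole file in memory.
    count.lines <- function(file, chunk = 10000) {
        con <- file(file, "r")
        on.exit(close(con))
        n <- 0
        repeat {
            got <- length(readLines(con, n = chunk))
            if (got == 0) break
            n <- n + got
        }
        n
    }

    ## Build a data frame from the lines whose numbers are in 'rows'.
    read.rows <- function(file, rows, chunk = 10000) {
        con <- file(file, "r")
        on.exit(close(con))
        kept <- character(0)
        seen <- 0
        repeat {
            lines <- readLines(con, n = chunk)
            if (length(lines) == 0) break
            keep <- (seen + seq_along(lines)) %in% rows
            kept <- c(kept, lines[keep])
            seen <- seen + length(lines)
        }
        read.table(textConnection(kept))
    }

so that read.rows("big.dat", sample(count.lines("big.dat"), 1000))
pulls out a random subfile of 1000 cases.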
	
There's a Unix program I posted to comp.sources years ago called "sample":
    sample -(how many) <(where from)
selects the given number of lines without replacement from its standard input
and writes them in random order to its standard output.  Hook it up to a
decent random number generator and you're pretty much done: read.table()
and scan() can read from a pipe.
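For example (the program name "sample" on the path and the file name
"big.dat" are placeholders):

    ## 1000 lines drawn at random, parsed straight into a data frame.
    d <- read.table(pipe("sample -1000 < big.dat"))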
