[R] Performing Analysis on Subset of External data

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 6 20:15:17 CEST 2004


1) Use the skip= and nrows= arguments to read.table.

2) Open a connection, read and discard the leading rows, read the block 
you want, then close the connection. (Which is how 1 works, essentially.)

3) Use perl, awk or some such to extract the rows you want before reading 
them into R -- this is probably rather faster.
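
As a rough illustration of 1 and 2, here is a minimal sketch. The helper 
read_block(), its argument names and the temporary demonstration file are 
all hypothetical, and the file is assumed to be whitespace-delimited with 
no header line. It discards the leading rows in chunks, so that they are 
never all held in memory at once.

```r
# Sketch only: read_block() and its arguments are hypothetical names.
read_block <- function(path, first, n, chunk = 10000L, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  to_skip <- first - 1L
  # Read and discard the leading rows, a chunk at a time.
  while (to_skip > 0L) {
    got <- length(readLines(con, n = min(chunk, to_skip)))
    if (got == 0L) stop("file has fewer than 'first' rows")
    to_skip <- to_skip - got
  }
  # read.table continues from the current position of an open connection.
  read.table(con, nrows = n, ...)
}

# Small demonstration on a temporary file of 100 rows (made-up data).
tmp <- tempfile()
write.table(data.frame(id = 1:100, x = (1:100) / 2), tmp,
            row.names = FALSE, col.names = FALSE)

a <- read.table(tmp, skip = 40, nrows = 5)   # option 1: rows 41 to 45
b <- read_block(tmp, first = 41, n = 5)      # option 2: the same rows
```

For several files with the same layout, something like 
lapply(files, read_block, first = 41, n = 5) would apply the same 
selection to each, where files is a character vector of paths.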

On Wed, 6 Oct 2004, Laura Quinn wrote:

> I want to perform some analysis on subsets of huge data files. There are
> 20 of the files and I want to select the same subsets of each one (each
> subset is a chunk of 1500 or so consecutive rows from several million). To
> save time and processing power is there a method to tell R to *only* read
> in these rows, rather than reading in the entire dataset then selecting
> subsets and deleting the extraneous data? This method takes a rather silly
> amount of time and results in memory problems.
> 
> I am using R 1.9.0 on SuSE 9.0


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



