[R] Performing Analysis on Subset of External data
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Oct 6 20:15:17 CEST 2004
1) Use the skip= and nrows= arguments to read.table.
2) Open a connection, read and discard rows, read the block you want then
close the connection. (Which is how 1 works, essentially.)
3) Use perl, awk or some such to extract the rows you want -- this is
probably rather faster.
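
A minimal sketch of options 1 and 2, for illustration only. The demo file, the column names id and x, and the values of first and n are hypothetical stand-ins for the real data; the sketch assumes a whitespace-separated file with one header line.

```r
# Hypothetical demo file: one header line plus 10000 data rows.
tmp <- tempfile(fileext = ".txt")
demo <- data.frame(id = 1:10000, x = (1:10000) / 10)
write.table(demo, tmp, row.names = FALSE, quote = FALSE)

first <- 5001   # first data row wanted (hypothetical)
n     <- 1500   # number of consecutive rows wanted (hypothetical)
cols  <- c("id", "x")

# Option 1: skip= and nrows=.
# skip counts lines of the file, so skipping `first` lines drops the
# header plus the first - 1 data rows before the block. Because the
# header is skipped, column names are supplied by hand.
block1 <- read.table(tmp, skip = first, nrows = n, col.names = cols)

# Option 2: open a connection, read and discard, read the block, close.
con <- file(tmp, open = "r")
to_skip <- first
while (to_skip > 0) {
  # Discard in chunks so the unwanted lines are never all held at once.
  got <- length(readLines(con, n = min(to_skip, 100000L)))
  if (got == 0) break            # file ended before the block started
  to_skip <- to_skip - got
}
block2 <- read.table(con, nrows = n, col.names = cols)
close(con)
```

Both calls return the same 1500 rows here. With 20 files, the same skip and nrows values can be reused in a loop over the file names. Supplying colClasses= to read.table, where the column types are known, may save further time and memory.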
On Wed, 6 Oct 2004, Laura Quinn wrote:
> I want to perform some analysis on subsets of huge data files. There are
> 20 of the files and I want to select the same subsets of each one (each
> subset is a chunk of 1500 or so consecutive rows from several million). To
> save time and processing power is there a method to tell R to *only* read
> in these rows, rather than reading in the entire dataset then selecting
> subsets and deleting the extraneous data? This method takes a rather silly
> amount of time and results in memory problems.
> I am using R 1.9.0 on SuSe 9.0
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595