[R] Suggestion for big files [was: Re: A comment about R:]
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Jan 6 09:08:59 CET 2006
[Just one point extracted: Hadley Wickham has answered the random sample
On Thu, 5 Jan 2006, François Pinard wrote:
> [Brian Ripley]
>> One problem with Francois Pinard's suggestion (the credit has got lost)
>> is that R's I/O is not line-oriented but stream-oriented. So selecting
>> lines is not particularly easy in R.
> I understand that you mean random access to lines, instead of random
> selection of lines. Once again, this chat comes out of reading someone
> else's problem; this is not a problem I actually have. SPSS was not
> randomly accessing lines, as data files could well be held on magnetic
> tapes, where random access is not possible in ordinary practice. SPSS
> reads (or was reading) lines sequentially from beginning to end, and the
> _random_ sample is built while the reading goes.
That was not my point. R's standard I/O is through connections, which
allow for pushbacks, changing line endings and re-encoding character sets.
That does add overhead compared to C/Fortran line-buffered reading of a
file. Skipping lines you do not need will take longer than you might
guess (based on some limited experience).
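
To make the sequential approach discussed above concrete, here is a minimal sketch of drawing a random sample of lines in a single pass through a connection. It is an illustration only, not code from this thread: the function name sample_lines() and the chunk_size argument are hypothetical, and the sampling rule is the standard reservoir technique, swapped in as one plausible reading of "the sample is built while the reading goes". The connection is opened once, so each readLines() call continues where the previous one stopped. Every line still passes through the connection machinery, so the overhead described above applies to the lines that end up being discarded as well.

```r
# Reservoir sampling over a text connection, read in chunks.
# sample_lines() and chunk_size are illustrative names, not part of base R.
sample_lines <- function(path, k, chunk_size = 10000L) {
  con <- file(path, open = "r")  # open once so successive reads continue in place
  on.exit(close(con))
  reservoir <- character(0)      # holds at most k lines
  seen <- 0L                     # number of lines read so far
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break  # end of file
    for (line in chunk) {
      seen <- seen + 1L
      if (seen <= k) {
        # the first k lines fill the reservoir
        reservoir[seen] <- line
      } else {
        # line number 'seen' replaces a random slot with probability k / seen
        j <- sample.int(seen, 1L)
        if (j <= k) reservoir[j] <- line
      }
    }
  }
  reservoir
}

# Usage on a small temporary file (the file contents are made up for the example)
tmp <- tempfile()
writeLines(sprintf("row %d", 1:1000), tmp)
s <- sample_lines(tmp, k = 10L, chunk_size = 64L)
```

Reading in chunks rather than one line per call keeps the number of readLines() calls small; the per-line loop in R is still slow for very large files, so this is a sketch of the idea rather than a tuned implementation.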
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595