[R] Scanning data files line-by-line
R A F
raf1729 at hotmail.com
Wed Apr 30 17:21:23 CEST 2003
Hi all, thanks to everyone again for helping out. I don't want to
generate too many messages, but this problem seems common enough that
maybe it's worth a summary.
Here is what works for me. Suppose "file" contains lines of the form
double, string, double, with a variable number of spaces between fields,
followed by EOF.
aaa <- file( "file", "r" )
while( length( ( x <- scan( aaa, nlines = 1, list( 0, "", 0 ) ) )[[1]] ) > 0 )
{
    # Check again that x is not empty (length( x[[1]] ) > 0), since
    # the read that reaches EOF still assigns an (empty) result to x.
    if( length( x[[1]] ) > 0 )
    {
        # process x here
    }
}
close( aaa )
Here x is a list and x[[1]] is the first field, x[[2]] the second, etc.
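
For anyone who wants to try the loop end to end, here is a minimal
self-contained sketch. The temporary file and its three sample lines are
made up for illustration, and I use repeat/break rather than the while
condition above purely for readability; quiet = TRUE just silences scan's
"Read 1 record" messages.

```r
# Made-up sample data: double, string, double, with uneven spacing.
tmp <- tempfile()
writeLines(c("1.5  abc   2", "3 de 4.25", "5   fgh  6"), tmp)

con <- file(tmp, "r")
rows <- list()
repeat {
    # Read one line as (double, string, double) from the open connection.
    x <- scan(con, what = list(0, "", 0), nlines = 1, quiet = TRUE)
    # At end of file scan returns zero-length components, so stop there.
    if (length(x[[1]]) == 0) break
    rows[[length(rows) + 1]] <- x
}
close(con)
unlink(tmp)
```

Each element of rows is then a list whose components are the fields of one
line, e.g. rows[[2]][[2]] is the string on the second line.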
Professor Ripley also suggested textConnections, but I didn't
experiment -- I'm usually happy to find something that works. :-)
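
For completeness, a sketch of how the textConnection route might look,
going by the suggestion quoted below: read one line at a time with
readLines until it returns character(0), then run scan on a
textConnection wrapped around that line. The temporary file and sample
lines are made up, and this is only one way to arrange it.

```r
# Made-up sample data: double, string, double, with uneven spacing.
tmp <- tempfile()
writeLines(c("1.5  abc   2", "3 de 4.25"), tmp)

con <- file(tmp, "r")
fields <- list()
# readLines returns character(0) once the end of the file is reached.
while (length(txt <- readLines(con, n = 1)) > 0) {
    tc <- textConnection(txt)
    # scan splits on any run of whitespace and converts by field type.
    fields[[length(fields) + 1]] <- scan(tc, what = list(0, "", 0), quiet = TRUE)
    close(tc)
}
close(con)
unlink(tmp)
```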
>From: Spencer Graves <spencer.graves at pdf.com>
>To: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>CC: R-help at stat.math.ethz.ch, R A F <raf1729 at hotmail.com>
>Subject: Re: [R] Scanning data files line-by-line
>Date: Wed, 30 Apr 2003 07:28:03 -0700
>With a "connection" instead of a "file", there is no counterpart to
>"count.fields" to summarize what's available?
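
As far as I can tell from its help page, count.fields accepts a
connection as well as a file name, so something along these lines may
already do the job. The sample lines are made up, and the textConnection
merely stands in for whatever connection is at hand.

```r
# Made-up sample lines; the last one deliberately has only two fields.
con <- textConnection(c("1.5  abc   2", "3 de 4.25", "7 xyz"))
# Number of whitespace-separated fields found on each line.
n <- count.fields(con)
close(con)
```
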
>Prof Brian Ripley wrote:
>>On Wed, 30 Apr 2003, R A F wrote:
>>>Thanks very much. I guess the answer leads to more questions:
>>>(a) What if I don't know the number of lines? So I would like to use
>>> a while loop until readLines hits an EOF character. Would that
>>> be possible?
>>Yes. After you reach the end of the file you will get character(0), since
>>the value of readLines is
>>    A character vector of length the number of lines read.
>>and zero lines would have been read.
>>>(b) When readLines is used, a string is returned.
>>Not quite: a character vector is returned.
>>>I'd like to split
>>> the string into fields, and Andy Liaw suggested strsplit, but the
>>> number of spaces between fields is variable. So for example, one
>>> line could be 1 space 2 space space 3 and the next line could be
>>> 4 space space 5 space 6, so I could not do a strsplit using " ".
>>> Really what I know is the variable type of each field -- for
>>> example, each line is double, string, then double, etc. How
>>> would one use this information to split the string given by
>>You could use scan on the line: it works on textConnections.
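
On the strsplit point: its split argument is treated as a regular
expression by default, so " +" (one or more spaces) copes with uneven
spacing. A small sketch with made-up all-numeric lines follows; with mixed
types each piece would need converting separately, which is where scan is
more convenient. Note that a line starting with spaces would yield an
empty first piece.

```r
# Made-up lines with a variable number of spaces between fields.
txt <- c("1 2  3", "4  5 6")
# Split each line on runs of one or more spaces.
parts <- strsplit(txt, " +")
# Convert the character pieces of each line to numbers.
nums <- lapply(parts, as.numeric)
```
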
>>>Thanks very much again!