[R] Can scan() detect end-of-file?

David Winsemius dwinsemius at comcast.net
Fri Oct 16 01:42:44 CEST 2015


On Oct 15, 2015, at 3:10 PM, William Dunlap wrote:

> C can tell when it hits the end of input.  Reading the lines with
> readLines and passing them to scan() does not help - it is the
> same as having scan read the original file.
> 
> My problem is that the file (or other connection) has a variable number
> of fields on each "line", and perhaps no fields on some lines.  Fields
> enclosed in quotes may include newline character.  I want to read this
> file into a list of character vectors, the n'th element of the list being
> the fields on the n'th "line" of the file.
> 
> repeating scan(connection, nlines=1, what="") does everything right
> except for telling me when it has read everything the connection
> has to offer.  scan(connection, what="") manages to figure out where
> the end of the file is, but does not tell me the line number associated
> with each character string.
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> 
> On Thu, Oct 15, 2015 at 2:57 PM, Jeff Newmiller
> <jdnewmil at dcn.davis.ca.us> wrote:
>> This is a problem in C as well... and the solution is to read the lines yourself and then give those lines to scan.
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                      Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------
>> Sent from my phone. Please excuse my brevity.
>> 
>> On October 15, 2015 1:16:58 PM PDT, William Dunlap <wdunlap at tibco.com> wrote:
>>> I would like to read a connection line by line with scan but
>>> don't know how to tell when to quit trying.  Is there any
>>> way that you can ask the connection object if it is at the end?
>>> 
>>> E.g.,
>>> 
>>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>>> tfile <- tempfile()
>>> cat(t, file=tfile)
>>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>>> scan(tcon, what="", nlines=1)
>>> #Read 2 items
>>> #[1] "A"               "Two line\nentry"
>>> scan(tcon, what="", nlines=1)  # empty line
>>> #Read 0 items
>>> #character(0)
>>> scan(tcon, what="", nlines=1)
>>> #Read 3 items
>>> #[1] "Three\nline\nentry" "D"                  "E"
>>> scan(tcon, what="", nlines=1) # end of file
>>> #Read 0 items
>>> #character(0)
>>> scan(tcon, what="", nlines=1) # end of file
>>> #Read 0 items
>>> #character(0)

If you run seek() after each of your scan() calls and test whether the result is the same as it was after the previous call, that could be your end-of-file signal: a scan() that consumed nothing leaves the position unchanged.

[1] "Three\nline\nentry" "D"                  "E"                 
[1] 43
> scan(tcon, what="", nlines=1);seek(tcon)
Read 0 items
character(0)
[1] 43
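
Wrapped into a loop, the idea might look like the sketch below. The function name read_fields_by_line is made up for illustration, and I am assuming a seekable file() connection; seek() will not work on a textConnection, and ?seek warns that it is unreliable on Windows, so treat this as a starting point rather than a tested solution.

```r
# Sketch only: read one "virtual line" at a time with scan() and stop
# when seek() reports that the read position did not advance.
# read_fields_by_line is a hypothetical helper, not part of base R.
read_fields_by_line <- function(path) {
  con <- file(path, "r")          # assumes a seekable file connection
  on.exit(close(con))
  out <- list()
  pos <- seek(con)                # current read position (0 at the start)
  repeat {
    fields <- scan(con, what = "", nlines = 1, quiet = TRUE)
    new_pos <- seek(con)
    if (new_pos == pos) break     # nothing consumed: treat as end of file
    out[[length(out) + 1L]] <- fields   # character(0) for an empty line
    pos <- new_pos
  }
  out
}

# Usage with the example text from the original question
txt <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tfile <- tempfile()
cat(txt, file = tfile)
res <- read_fields_by_line(tfile)
str(res)
```

An empty line still consumes its newline, so the position advances and the empty line is kept as character(0); only a read at the very end leaves the position where it was.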



-- 
David.
>>> 
>>> I am reading virtual line by virtual line because the lines
>>> may have different numbers of fields.
>>> 
>>> Bill Dunlap
>>> TIBCO Software

-- 

David Winsemius
Alameda, CA, USA


