[R] Can scan() detect end-of-file?
David Winsemius
dwinsemius at comcast.net
Fri Oct 16 01:42:44 CEST 2015
On Oct 15, 2015, at 3:10 PM, William Dunlap wrote:
> C can tell when it hits the end of input. Reading the lines with
> readLines and passing them to scan() does not help - it is the
> same as having scan read the original file.
>
> My problem is that the file (or other connection) has a variable number
> of fields on each "line", and perhaps no fields on some lines. Fields
> enclosed in quotes may include newline character. I want to read this
> file into a list of character vectors, the n'th element of the list being
> the fields on the n'th "line" of the file.
>
> repeating scan(connection, nlines=1, what="") does everything right
> except for telling me when it has read everything the connection
> has to offer. scan(connection, what="") manages to figure out where
> the end of the file is, but does not tell me the line number associated
> with each character string.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Oct 15, 2015 at 2:57 PM, Jeff Newmiller
> <jdnewmil at dcn.davis.ca.us> wrote:
>> This is a problem in C as well... and the solution is to read the lines yourself and then give those lines to scan.
>> ---------------------------------------------------------------------------
>> Jeff Newmiller The ..... ..... Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
>> Live: OO#.. Dead: OO#.. Playing
>> Research Engineer (Solar/Batteries O.O#. #.O#. with
>> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
>> ---------------------------------------------------------------------------
>> Sent from my phone. Please excuse my brevity.
>>
>> On October 15, 2015 1:16:58 PM PDT, William Dunlap <wdunlap at tibco.com> wrote:
>>> I would like to read a connection line by line with scan but
>>> don't know how to tell when to quit trying. Is there any
>>> way that you can ask the connection object if it is at the end?
>>>
>>> E.g.,
>>>
>>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>>> tfile <- tempfile()
>>> cat(t, file=tfile)
>>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>>> scan(tcon, what="", nlines=1)
>>> #Read 2 items
>>> #[1] "A" "Two line\nentry"
>>> scan(tcon, what="", nlines=1) # empty line
>>> #Read 0 items
>>> #character(0)
>>> scan(tcon, what="", nlines=1)
>>> #Read 3 items
>>> #[1] "Three\nline\nentry" "D" "E"
>>> scan(tcon, what="", nlines=1) # end of file
>>> #Read 0 items
>>> #character(0)
>>> scan(tcon, what="", nlines=1) # end of file
>>> #Read 0 items
>>> #character(0)
If you run seek() after each of your scan() calls and test whether the result is unchanged from the previous call, that could be your end-of-file signal:
[1] "Three\nline\nentry" "D" "E"
[1] 43
> scan(tcon, what="", nlines=1);seek(tcon)
Read 0 items
character(0)
[1] 43
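
Wrapped in a loop, the idea might look like the sketch below. This is only an illustration, not tested beyond the small example in this thread: read_virtual_lines() is a name made up here, and it assumes a seekable file connection opened for reading (seek() is not available on a textConnection, and ?seek discourages its use on Windows).

```r
# Hypothetical helper (not part of base R): read one "virtual line" at a
# time with scan(nlines = 1) and stop when seek() shows that the read
# position did not move, which is taken as the end-of-file signal.
read_virtual_lines <- function(con) {
  out <- list()
  pos <- seek(con)                      # current read position, in bytes
  repeat {
    fields <- scan(con, what = "", nlines = 1, quiet = TRUE)
    new_pos <- seek(con)
    if (new_pos == pos) break           # nothing was consumed: assume EOF
    out[[length(out) + 1L]] <- fields   # character(0) is kept for empty lines
    pos <- new_pos
  }
  out
}

# Same example data as in the original question.
txt <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tfile <- tempfile()
cat(txt, file = tfile)
tcon <- file(tfile, "r")
res <- read_virtual_lines(tcon)
close(tcon)
unlink(tfile)
str(res)
```

An empty line still consumes its newline, so the position advances and the line is kept as character(0); only a read that consumes nothing ends the loop.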
--
David.
>>>
>>> I am reading virtual line by virtual line because the lines
>>> may have different numbers of fields.
>>>
>>> Bill Dunlap
>>> TIBCO Software
--
David Winsemius
Alameda, CA, USA