[R] Can scan() detect end-of-file?

William Dunlap wdunlap at tibco.com
Thu Oct 15 22:56:37 CEST 2015


readLines() does not work for me since it breaks up
multiline fields that are enclosed in quotes.  E.g., the
text file line
  A "Two line\nentry"
should be imported as 2 strings, the second being
"Two line\nentry", not "\"Two line" with the next call to
readLines() bringing in "entry\"".
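
One possible workaround, offered only as a sketch: peek at the
connection with readLines(n=1), which returns character(0) at
end-of-file, then hand the peeked line back with pushBack() so that
scan() still sees the whole quoted record. This assumes a text-mode
connection on which scan() honours pushBack(); the names `txt` and
`records` are just for illustration.

```r
# Sketch: use readLines() only as an end-of-file probe, and let scan()
# do the actual parsing so quoted multiline fields stay intact.
txt <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tcon <- textConnection(txt)   # assumption: a file(tfile, "r") connection behaves the same

records <- list()
repeat {
  peek <- readLines(tcon, n = 1)
  if (length(peek) == 0) break      # character(0) means nothing left to read
  pushBack(peek, tcon)              # return the physical line (newline re-appended)
  rec <- scan(tcon, what = "", nlines = 1, quiet = TRUE)
  records[[length(records) + 1]] <- rec   # empty lines are kept as character(0)
}
close(tcon)
```

The empty line is kept as a zero-length record, so it can be told
apart from end-of-file, which is what ends the loop.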

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Oct 15, 2015 at 1:44 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> I've always used system("wc -l myfile") to get the number of lines in
> advance. But here are two other R-only options, both using readLines
> instead of scan. There's probably something more efficient, too.
>
> Your setup:
> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
> tfile <- tempfile()
> cat(t, file=tfile)
> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>
> readLines() produces character(0) for nonexistent lines and "" for empty lines.
>
>> readLines(tcon, n=1)
> [1] "A \"Two line"
>> readLines(tcon, n=1)
> [1] "entry\""
>> readLines(tcon, n=1)
> [1] ""
>> readLines(tcon, n=1)
> [1] "\"Three"
>> readLines(tcon, n=1)
> [1] "line"
>> readLines(tcon, n=1)
> [1] "entry\" D E"
>> readLines(tcon, n=1)
> character(0)
>> readLines(tcon, n=1)
> character(0)
>
> Or if the file isn't too large for memory, you can read the whole
> thing in then process it line by line:
>
> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
> allfile <- readLines(tcon, n=10000)
>
>> length(allfile)
> [1] 6
>
> On Thu, Oct 15, 2015 at 4:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> I would like to read a connection line by line with scan but
>> don't know how to tell when to quit trying.  Is there any
>> way that you can ask the connection object if it is at the end?
>>
>> E.g.,
>>
>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>> tfile <- tempfile()
>> cat(t, file=tfile)
>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>> scan(tcon, what="", nlines=1)
>> #Read 2 items
>> #[1] "A"               "Two line\nentry"
>> scan(tcon, what="", nlines=1)  # empty line
>> #Read 0 items
>> #character(0)
>> scan(tcon, what="", nlines=1)
>> #Read 3 items
>> #[1] "Three\nline\nentry" "D"                  "E"
>> scan(tcon, what="", nlines=1) # end of file
>> #Read 0 items
>> #character(0)
>> scan(tcon, what="", nlines=1) # end of file
>> #Read 0 items
>> #character(0)
>>
>> I am reading virtual line by virtual line because the lines
>> may have different numbers of fields.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
> --
> Sarah Goslee
> http://www.functionaldiversity.org
