[R] Can scan() detect end-of-file?
William Dunlap
wdunlap at tibco.com
Thu Oct 15 23:34:44 CEST 2015
scan(nlines=) does this post-processing, which is why I'm using it
instead of readLines.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
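
One possible way to stop a scan(nlines=1) loop at end of file, sketched here as an assumption rather than as the approach settled on in this thread, is to peek at the next physical line with readLines() and hand it back to the connection with pushBack() before calling scan(). readLines() returns character(0) only at end of file, so it can act as the end-of-file test while scan() still does the quote-aware parsing. The names `peek` and `records` are illustrative, and the sketch has only been reasoned through for a file() connection opened in text mode.

```r
# Setup copied from the original question.
t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tfile <- tempfile()
cat(t, file = tfile)
tcon <- file(tfile, "r")

records <- list()  # one character vector per "virtual" line
repeat {
  # Peek at the next physical line; character(0) means end of file.
  peek <- readLines(tcon, n = 1)
  if (length(peek) == 0) break
  # Return the peeked line to the connection so scan() sees it again.
  pushBack(peek, tcon)
  # scan() keeps quoted multi-line fields together, as in the question.
  records[[length(records) + 1]] <- scan(tcon, what = "", nlines = 1, quiet = TRUE)
}
close(tcon)
```

With this input, `records` should hold three entries, the second being character(0) for the empty line. That distinguishes an empty line from end of file, which scan() alone reports identically.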
On Thu, Oct 15, 2015 at 2:06 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Thus the post-processing, which I assume you'd have to do with scan() as well.
>
>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>> allfile <- readLines(tcon, n=10000)
>
>> strsplit(paste(allfile, collapse="\n"), "\"")
> [[1]]
> [1] "A " "Two line\nentry" "\n\n" "Three\nline\nentry"
> [5] " D E"
>
> Or similar, depending on exactly what you want the result to look like.
>
> On Thu, Oct 15, 2015 at 4:56 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> readLines() does not work for me since it breaks up
>> multiline fields that are enclosed in quotes. E.g., the
>> text file line
>> A "Two line\nentry"
>> should be imported as 2 strings, the second being
>> "Two line\nentry", not "\"Two line" with the next call to
>> readLines bringing in "entry\"".
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Thu, Oct 15, 2015 at 1:44 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>>> I've always used system("wc -l myfile") to get the number of lines in
>>> advance. But here are two other R-only options, both using readLines
>>> instead of scan. There's probably something more efficient, too.
>>>
>>> Your setup:
>>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>>> tfile <- tempfile()
>>> cat(t, file=tfile)
>>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>>>
>>> readLines() produces character(0) for nonexistent lines and "" for empty lines.
>>>
>>>> readLines(tcon, n=1)
>>> [1] "A \"Two line"
>>>> readLines(tcon, n=1)
>>> [1] "entry\""
>>>> readLines(tcon, n=1)
>>> [1] ""
>>>> readLines(tcon, n=1)
>>> [1] "\"Three"
>>>> readLines(tcon, n=1)
>>> [1] "line"
>>>> readLines(tcon, n=1)
>>> [1] "entry\" D E"
>>>> readLines(tcon, n=1)
>>> character(0)
>>>> readLines(tcon, n=1)
>>> character(0)
>>>
>>> Or if the file isn't too large for memory, you can read the whole
>>> thing in then process it line by line:
>>>
>>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>>> allfile <- readLines(tcon, n=10000)
>>>
>>>> length(allfile)
>>> [1] 6
>>>
>>> On Thu, Oct 15, 2015 at 4:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>>> I would like to read a connection line by line with scan but
>>>> don't know how to tell when to quit trying. Is there any
>>>> way that you can ask the connection object if it is at the end?
>>>>
>>>> E.g.,
>>>>
>>>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>>>> tfile <- tempfile()
>>>> cat(t, file=tfile)
>>>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>>>> scan(tcon, what="", nlines=1)
>>>> #Read 2 items
>>>> #[1] "A" "Two line\nentry"
>>>> scan(tcon, what="", nlines=1) # empty line
>>>> #Read 0 items
>>>> #character(0)
>>>> scan(tcon, what="", nlines=1)
>>>> #Read 3 items
>>>> #[1] "Three\nline\nentry" "D" "E"
>>>> scan(tcon, what="", nlines=1) # end of file
>>>> #Read 0 items
>>>> #character(0)
>>>> scan(tcon, what="", nlines=1) # end of file
>>>> #Read 0 items
>>>> #character(0)
>>>>
>>>> I am reading virtual line by virtual line because the lines
>>>> may have different numbers of fields.
>>>>
>>>> Bill Dunlap
>>>> TIBCO Software
>>>> wdunlap tibco.com
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org