[R] Using readLines on a file without resetting internal file offset parameter?
William Dunlap
wdunlap at tibco.com
Wed Oct 29 18:51:37 CET 2014
I agree that file()'s use of its 'mode' argument can be confusing. That is
one reason I didn't use it in my example and instead made an explicit call to
open() after calling file() without the mode argument.
(Having to distinguish between 'binary' and 'text' mode on Windows, 'rb' and
'rt' or 'wb' and 'wt', can add to the confusion.)
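
For anyone skimming the thread, here is a minimal sketch of the difference,
assuming a scratch file made with tempfile(). The names tf, f, first, second
and seen are made up for illustration and are not from the earlier messages.
It first reads from the unopened connection (each call starts over), then
opens it explicitly in text mode and reads line by line until readLines
returns character(0).

```r
# Hypothetical scratch file; all variable names here are illustrative.
tf <- tempfile()
writeLines(c("1", "2", "3"), tf)

# No mode given: the connection object exists but is not opened yet.
f <- file(tf)
isOpen(f)                      # FALSE

# Each call on the unopened connection opens it, reads, and closes it,
# so both calls start from the top of the file.
first  <- readLines(f, n = 1)
second <- readLines(f, n = 1)

# Explicit open in text mode ("rb" would be the binary counterpart).
open(f, "rt")
seen <- character(0)
repeat {
  line <- readLines(f, n = 1)
  if (length(line) == 0) break  # character(0) signals end of file
  seen <- c(seen, line)
}

# Release the file descriptor once the whole sequence of reads is done.
close(f)
unlink(tf)
```

With the connection opened up front, only one line at a time is pulled into
R, which was the original goal.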
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Oct 29, 2014 at 10:12 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
> Yeah of course you should close the file when done. I didn't give a complete
> code snippet.
>
> In any case, a quick glance at the documentation seems to imply that opening
> a file as file('filename') will defer the choice of mode (i.e. is it 'r',
> 'w', etc.?) until it is first used. In my case the first use is a read, so
> it should presumably be set to "r". However, as shown in my examples, this
> does work differently than opening it as file('filename','r') in the first
> place.
>
> I do agree that it is reasonable that the default behavior of
> readLines/writeLines might be to reset the file offset each time, but I
> certainly do not agree that that should be happening dependent upon whether
> the original file is opened without 'r' and then read from versus being
> opened with 'r' in the first place. That kind of a side-effect really makes
> no sense to me and is entirely unintuitive. If the goal was to have the
> default behavior to reset the file offset, a reasonable thing would be to
> have a flag in readLines like reset_fileoffset = TRUE or something like
> that.
>
> In any case, thanks so much for the help!
>
> Cheers,
> Thomas
>
>
> On 10/29/2014 12:59 PM, William Dunlap wrote:
>>
>> I meant you should close the file when you are done with it, not after
>> every few lines.
>> File descriptors are a limited resource.
>>
>> As for the rationale for the default behavior, there is a common use
>> pattern of reading and parsing an entire file (or url, etc.), examining the
>> results, and trying again with a different parsing scheme. In that case the
>> default behavior works well.
>>
>> In any case, I assume the behavior is documented in help("file").
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Wed, Oct 29, 2014 at 9:51 AM, Thomas Nyberg <tomnyberg at gmail.com>
>> wrote:
>>>
>>> Thanks for the response! I'd rather keep the file open than close it, since
>>> it would flush the internal buffer. The whole reason I'm doing this is to
>>> take advantage of the buffering and closing it would defeat the purpose.
>>>
>>> I actually just found a solution which is to open the files with the "r"
>>> flag explicitly. I.e. the following is what I want.
>>>
>>> -----
>>>
>>> bash $ echo 1 > testfile
>>> bash $ echo 2 >> testfile
>>> bash $ cat testfile
>>> 1
>>> 2
>>>
>>> bash $ R
>>> R > f <- file('testfile', 'r')
>>> R > readLines(f, n = 1)
>>> [1] "1"
>>> R > readLines(f, n = 1)
>>> [1] "2"
>>> R > readLines(f, n = 1)
>>> character(0)
>>>
>>> -----
>>>
>>> If you want to use writeLines in this same fashion, you'll need to open
>>> the original file with the "w" flag as well.
>>>
>>> It's very odd that file('filename') will let you read from it, but will not
>>> act the same as file('filename', 'r') when it comes to readLines. Is this a
>>> bug or is there some reasoning behind this? Regardless, it's certainly
>>> extremely unintuitive.
>>>
>>> Thanks again for the response!
>>>
>>> Cheers,
>>> Thomas
>>>
>>>
>>> On 10/29/2014 12:22 PM, William Dunlap wrote:
>>>>
>>>>
>>>> Open your file object before calling readLines and close it when you are
>>>> done with a sequence of calls to readLines.
>>>>
>>>> > tf <- tempfile()
>>>> > cat(sep="\n", letters[1:10], file=tf)
>>>> > f <- file(tf)
>>>> > open(f)
>>>> > # or f <- file(tf, "r") instead of previous 2 lines
>>>> > readLines(f, n=1)
>>>> [1] "a"
>>>> > readLines(f, n=1)
>>>> [1] "b"
>>>> > readLines(f, n=2)
>>>> [1] "c" "d"
>>>> > close(f)
>>>>
>>>> I/O operations on an unopened connection generally open it, do the
>>>> operation, then close it.
>>>>
>>>> Bill Dunlap
>>>> TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>>
>>>> On Wed, Oct 29, 2014 at 8:23 AM, Thomas Nyberg <tomnyberg at gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I would like to read a file line by line, but I would rather not load all
>>>>> lines into memory first. I've tried using readLines with n = 1, but that
>>>>> seems to reset the internal file descriptor's file offset after each call.
>>>>> I.e. this is the current behavior:
>>>>>
>>>>> -------
>>>>>
>>>>> bash $ echo 1 > testfile
>>>>> bash $ echo 2 >> testfile
>>>>> bash $ cat testfile
>>>>> 1
>>>>> 2
>>>>>
>>>>> bash $ R
>>>>> R > f <- file('testfile')
>>>>> R > readLines(f, n = 1)
>>>>> [1] "1"
>>>>> R > readLines(f, n = 1)
>>>>> [1] "1"
>>>>>
>>>>> -------
>>>>>
>>>>> I would like the behavior to be:
>>>>>
>>>>> -------
>>>>>
>>>>> bash $ R
>>>>> R > f <- file('testfile')
>>>>> R > readLines(f, n = 1)
>>>>> [1] "1"
>>>>> R > readLines(f, n = 1)
>>>>> [1] "2"
>>>>>
>>>>> -------
>>>>>
>>>>> I'm coming to R from a Python background, where the default behavior is
>>>>> exactly the opposite. I.e. when you read a line from a file it is your
>>>>> responsibility to use seek explicitly to get back to the original position
>>>>> in the file (this is rarely necessary though). Is there some flag to turn
>>>>> off the default behavior of resetting the file offset in R?
>>>>>
>>>>> Cheers,
>>>>> Thomas
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.