[R] readLines() and unz() and non-empty final line

Iris Simmons ikwsimmo using gmail.com
Fri Oct 25 15:37:36 CEST 2024


Hi again,


The unz connection is non-blocking by default. I checked do_unz, which calls
R_newunz, which calls init_con; the only place in any of those functions
that sets 'blocking' is init_con, which sets it to FALSE:

https://github.com/wch/r-source/blob/0c26529e807a9b1dd65f7324958c17bf72e1de1a/src/main/connections.c#L713

I'll open an issue on R-bugzilla and see if they're willing to do something
similar to 'file()'; that is, add a 'blocking' argument to unz(). It's hard
to say whether they would choose 'blocking = FALSE' for backward
compatibility or 'blocking = TRUE' for consistency with 'file()'.
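
Until then, here is a minimal sketch of the workaround from earlier in the thread, wrapped as a helper. The name read_zip_lines is hypothetical (it is not part of base R), and the demo assumes an external 'zip' program is available, since utils::zip() relies on one. Going by the reports in this thread, opening the connection explicitly as blocking should keep an incomplete final line instead of dropping it.

```R
## Hypothetical helper (not part of base R): read all lines of one member
## of a zip archive through an explicitly opened, blocking unz() connection.
read_zip_lines <- function(zipfile, member) {
    conn <- unz(zipfile, member)
    on.exit(close(conn))
    ## 'blocking = TRUE' is already the default for open(); it is spelled
    ## out because the blocking mode is the point of the workaround
    open(conn, "rt", blocking = TRUE)
    ## warn = FALSE silences the "incomplete final line" warning
    readLines(conn, warn = FALSE)
}

## Demo in a temporary directory (assumes an external 'zip' program exists)
old <- setwd(tempdir())
cat("hello", file = "hello.txt")    # file without a trailing newline
zip("hello.zip", "hello.txt")
res <- read_zip_lines("hello.zip", "hello.txt")
setwd(old)
print(res)
```

The helper registers on.exit(close(conn)) before calling open(), so the connection is cleaned up even if opening fails.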


Regards,

Iris

On Fri, Oct 25, 2024, 04:47 Marttila Mikko <mikko.marttila using orionpharma.com>
wrote:

> Thanks Iris, Bert, and Tim.
>
>
>
> Whether unz() is blocking or not by default doesn’t seem to be documented.
> Indeed, thank you Iris for finding out that explicitly opening it as
> blocking would work. That made me wonder if it’s non-blocking by default
> then, which would have been surprising. However, explicitly opening it as
> non-blocking seems to lead to problems as well:
>
>
>
> > local({
> +   con <- unz("hello.zip", "hello.txt")
> +   open(con, blocking = FALSE)
> +   on.exit(close(con))
> +   res <- readLines(con)
> +   res
> + })
> Error in readLines(con) : seek not enabled for this connection
> Calls: local ... eval.parent -> eval -> eval -> eval -> eval -> readLines
> Execution halted
>
> So, the behaviour of unz() seems to differ depending on whether it was
> explicitly opened before being passed to readLines(). Should this be fixed
> or documented?
>
>
>
> Best,
>
>
>
> Mikko
>
>
>
> *From:* Bert Gunter <bgunter.4567 using gmail.com>
> *Sent:* Thursday, 24 October 2024 18:13
> *To:* Iris Simmons <ikwsimmo using gmail.com>
> *Cc:* Marttila Mikko <mikko.marttila using orionpharma.com>;
> r-help using r-project.org
> *Subject:* Re: [R] readLines() and unz() and non-empty final line
>
>
>
> But note:
>
>
>
> > zip("hello.zip", "hello.txt")
> updating: hello.txt (stored 0%)
> > readChar(unz("hello.zip","hello.txt"),100)
> [1] "hello"
>
>
>
> I leave it to you and other wiser heads to figure out.
>
>
>
> Cheers,
>
> Bert
>
>
>
> On Thu, Oct 24, 2024 at 8:57 AM Iris Simmons <ikwsimmo using gmail.com> wrote:
>
> Hi Mikko,
>
>
> I tried running a few different things, and it seems as though
> explicitly using `open()` and opening a blocking connection works.
>
> ```R
> cat("hello", file = "hello.txt")
> zip("hello.zip", "hello.txt")
> local({
>     conn <- unz("hello.zip", "hello.txt")
>     on.exit(close(conn))
>     ## you can use "r" instead of "rt"
>     ##
>     ## 'blocking = TRUE' is the default, so remove if desired
>     open(conn, "rt", blocking = TRUE)
>     readLines(conn)
> })
> ```
>
> A blocking connection might be undesirable for you, in which case
> someone else might have a better solution.
>
> On Thu, Oct 24, 2024 at 10:58 AM Marttila Mikko via R-help
> <r-help using r-project.org> wrote:
> >
> > Dear list,
> >
> > I'm seeing a strange interaction with readLines() and unz() when reading
> > a file without an empty final line. The final line gets dropped silently:
> >
> > > cat("hello", file = "hello.txt")
> > > zip("hello.zip", "hello.txt")
> >   adding: hello.txt (stored 0%)
> > > readLines(unz("hello.zip", "hello.txt"))
> > character(0)
> >
> > The documentation for readLines() says if the final line is incomplete
> > for "non-blocking text-mode connections" the line is "pushed back,
> > silently" but otherwise "accepted with a warning".
> >
> > My understanding is that the unz() here is blocking so the line should be
> > accepted. Is that incorrect? If so, how would I go about reading such
> > lines from a zip file?
> >
> > Best,
> >
> > Mikko
> >
> >
> > This e-mail transmission may contain confidential or legally privileged
> > information that is intended only for the individual or entity named in the
> > e-mail address. If you are not the intended recipient, you are hereby
> > notified that any disclosure, copying, distribution, or reliance upon the
> > contents of this e-mail is strictly prohibited. If you have received this
> > e-mail transmission in error, please reply to the sender, so that they can
> > arrange for proper delivery, and then please delete the message from your
> > computer systems. Thank you.
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > https://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
