[R] readLines without skipNul=TRUE causes crash

Duncan Murdoch murdoch.duncan at gmail.com
Sun Jul 16 12:34:41 CEST 2017

On 16/07/2017 6:17 AM, Anthony Damico wrote:
> Thank you for taking the time to write this. I set it running last
> night and it's still going; if it doesn't finish by tomorrow, I will
> try to find a site to host the problem file and add that link to the bug
> report so the archive package can be avoided at least. I'm sorry for
> the bother.

How big is that text file?  I wouldn't expect my script to take more 
than a few minutes even on a huge file.

My script might have a bug...
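One possible reason a scan like this can be slow, offered only as a guess: growing `zeros` with `c()` copies everything accumulated so far on every chunk, which gets expensive if the file contains a very large number of nulls. Below is a minimal sketch of the same scan that keeps per-chunk results in a list and combines them once at the end. It reads chunks as raw bytes, and `scan_nuls()` is a hypothetical helper name, not part of base R.

```r
# Sketch (hypothetical helper, not base R): find the 1-based offsets of all
# NUL bytes in a file, reading it in chunks of `chunk` bytes.
scan_nuls <- function(path, chunk = 1e6) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  hits <- list()   # one vector of NUL offsets per chunk
  count <- 0       # bytes read so far
  repeat {
    bytes <- readBin(con, "raw", chunk)
    if (length(bytes)) {
      hits[[length(hits) + 1L]] <- count + which(bytes == as.raw(0))
    }
    count <- count + length(bytes)
    if (length(bytes) < chunk) break
  }
  # combine once at the end instead of appending with c() on every chunk
  list(size = count, zeros = c(numeric(), unlist(hits, use.names = FALSE)))
}

# Usage (the path is a placeholder):
# res <- scan_nuls("bad_file.txt")
# cat("File length=", res$size, "\n"); res$zeros
```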

Duncan Murdoch
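For reference on the subject line: according to ?readLines, an embedded nul terminates the line currently being read, with a warning unless `skipNul = TRUE` (or `warn = FALSE`). The small sketch below uses a throwaway temp file, not the 4GB file from the bug report, to show that documented difference; it says nothing about the segfault itself.

```r
# Sketch: a one-line file "ab<NUL>cd\n" read back with and without skipNul.
f <- tempfile()
writeBin(c(charToRaw("ab"), as.raw(0), charToRaw("cd\n")), f)

# Default: readLines() should warn about the embedded nul; record and muffle it.
warned <- FALSE
default_read <- withCallingHandlers(
  readLines(f),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(default_read)

# skipNul = TRUE: the NUL byte is dropped and the rest of the line is kept.
skipped_read <- readLines(f, skipNul = TRUE)
print(skipped_read)
```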

> On Sat, Jul 15, 2017 at 4:14 PM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>     On 15/07/2017 11:33 AM, Anthony Damico wrote:
>         Hi, I realized that the segfault happens on the text file in a new R
>         session. So creating the segfault-generating text file requires a
>         contributed package, but triggering the actual segfault does not.
>         Pretty sure that means this is a base R bug? Submitted here:
>         https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311
>         Hopefully I am not doing something remarkably stupid. The text file
>         itself is 4GB, so I cannot upload it to Bugzilla, and from the
>         R_AllocStringBuffer error in the previous message, I think most or
>         all of it needs to be there to trigger the segfault. Thanks!
>     I don't want to download the big file or install the archive
>     package. Could you run the code below on the bad file?  If you're
>     right and it's only nulls that matter, this might allow me to create
>     a file that triggers the bug.
>     f <- "bad_file.txt"  # placeholder: put the filename of the bad file here
>     con <- file(f, open = "rb")
>     zeros <- numeric()   # 1-based byte offsets of every null
>     count <- 0           # bytes read so far
>     repeat {
>       bytes <- readBin(con, "int", 1000000, size = 1)
>       zeros <- c(zeros, count + which(bytes == 0))
>       count <- count + length(bytes)
>       if (length(bytes) < 1000000) break
>     }
>     close(con)
>     cat("File length=", count, "\n")
>     cat("Nulls:\n")
>     zeros
>     Here's some code to recreate a file of the same length with nulls in
>     the same places, and spaces everywhere else:
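>     size <- count                  # total length found by the scan above
>     f2 <- tempfile()
>     con <- file(f2, open = "wb")
>     count <- 0                     # bytes written so far
>     while (count < size) {
>       # number of spaces to write before the next null (or end of file)
>       nonzeros <- min(c(size - count, 1000000, zeros - 1))
>       if (nonzeros) {
>         writeBin(rep(32L, nonzeros), con, size = 1)
>         count <- count + nonzeros
>       }
>       # keep zeros relative to the next byte to be written
>       zeros <- zeros - nonzeros
>       if (length(zeros) && min(zeros) == 1) {
>         writeBin(0L, con, size = 1)  # the next byte is a null
>         count <- count + 1
>         zeros <- zeros[-1] - 1
>       }
>     }
>     close(con)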
>     Duncan Murdoch
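A possible reason the rebuild loop could be slow on a file with many nulls (a guess, not something established in the thread): every pass through it subtracts from the entire `zeros` vector, so the total work grows roughly with the square of the number of nulls. Below is a minimal sketch of an alternative that builds each chunk as a raw vector of spaces and sets only the nulls that fall inside that chunk, located with findInterval() on the sorted offsets. `rebuild_nuls()` is a hypothetical helper; it assumes `zeros` is sorted, as the scan script produces it.

```r
# Sketch (hypothetical helper): write a file of `size` bytes with NUL bytes at
# the sorted 1-based offsets in `zeros` and spaces (0x20) everywhere else.
rebuild_nuls <- function(size, zeros, path, chunk = 1e6) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  nz <- length(zeros)
  j <- 1L        # index of the first offset not yet written
  start <- 0     # bytes written so far
  while (start < size) {
    n <- min(chunk, size - start)              # bytes in this chunk
    buf <- rep(as.raw(32), n)
    # number of offsets <= end of this chunk (zeros must be sorted)
    hi <- if (nz) findInterval(start + n, zeros) else 0L
    if (hi >= j) {
      buf[zeros[j:hi] - start] <- as.raw(0)    # shift to 1..n within chunk
      j <- hi + 1L
    }
    writeBin(buf, con)
    start <- start + n
  }
  invisible(path)
}
```

Because each chunk only touches the nulls inside it, the work stays proportional to the file size plus the number of nulls. With `count` and `zeros` from the scan script (before the rebuild loop modifies them), something like `rebuild_nuls(count, zeros, tempfile())` would write the replacement file.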

