[Rd] R 2.7.0, match() and strings containing \0 - bug?

Jon Clayden jon.clayden at gmail.com
Mon Apr 28 13:21:53 CEST 2008


2008/4/28 Prof Brian Ripley <ripley at stats.ox.ac.uk>:
>
> On Mon, 28 Apr 2008, Jon Clayden wrote:
>
>
> > Hi,
> >
> > A piece of my code that uses readBin() to read a certain file type is
> > behaving strangely with R 2.7.0. This seems to be because of a failure
> > to match() strings after using rawToChar() when the original was
> > terminated with a "\0" character. Direct equality testing with ==
> > still works as expected. I can reproduce this as follows:
> >
> > > x <- "foo"
> > > y <- c(charToRaw("foo"),as.raw(0))
> > > z <- rawToChar(y)
> > > z==x
> > [1] TRUE
> >
> > > z=="foo"
> > [1] TRUE
> >
> > > z %in% c("foo","bar")
> > [1] FALSE
> >
> > > z %in% c("foo","bar","foo\0")
> > [1] FALSE
> >
> > But without the nul character it works fine:
> >
> > > zz <- rawToChar(charToRaw("foo"))
> > > zz %in% c("foo","bar")
> > [1] TRUE
> >
> > I don't see anything about this in the latest NEWS, but is this
> > expected behaviour? Or is it, as I suspect, a bug? This seems to be
> > new to R 2.7.0, as I said.
> >
>
>  And so is the comment in ?match:
>
>      Character inputs with embedded nul bytes will be truncated at the
>      first nul.
>
>  The bug is in the documentation here -- this was intentional.
>
>  As support for embedded nuls in character strings is being removed in R
> 2.8.0, you should not rely on this.
>

Thanks for the reply, but I don't see why this should make the match
fail. If "foo\0" gets truncated to "foo", then surely
match("foo\0","foo") should return 1 (which it does if you use the
literals, but not if the string came out of rawToChar)?

Also, ?'==' seems to contain a similar comment:

     When comparisons are made between character strings, parts of the
     strings after embedded 'nul' characters are ignored.

So why are the results different? I would expect 'z=="foo"' and
'z %in% "foo"' to both return TRUE, but the second returns FALSE.
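
In the meantime, a possible workaround is to drop the nul bytes from the
raw vector before converting, so that the resulting string never
contains one. This is only a minimal sketch: it assumes the nul bytes
carry no meaning in the data, and rawToCharNoNul is a hypothetical
helper name, not a base R function.

```r
# Hypothetical helper (not part of base R): remove every zero byte from a
# raw vector, then convert what is left to a single character string.
rawToCharNoNul <- function(bytes) {
    rawToChar(bytes[bytes != as.raw(0)])
}

y <- c(charToRaw("foo"), as.raw(0))  # "foo" followed by a nul terminator
z <- rawToCharNoNul(y)

z %in% c("foo", "bar")
# [1] TRUE
```

Because the nul is removed at the raw level, the result does not depend
on how match() or == treat embedded nuls in a given R version.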

Jon


