[Rd] random output with sub(fixed = TRUE)

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Dec 21 23:29:54 CET 2005


On Wed, 21 Dec 2005, Roger D. Peng wrote:

> Well, who am I to break this long-standing ritual? :)
>
> Interestingly, while the printed output looks wrong, I get
>
> > v <- paste(0:10, "asdf", sep = ".")
> > a <- sub(".asdf", "", v, fixed = TRUE)
> > b <- as.character(0:10)
> > identical(a, b)
> [1] TRUE
> >

identical is wrong!  R character strings have a true length and a C-style
length: print() prints all the characters, even those after embedded
nuls.  identical uses

 	    if(strcmp(CHAR(STRING_ELT(x, i)),
 		      CHAR(STRING_ELT(y, i))) != 0)

which is C-style.
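
To make the distinction concrete, here is a minimal sketch in plain C. It
does not use R's internals: counted_str, equal_c_style and
equal_true_length are hypothetical names invented for this illustration,
standing in for a string that carries its true length alongside its bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string with a stored length (not R's type). */
typedef struct {
    const char *bytes;  /* may contain embedded nuls */
    size_t len;         /* true length in bytes */
} counted_str;

/* C-style comparison: strcmp stops at the first nul in either string,
   so anything after an embedded nul is never looked at. */
bool equal_c_style(counted_str x, counted_str y)
{
    return strcmp(x.bytes, y.bytes) == 0;
}

/* True-length comparison: the lengths must agree, and then every byte
   up to that length is compared, embedded nuls included. */
bool equal_true_length(counted_str x, counted_str y)
{
    return x.len == y.len && memcmp(x.bytes, y.bytes, x.len) == 0;
}
```

With a six-byte string such as { "1\0st\0\0", 6 } (the second element of
the printed result) and the one-byte string { "1", 1 }, the C-style test
reports them equal while the true-length test does not, which is why
identical() returned TRUE on output that prints wrongly.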

The culprit is character.c:1015, where nr gets trashed: note that the first
answer in the vector is correct.  So it is easy to fix.

This code has been as it currently is for years, so I don't think this is
at all related to the release of 2.2.1.

> Peter Dalgaard wrote:
>> "Roger D. Peng" <rpeng at jhsph.edu> writes:
>>
>>
>>> I've noticed what I think is curious behavior in using 'sub(fixed = TRUE)' and
>>> was wondering if my expectation is incorrect.  Here is one example:
>>>
>>> v <- paste(0:10, "asdf", sep = ".")
>>> sub(".asdf", "", v, fixed = TRUE)
>>>
>>> The results I get are
>>>
>>>> sub(".asdf", "", v, fixed = TRUE)
>>>  [1] "0"               "1\0st\0\0"       "2\0<af>\001\0\0" "3\0<af>\001\0\0"
>>>  [5] "4\0mes\0"        "5\0<ba>\001\0\0" "6\0\0\0\0\0"     "7\0\0\0m\0"
>>>  [9] "8\0\0\0t\0"      "9\0<fe>\0\0\0"   "10\0\0\0\0\0"
>>>>
>>>
>>> I expected "0" in the first entry and everything else would be unchanged.  Your
>>> results may vary since every time I run 'sub()' in this way, I get a slightly
>>> different answer in entries 2 through 11.
>>>
>>> As it turns out, 'gsub(fixed = TRUE)' gives me the answer I *actually* wanted,
>>> which was to replace the string in every entry.  But I still think the behavior
>>> of 'sub(fixed = TRUE)' is a bit odd.
>>>
>>>> version
>>>          _
>>> platform x86_64-unknown-linux-gnu
>>> arch     x86_64
>>> os       linux-gnu
>>> system   x86_64, linux-gnu
>>> status
>>> major    2
>>> minor    2.1
>>> year     2005
>>> month    12
>>> day      20
>>> svn rev  36812
>>> language R
>>>>
>>
>>
>> Argh...
>>
>> year     2005
>> month    12
>> day      21
>>
>> and something like this gets discovered. It's a ritual, I tell ya, a ritual!
>>
>> If you look at the output and terminate all strings at the embedded
>> \0, it looks much more sensible, so it should be fairly easy to spot
>> the cause of this bug...
>>
>
> -- 
> Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


