[Rd] random output with sub(fixed = TRUE)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Dec 21 23:29:54 CET 2005
On Wed, 21 Dec 2005, Roger D. Peng wrote:
> Well, who am I to break this long-standing ritual? :)
>
> Interestingly, while the printed output looks wrong, I get
>
> > v <- paste(0:10, "asdf", sep = ".")
> > a <- sub(".asdf", "", v, fixed = TRUE)
> > b <- as.character(0:10)
> > identical(a, b)
> [1] TRUE
> >
identical is wrong! R character strings have a true length and a C-style
length: print() prints all the characters, even those after embedded
nuls. identical uses
if(strcmp(CHAR(STRING_ELT(x, i)),
CHAR(STRING_ELT(y, i))) != 0)
which is C-style.
The issue is at character.c:1015, where nr gets trashed: note that the first
answer in the vector is correct. So it is easy to fix.
This code has been unchanged for years, so I don't think this is at all
related to the release of 2.2.1.
> Peter Dalgaard wrote:
>> "Roger D. Peng" <rpeng at jhsph.edu> writes:
>>
>>
>>> I've noticed what I think is curious behavior in using 'sub(fixed = TRUE)' and
>>> was wondering if my expectation is incorrect. Here is one example:
>>>
>>> v <- paste(0:10, "asdf", sep = ".")
>>> sub(".asdf", "", v, fixed = TRUE)
>>>
>>> The results I get are
>>>
>>>> sub(".asdf", "", v, fixed = TRUE)
>>> [1] "0" "1\0st\0\0" "2\0<af>\001\0\0" "3\0<af>\001\0\0"
>>> [5] "4\0mes\0" "5\0<ba>\001\0\0" "6\0\0\0\0\0" "7\0\0\0m\0"
>>> [9] "8\0\0\0t\0" "9\0<fe>\0\0\0" "10\0\0\0\0\0"
>>>>
>>>
>>> I expected "0" in the first entry and everything else would be unchanged. Your
>>> results may vary since every time I run 'sub()' in this way, I get a slightly
>>> different answer in entries 2 through 11.
>>>
>>> As it turns out, 'gsub(fixed = TRUE)' gives me the answer I *actually* wanted,
>>> which was to replace the string in every entry. But I still think the behavior
>>> of 'sub(fixed = TRUE)' is a bit odd.
>>>
>>>> version
>>> _
>>> platform x86_64-unknown-linux-gnu
>>> arch x86_64
>>> os linux-gnu
>>> system x86_64, linux-gnu
>>> status
>>> major 2
>>> minor 2.1
>>> year 2005
>>> month 12
>>> day 20
>>> svn rev 36812
>>> language R
>>>>
>>
>>
>> Argh...
>>
>> year 2005
>> month 12
>> day 21
>>
>> and something like this gets discovered. It's a ritual, I tell ya, a ritual!
>>
>> If you look at the output and terminate all strings at the embedded
>> \0, it looks much more sensible, so it should be fairly easy to spot
>> the cause of this bug...
>>
>
> --
> Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595