[R] puzzle using gsub (and encodings maybe)

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 14 20:19:02 CEST 2009


On Wed, 14 Oct 2009, Adrian Dragulescu wrote:

>> charToRaw(x)
> [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
>> charToRaw(y)
> [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44
>> 
>
> So they are different.

We really do need the 'at a minimum' information we asked you for in 
the posting guide.  But in cp1252 (a guess as to what you might be 
using) \xad is a 'soft hyphen', and that is not the same character as 
the hyphen (\x2d) in your pattern, so " -" never matches -- you will 
get the same issue with 'non-breaking space' (\xa0) versus an ordinary 
space.
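
A minimal sketch of one possible workaround, assuming x really is in a 
single-byte encoding such as cp1252 (that is only a guess in this thread). 
The bytes are copied from the charToRaw(x) output quoted above; the names 
x_raw, x_fixed and result are illustrative only. It swaps the soft-hyphen 
byte for an ASCII hyphen at the raw level, after which the original gsub 
pattern can match.

```r
# Bytes copied from the charToRaw(x) output quoted above.
x_raw <- as.raw(c(0x4e, 0x45, 0x57, 0x20, 0x59, 0x4f, 0x52, 0x4b, 0x20,
                  0xad,  # soft hyphen in cp1252 / latin1
                  0x4e, 0x45, 0x57, 0x20, 0x45, 0x4e, 0x47, 0x4c, 0x41,
                  0x4e, 0x44))

# Assumption: single-byte encoding, so every 0xad byte is a soft hyphen.
# In a multi-byte encoding such as UTF-8 this byte-wise swap would be unsafe.
x_raw[x_raw == as.raw(0xad)] <- as.raw(0x2d)   # 0x2d is the ASCII hyphen
x_fixed <- rawToChar(x_raw)

# The string is now plain ASCII, so the original pattern matches.
result <- gsub(" -", "-", x_fixed)
print(result)
```

Working on the raw vector sidesteps any dependence on the current locale. 
For data read from a file one would start from charToRaw(x) rather than 
typing the bytes in.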

BDR

>
> Adrian
>
> I use R 2.8.1 on WinXP
>
>
> On Wed, 14 Oct 2009, Duncan Murdoch wrote:
>
>> On 10/14/2009 1:30 PM, Adrian Dragulescu wrote:
>>> Hello,
>>> 
>>> Below is some output that shows my issue.
>>> 
>>> I have a variable x that I read from a file (more on this below)
>>> 
>>>> x
>>> [1] "NEW YORK NEW ENGLAND"
>>>> gsub(" -", "-", x)            # this does not work!
>>> [1] "NEW YORK NEW ENGLAND"

Well, I see no hyphen at all here, but then I am not on Windows.

>> It looks as though it worked, presumably because something got lost in your 
>> email.
>> 
>> Could you post charToRaw(x) so we can see what's in x?
>> 
>> Duncan Murdoch
>> 
>>>> Encoding(x)                   # is x in a special encoding? no
>>> [1] "unknown"
>>>> y = "NEW YORK -NEW ENGLAND"   # I type in variable y
>>>> gsub(" -", "-", y)            # and gsub works as expected
>>> [1] "NEW YORK-NEW ENGLAND"
>>>> 
>>> 
>>> I'm sure the problem has to do with the way I read the variable x.  But 
>>> even if I change the encoding for x to ASCII, I still cannot do the sub.
>>> I get x by reading a pdf file with pdftotext so you will not be able to 
>>> replicate my issue.
>>> 
>>> Thanks for any suggestions,
>>> Adrian

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



