[R] cannot base64decode string which is base64encode in R
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Aug 6 09:47:17 CEST 2013
On 06/08/2013 08:34, Qiang Wang wrote:
> Thanks for your detailed explanation. If I understand correctly, "ߟ" is
> one of the byte sequences that CAN be interpreted as UTF-8. Others, such as
> "\xe4" and "\xac", are left as they are. So the following code will show an
> error message, but it won't affect the use of x?
> x <- "\xe4"
>
> I have a question that may be off topic, but it has bothered me a lot and I
> can't find the answer anywhere:
> In R, how do I add a null character to a string? Even storing a single null
> character seems impossible:
> x <- "\0"
> The question arose from a web API that requires submitted strings to
> contain a null character.
It is not possible. Character strings in R cannot contain nuls (not
nulls, sic). Use raw vectors instead.
This is documented, so time to read some manuals ....
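
A minimal sketch of the raw-vector approach, using only base R. The
payload contents and the variable names are made up for illustration, and
how the web API expects the bytes to be submitted is an assumption left
open here; the in-memory raw connection merely stands in for whatever
binary connection is actually used.

```r
# Build a byte sequence with an embedded nul as a raw vector.
payload <- c(charToRaw("abc"), as.raw(0), charToRaw("def"))

length(payload)          # 7 bytes
payload[4] == as.raw(0)  # the fourth byte is the nul

# Converting back to a single character string fails, because
# character strings cannot contain nuls.
res <- tryCatch(rawToChar(payload), error = function(e) e)
inherits(res, "error")

# The bytes can still be written unchanged to a binary connection.
con <- rawConnection(raw(0), "r+")
writeBin(payload, con)
out <- rawConnectionValue(con)
close(con)
identical(out, payload)
```

The point is only that the nul survives as long as the data stay in a raw
vector and are never converted to a character string.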
>
>
> On Tue, Aug 6, 2013 at 1:43 AM, Enrico Schumann <es at enricoschumann.net> wrote:
>
>> On Mon, 05 Aug 2013, Qiang Wang <unsown at gmail.com> writes:
>>
>>>> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <es at enricoschumann.net> wrote:
>>>>
>>>>> On Fri, 02 Aug 2013, Qiang Wang <unsown at gmail.com> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm struggling with encoding/decoding strings in R. I don't know why the
>>>>>> second example below fails. Thanks in advance for your help.
>>>>>>
>>>>>> succeed:
>>>>>> s <- "saf"
>>>>>> x <- base64encode(s)
>>>>>> y <- base64decode(x, "character")
>>>>>>
>>>>>> fail:
>>>>>> s <- "safs"
>>>>>> x <- base64encode(s)
>>>>>> y <- base64decode(x, "character")
>>>>>>
>>>>>
>>>>> And the first example works for you?
>>>>>
>>>>> require("base64enc")
>>>>> s <- "saf"
>>>>> x <- base64encode(s)
>>>>>
>>>>> ## Error in file(what, "rb") : cannot open the connection
>>>>> ## In addition: Warning message:
>>>>> ## In file(what, "rb") : cannot open file 'saf': No such file or directory
>>>>>
>>>>> ?base64encode says that its first argument is
>>>>>
>>>>> "data to be encoded/decoded. For ‘base64encode’ it can be a raw
>>>>> vector, text connection or file name. For ‘base64decode’ it can be
>>>>> a string or a binary connection."
>>>>>
>>>>> Try this:
>>>>>
>>>>> rawToChar(base64decode(base64encode(charToRaw("saf"))))
>>>>>
>>>>> ## [1] "saf"
>>>>>
>>>>> --
>>>>> Enrico Schumann
>>>>> Lucerne, Switzerland
>>>>> http://enricoschumann.net
>>>>
>>>
>>> Thanks for your reply!
>>>
>>> Sorry, I did not make clear that I was using the base64encode and
>>> base64decode functions provided by the "caTools" package. It seems that
>>> converting the string to the raw type first still solves my problem.
>>>
>>> My original problem actually is that I have a string:
>>> secret <-
>>>
>> '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
>>>
>>> It was claimed to be encoded in Base64. So I tried to decode it:
>>>
>>> require("base64enc")
>>> rawToChar(base64decode(secret))
>>>
>>> Then, I got
>>>
>> "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"
>>>
>>> But what I expected to get is:
>>>
>> '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
>>>
>>> Most of the result is correct, except for several characters near the end.
>>> I don't know where the problem is.
>>>
>>
>> See the help page of 'rawToChar': the function transforms raw bytes into
>> characters. But, depending on your locale, one character may be more
>> than one byte. On my computer, with a UTF-8 locale (see my
>> '?sessionInfo' below),
>>
>> rawToChar(base64decode(secret), TRUE)
>>
>> gives me
>>
>> ## [1] "\xe4" "\xac" "." "\x83" "\xe0" "r" "\xae"
>> ## [8] "\xaf" "\xa2" "\x95" "B" "\xcc" "\xcf" "r"
>> ## [15] "\001" "\017" "\x9b" "j" "\xb8" "\xdb" "y"
>> ## [22] "\t" "\xc7" "X" "\x8b" "u" "\xcf" "s"
>> ## [29] "\xc8" "\xe7" "+" "\v" "W" "\xbc" "\x88"
>> ## [36] "\a" "\xc3" "\xfb" "\xdc" "H" "e" "5"
>> ## [43] "T" "\"" "(" "\xe1" "\xbf" "\xce" "}"
>> ## [50] "\xc4" "C" "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
>> ## [57] "l" "\xc9" "\xf8" "V" "\xcd" "q" "k"
>> ## [64] "6"
>>
>> That is, every *single* byte is converted into character. For example:
>>
>> rawToChar(base64decode(secret), TRUE)[55:56]
>>
>> gives
>>
>> ## [1] "\xdf" "\x9f"
>>
>> which probably is what you expected. But if I paste those two
>> characters together,
>>
>> paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")
>>
>> they will be shown like so:
>>
>> ## [1] "ߟ"
>>
>> because this is how this byte pattern will be interpreted in UTF-8.
>>
>>
>>
>>
>> Abbreviated 'sessionInfo':
>>
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_GB.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_GB.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>
>>
>> --
>> Enrico Schumann
>> Lucerne, Switzerland
>> http://enricoschumann.net
>>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595