[R] cannot base64decode string which is base64encode in R
Enrico Schumann
es at enricoschumann.net
Mon Aug 5 19:43:43 CEST 2013
On Mon, 05 Aug 2013, Qiang Wang <unsown at gmail.com> writes:
>> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <es at enricoschumann.net> wrote:
>>
>>> On Fri, 02 Aug 2013, Qiang Wang <unsown at gmail.com> writes:
>>>
>>> > Hi,
>>> >
>>> > I'm struggling with encoding/decoding strings in R. I don't know why the
>>> > second example below fails. Thanks in advance for your help.
>>> > succeed:
>>> >   s <- "saf"
>>> >   x <- base64encode(s)
>>> >   y <- base64decode(x, "character")
>>> > fail:
>>> >   s <- "safs"
>>> >   x <- base64encode(s)
>>> >   y <- base64decode(x, "character")
>>> >
>>>
>>> And the first example works for you?
>>>
>>> require("base64enc")
>>> s <- "saf"
>>> x <- base64encode(s)
>>>
>>> ## Error in file(what, "rb") : cannot open the connection
>>> ## In addition: Warning message:
>>> ## In file(what, "rb") : cannot open file 'saf': No such file or directory
>>>
>>> ?base64encode says that its first argument is
>>>
>>> "data to be encoded/decoded. For ‘base64encode’ it can be a raw
>>> vector, text connection or file name. For ‘base64decode’ it can be
>>> a string or a binary connection."
>>>
>>> Try this:
>>>
>>> rawToChar(base64decode(base64encode(charToRaw("saf"))))
>>>
>>> ## [1] "saf"
>>>
>>> --
>>> Enrico Schumann
>>> Lucerne, Switzerland
>> http://enricoschumann.net
>>
>
> Thanks for your reply!
>
> Sorry, I did not clarify that I was using the base64encode and base64decode
> functions provided by the "caTools" package. It seems that if I convert the
> string to the raw type first, that solves my problem.
>
> My original problem actually is that I have a string:
> secret <-
> '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
>
> It was claimed to be encoded in Base64. So I tried to decode it:
>
> require("base64enc")
> rawToChar(base64decode(secret))
>
> Then, I got
> "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"
>
> But what I expected to get is:
> '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
>
> Most of the result is correct, except for several characters near the end.
> I don't know where the problem is.
>
See the help page of 'rawToChar': the function transforms raw bytes into
characters. But, depending on your locale, one character may be more
than one byte. On my computer, with a UTF-8 locale (see my
'?sessionInfo' below),
rawToChar(base64decode(secret), TRUE)
gives me
## [1] "\xe4" "\xac" "." "\x83" "\xe0" "r" "\xae"
## [8] "\xaf" "\xa2" "\x95" "B" "\xcc" "\xcf" "r"
## [15] "\001" "\017" "\x9b" "j" "\xb8" "\xdb" "y"
## [22] "\t" "\xc7" "X" "\x8b" "u" "\xcf" "s"
## [29] "\xc8" "\xe7" "+" "\v" "W" "\xbc" "\x88"
## [36] "\a" "\xc3" "\xfb" "\xdc" "H" "e" "5"
## [43] "T" "\"" "(" "\xe1" "\xbf" "\xce" "}"
## [50] "\xc4" "C" "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
## [57] "l" "\xc9" "\xf8" "V" "\xcd" "q" "k"
## [64] "6"
That is, every *single* byte is converted into character. For example:
rawToChar(base64decode(secret), TRUE)[55:56]
gives
## [1] "\xdf" "\x9f"
which probably is what you expected. But if I paste those two
characters together,
paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")
they will be shown like so:
## [1] "ߟ"
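because this is how this byte pattern will be interpreted in UTF-8: the
two bytes 0xdf 0x9f together encode the single character U+07DF. The
underlying bytes are unchanged; only the printed representation differs.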
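A small base-R sketch of the same point, with the two bytes typed in by
hand so that it needs neither 'base64enc' nor the original string (the
variable names 'b' and 'pasted' are only for illustration):

```r
# Bytes 55 and 56 of the decoded vector, written out by hand
b <- as.raw(c(0xdf, 0x9f))

# Split into single-byte strings, then paste them back together
pasted <- paste(rawToChar(b, multiple = TRUE), collapse = "")

# The round trip leaves the underlying bytes untouched
print(identical(charToRaw(pasted), b))

# U+07DF is the code point whose UTF-8 encoding is exactly these two bytes
print(identical(charToRaw(intToUtf8(0x07DF)), b))
```

So if the goal is to check the decoded data against an expected result,
comparing the raw vectors is probably safer than comparing what gets
printed for the strings.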
Abbreviated 'sessionInfo':
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
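
Since the printed result depends on the locale, it may help to check
what your own session uses. A minimal sketch with base R only (the
names 'ctype' and 'is_utf8' are just for illustration):

```r
# Character-type locale of the current session,
# e.g. "en_GB.UTF-8" in the session shown above
ctype <- Sys.getlocale("LC_CTYPE")

# TRUE when R considers the current locale to be UTF-8
is_utf8 <- l10n_info()[["UTF-8"]]

print(ctype)
print(is_utf8)
```

In a non-UTF-8 locale the pasted bytes will most likely be displayed
differently, although the bytes themselves are the same.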
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net