[R] Converting two byte encoding to UTF-8
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Sat Mar 19 11:52:22 CET 2022
I have a file that includes Japanese characters encoded using the
"JIS_X0208-1997" encoding. According to iconvlist(), an earlier
revision "JIS_X0208-1990" is supported, so I'd like to try that to
decode them.
However, I can't work out how to pass the input to iconv() to do it.
This is a two-byte encoding, so one character consists of two bytes:
> as.raw(result[[1]]$kanji)
[1] b0 a1
But this is being interpreted as two characters by iconv():
> iconv(as.raw(result[[1]]$kanji), from = "JIS_X0208-1990", to = "UTF-8")
[1] "皸" "甕"
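A possible explanation, offered as a sketch rather than a confirmed diagnosis: iconv() takes a character vector, and other inputs are coerced with as.character(). On a raw vector, as.character() yields one hex string per byte, so iconv() would see the ASCII text "b0" and "a1" rather than the bytes themselves. The name kanji below is a stand-in for result[[1]]$kanji.

```r
# Stand-in for result[[1]]$kanji (assumed to hold the two bytes b0 a1).
kanji <- as.raw(c(0xb0, 0xa1))

# as.character() turns each raw byte into a two-character hex string.
hex_strings <- as.character(kanji)
print(hex_strings)

# Each of those strings is itself two ASCII bytes, so a two-byte decoder
# would read each string as one (unrelated) character.
ascii_bytes <- lapply(hex_strings, charToRaw)
print(ascii_bytes)
```

If this is what happens, it would account for getting two characters back from a two-byte input.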
I can't seem to find any input that iconv() will accept and treat as
a single character. (I believe the answer should be 亜, if that helps.)
How do I tell it to use 0xb0a1 (or 0xa1b0, if that's the right byte
order)? Passing the value as a number just gives NA:
> iconv(0xb0a1, from = "JIS_X0208-1990", to = "UTF-8")
[1] NA
> iconv(0xa1b0, from = "JIS_X0208-1990", to = "UTF-8")
[1] NA
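One approach that may work, sketched under assumptions: iconv() also accepts a list of raw vectors, which keeps both bytes together as a single input element. The bytes b0 a1 both have the high bit set, which matches how EUC-JP stores JIS X 0208 characters, so this example uses from = "EUC-JP". Treating the data as EUC-JP is an assumption, and whether that encoding name is available depends on the platform's iconv.

```r
# Stand-in for result[[1]]$kanji (assumed to hold the two bytes b0 a1).
kanji <- as.raw(c(0xb0, 0xa1))

# A list with one raw element is treated as one string of bytes,
# so the two bytes are decoded together rather than separately.
# "EUC-JP" is an assumption about how the file stores JIS X 0208 codes.
decoded <- iconv(list(kanji), from = "EUC-JP", to = "UTF-8")
print(decoded)

# Show the Unicode code point of the result for checking against a chart.
print(sprintf("U+%04X", utf8ToInt(decoded)))
```

Where iconv exposes "JIS_X0208-1990" as a converter, clearing the high bit of each byte first (giving 30 21) and decoding with that name may be an alternative.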
Duncan Murdoch