[R] Japanese in R

Sun May 28 11:34:13 CEST 2000

Dear Paul, 

paul> 
paul> I think it should be possible for you to press a Japanese key on your
paul> keyboard and have the appropriate Japanese character drawn by the Hershey
paul> font.  However, at least one necessary condition will be that your
paul> keypresses are encoded by the computer using the same encoding that the
paul> Hershey fonts use, which is JIS X0208 standard.  Do you know how your
paul> keypresses are encoded ?
paul> 
paul> Just out of interest, I presume that the Japanese keys represent Kana rather
paul> than Kanji.  Is that right and if so, how do you normally type Kanji
paul> characters on a computer ?
paul> 
paul> paul

Thank you for the kind information on R's Hershey font facility.
Details of the Japanese writing system may bore most R users, but let me
explain a little. To be widely used in non-English countries and among
non-statisticians, R cannot avoid such subtle problems.

I am afraid I cannot answer your questions with confidence, because I know little
about how computers handle Japanese fonts internally. I am using Debian GNU/Linux
(potato) on a PC that was partially Japanized from the start by volunteer experts.
A font utility told me that my default Japanese encoding system is JIS X0212. I also
found that my PC already has the JIS X0208 fonts, the standard that Paul said the
Hershey fonts use. Indeed, their appearance is similar to the display output of
example(Japanese). They look fairly good even on a PC display (sorry, I have not yet
checked printouts). The main difference between the JIS X0212 and JIS X0208 fonts is
the width of the lines that make up each character. The JIS X0208 fonts consist of
uniform thin lines (as the Hershey fonts should), whereas the JIS X0212 fonts have
variable-width lines. Japanese characters were originally drawn with a brush and
are therefore naturally designed with variable widths.

As Paul seems to know well, several writing systems are used in parallel in Japan.
Kanji (literally "Chinese characters") are ideograms that originally came from China
(a fairly large part is still shared even now). Therefore there are extraordinarily
many of them. The Hershey fonts have 603 Kanji. This number is probably sufficient for
most Japanese to live comfortably. But our Ministry of Education holds that every
Japanese schoolchild should learn to read and write 881 Kanji and that every decent
Japanese citizen should be able to read (though not necessarily write) at least 1,850.
JIS (Japanese Industrial Standards) currently requires a Japanese PC to handle about
3,000 Kanji (to be extended eventually to about 4,000). Absurd, you may think, but we
are lucky compared with the Chinese, who must master far more characters. (There are
said to be about 50,000 Chinese characters, including historical ones that were used
at least once.) You may think that the so-called "Unicode" system will solve these
worldwide language problems. But we are skeptical about its scope. Recently, a
Japanese software company began to sell a new OS for PCs (a descendant of TRON, an OS
of Japanese origin that is quietly but widely used in Japanese domestic electrical
appliances), which boasts built-in, OS-level handling of about 120,000 characters
from the world's scripts (including Egyptian hieroglyphs :-). In a sense, this would
certainly be a final solution.

The situation is even more complex. To handle Japanese on computers, several (at
least four) encoding systems are used simultaneously. Most PCs (e.g., with MS Windows)
use the Shift-JIS encoding, while most Unix machines (including Linux PCs) use the JIS
encoding (and the EUC encoding internally). In the old days, mainframes such as IBM's
used yet other encodings. Therefore even Japanese users frequently find it impossible
to read files created by others.
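To make this concrete, here is a small Python sketch. It is not from the original mail and assumes Python 3.8+ with its standard `shift_jis`, `euc_jp` and `iso2022_jp` codecs. It encodes the same word in the three encodings mentioned above and shows that reading the bytes with the wrong one garbles the text.

```python
# Sketch: one Japanese word, three different byte sequences.
# Assumes Python 3.8+ (bytes.hex with a separator) and the standard
# "shift_jis", "euc_jp" and "iso2022_jp" (the "JIS" encoding) codecs.

WORD = "日本語"  # "Japanese language"
CODECS = ("shift_jis", "euc_jp", "iso2022_jp")


def encode_all(text):
    """Map each codec name to the bytes that encode text in it."""
    return {name: text.encode(name) for name in CODECS}


def misread(data, wrong_codec):
    """Decode bytes with a codec they were not written in.

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return data.decode(wrong_codec, errors="replace")


if __name__ == "__main__":
    for name, data in encode_all(WORD).items():
        print(f"{name:10} {data.hex(' ')}")
    # A Shift-JIS file opened as EUC-JP does not give back the original word.
    print(misread(WORD.encode("shift_jis"), "euc_jp"))
```

The Shift-JIS bytes (`93 fa 96 7b 8c ea`) and the EUC-JP bytes (`c6 fc cb dc b8 ec`) share nothing, which is why a file written on a Windows PC can look like garbage on a Unix machine.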

The other three writing systems are phonetic. Hiragana and Katakana (about 50
characters each) were originally abbreviated forms of certain Kanji; they correspond
one-to-one and have the same pronunciations but quite different appearances. Why are
there two, you may ask? The reason is simple :-). In old times (about 1,000 years ago),
Katakana was used solely by men and Hiragana solely by women. Also, after learning
Western culture, the Japanese invented a system for writing Japanese in the Roman
alphabet (Romaji).
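The one-to-one correspondence between Hiragana and Katakana is also built into the character standards. As a sketch, assuming Unicode, where the Katakana block U+30A1 to U+30F6 sits exactly 0x60 above the Hiragana block U+3041 to U+3096 (the function name is hypothetical), converting Hiragana to Katakana is a fixed offset:

```python
# Sketch: Hiragana -> Katakana by a constant code-point offset (Unicode).
# Hiragana U+3041..U+3096 map to Katakana U+30A1..U+30F6 (offset 0x60).

HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096
KANA_OFFSET = 0x60


def hiragana_to_katakana(text):
    """Shift each Hiragana character to its Katakana twin; keep the rest."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST
        else ch
        for ch in text
    )


if __name__ == "__main__":
    print(hiragana_to_katakana("さくら"))  # → サクラ (same sounds, Katakana shapes)
```

JIS X0208 follows the same idea: Hiragana occupy row 4 and Katakana row 5 with matching cells (for example, あ is 0x2422 and ア is 0x2522).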

OK, I should hurry to answer Paul's question. Japanese PC keyboards (jp106 keyboards)
have keys that each show both a Roman letter and a Hiragana character on the key top.
Therefore we can input both Roman letters and Hiragana directly. But what is actually
input depends on each program (and, in particular, on its customization).
To convert Roman-letter or Hiragana input into actual Japanese text, we need an extra
resident program (which we call an FEP, Front End Processor) that translates the input
into the Japanese characters the user actually wants (in practice it makes clever,
sometimes stupid, guesses). To use an FEP, an application must have built-in facilities
to communicate with it (and, of course, to display or print the result). What makes
foreign software (free or commercial) most difficult for Japanese users is implementing
such facilities in it. I am not completely certain, but this difficulty seems to be
becoming less important thanks to vector fonts (as GNU plotutils already uses). Sorry,
I have not yet tested the MS Windows version of R. I myself usually type Roman letters
(as Romaji) and then convert them word by word with an FEP (frequently by trial and
error) into what I actually need.
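The two FEP steps described above (Romaji to Hiragana, then Hiragana to a list of candidate words to choose from) can be sketched in Python. This is only a toy model: `ROMAJI` and `CANDIDATES` are tiny hypothetical tables, and a real FEP uses large dictionaries and context to rank its guesses.

```python
# Sketch of the two FEP steps: Romaji -> Hiragana, then kana -> candidates.
# ROMAJI and CANDIDATES are tiny hypothetical tables, not a real FEP dictionary.

ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "n": "ん",
}

# One reading can stand for several words; the user picks by trial and error.
CANDIDATES = {"にほん": ["日本", "二本"], "さくら": ["桜", "佐倉"]}


def romaji_to_hiragana(text):
    """Greedy longest-match conversion; unknown letters pass through."""
    out, i = [], 0
    while i < len(text):
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def kanji_candidates(kana):
    """Return conversion candidates, ending with the kana itself."""
    return CANDIDATES.get(kana, []) + [kana]


if __name__ == "__main__":
    kana = romaji_to_hiragana("nihon")
    print(kana, kanji_candidates(kana))
```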

(You may find it interesting that these inevitable complexities once happened to
become a natural iron fortress for the Japanese computer industry, guarding its
markets from American giants such as IBM. But now they are rather a curse to us.)

Therefore, my answer to Paul's question is again that (probably) the only way to use
the Hershey Japanese characters from R, at least at present, is to write escape-sequence
codes as in example(Japanese). This may remain true even if I switch my basic encoding
system to JIS X0208. As I said before, this seems a minor inconvenience: no Japanese
want to write R code itself in Japanese. And I can happily expect that we will be able
to use Japanese in R more flexibly in future if some kind Japanese experts japanize
the GNU plotutils program.
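When writing such escape sequences, the tedious part is looking up a character's JIS X0208 code. Assuming, as Paul wrote, that the Hershey Japanese glyphs are indexed by JIS X0208, this Python sketch computes the four hex digits from a character via the EUC-JP encoding. The function name is hypothetical, and the exact R escape syntax is documented in R's ?Japanese help page; it is not reproduced here.

```python
# Sketch: find the JIS X0208 code of a character as four hex digits.
# Assumes Python's "euc_jp" codec, in which a JIS X0208 character is two
# bytes, each equal to the corresponding JIS byte plus 0x80.


def jis_x0208_hex(ch):
    """Return the JIS X0208 code of ch as four uppercase hex digits.

    Raises ValueError (UnicodeEncodeError is a subclass) if ch is not
    a JIS X0208 character.
    """
    data = ch.encode("euc_jp")
    if len(data) != 2 or not all(0xA1 <= b <= 0xFE for b in data):
        raise ValueError(f"{ch!r} is not a JIS X0208 character")
    return f"{data[0] - 0x80:02X}{data[1] - 0x80:02X}"


if __name__ == "__main__":
    for ch in "日本語":
        print(ch, jis_x0208_hex(ch))
```

Having a valid code does not guarantee a glyph: the Hershey fonts contain only 603 Kanji, so many JIS X0208 characters will still be missing.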

By the way, Japanese PC keyboards usually lack a backslash key; the Japanese "yen"
symbol takes its place. But this is not a problem. It normally works just as if it were
the backslash, although we are forced to see yen symbols all over the display :-)
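The yen key can stand in for backslash because, on a Japanese system using Shift-JIS or EUC-JP, both are stored as the same byte, 0x5C. Fonts following the JIS X0201 Roman set draw that byte as a yen sign, while ASCII-based software such as R reads it as a backslash. A short Python check, assuming the standard `ascii`, `cp932` (Microsoft's Shift-JIS variant) and `euc_jp` codecs:

```python
# Sketch: the yen key's character is stored as byte 0x5C (assumption: a
# Shift-JIS or EUC-JP system). Python's ascii, cp932 and euc_jp codecs all
# decode that byte as a backslash; whether it is *drawn* as a backslash or
# as a yen sign depends only on the font.

KEY_BYTE = b"\x5c"
CODECS = ("ascii", "cp932", "euc_jp")


def decode_key(codec):
    """Return the character KEY_BYTE decodes to under codec."""
    return KEY_BYTE.decode(codec)


if __name__ == "__main__":
    for codec in CODECS:
        print(codec, repr(decode_key(codec)))  # '\\' in each case
```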

PS. Paul, I have just translated "Hershey.html" into Japanese and noticed that
there are several "<CODE></CODE>" in it. I guess these should be "<CODE>/</CODE>"; is that right?

================================================================
Shigeru Mase <mase at is.titech.ac.jp>
Dept. of Math. and Comp. Sciences, Tokyo Institute of Technology.
Oh-Okayama, 2-12-1, Meguro-ku, Tokyo, 152-8552, Japan
=================================================================