[R] Japanese in R
Shigeru Mase
mase at is.titech.ac.jp
Sun May 28 11:34:13 CEST 2000
Dear Paul,
paul>
paul> I think it should be possible for you to press a Japanese key on your
paul> keyboard and have the appropriate Japanese character drawn by the Hershey
paul> font. However, at least one necessary condition will be that your
paul> keypresses are encoded by the computer using the same encoding that the
paul> Hershey fonts use, which is JIS X0208 standard. Do you know how your
paul> keypresses are encoded ?
paul>
paul> Just out of interest, I presume that the Japanese keys represent Kana rather
paul> than Kanji. Is that right and if so, how do you normally type Kanji
paul> characters on a computer ?
paul>
paul> paul
Thank you for the helpful information about R's Hershey font facility.
Most R users may find the details of the Japanese writing system boring,
but let me explain a little. If R is to be widely used in non-English-speaking
countries and among non-statisticians, it cannot avoid such subtle problems.
I am afraid I cannot answer your questions with confidence, because I know
fairly little about how computers handle Japanese fonts internally. I am using
Debian GNU/Linux (potato) on a PC whose system came partially Japanized out of
the box, thanks to volunteer experts. A font utility told me that my default
Japanese encoding is JIS X0212. I also found that my PC already has the JIS X0208
fonts, which Paul said are equivalent to the Hershey fonts. Indeed, they look
similar to the display output of example(Japanese), and fairly good even on a PC
display (sorry, I have not yet been able to check printouts). The main difference
between the JIS X0212 and JIS X0208 fonts is the width of their strokes. The
JIS X0208 fonts consist of uniform thin lines (as the Hershey fonts should), while
the JIS X0212 fonts have variable-width lines. Japanese characters were originally
written with a brush, so they are naturally designed with variable widths.
As Paul seems to know well, several writing systems are used in parallel in
Japan. Kanji (meaning "Chinese characters") are ideograms that originally came
from China (a fairly large part is still shared even now), so there is an
extraordinary number of them. The Hershey fonts have 603 kanji, which is probably
enough for most Japanese to live comfortably. But our Ministry of Education holds
that every Japanese schoolchild should learn to read and write 881 kanji, and that
every decent Japanese citizen should be able to read (though not necessarily write)
at least 1,850 kanji.
JIS (Japanese Industrial Standards) currently requires a Japanese PC to be able
to handle about 3,000 kanji (to be extended eventually to about 4,000). Absurd,
you may think, but we are lucky compared with the Chinese, who must master far
more characters. (It is said that there are about 50,000 Chinese characters,
including historical ones that were used at least once.) You may think that the
so-called Unicode system will solve these worldwide language problems, but we are
skeptical about its scope. Recently, a Japanese software company began selling a
new PC OS (a descendant of TRON, an OS of Japanese origin that is quietly but
widely used in Japanese household electrical appliances) which boasts built-in,
OS-level handling of about 120,000 characters from around the world (including
Egyptian hieroglyphs :-). In a sense, this would certainly be a final solution.
The situation is even more complex. To handle Japanese on computers, several
encodings (at least four kinds) are used simultaneously. Most PCs (e.g., with
MS-Windows) use the Shift-JIS encoding, while most Unix machines (including Linux
PCs) use the JIS encoding (and the EUC encoding internally). In the old days,
mainframes such as IBM's used yet other encodings. As a result, even Japanese
users frequently find it impossible to read files created by others.
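To make the encoding problem concrete, here is a minimal Python sketch. It assumes Python's standard codecs shift_jis, euc_jp and iso2022_jp are fair stand-ins for the Shift-JIS, EUC and JIS encodings named above. It encodes the same two characters, 日本 ("Japan"), each way and then misreads Shift-JIS bytes as EUC-JP.

```python
# Encode the same Japanese text with three common Japanese encodings
# (Python's standard codecs stand in for the encodings named in the text)
# and show that the raw bytes differ.
CODECS = ("shift_jis", "euc_jp", "iso2022_jp")


def encode_three_ways(text):
    """Return a dict mapping codec name -> encoded bytes of `text`."""
    return {name: text.encode(name) for name in CODECS}


if __name__ == "__main__":
    for name, raw in encode_three_ways("日本").items():
        print(f"{name:>10}: {raw.hex()}")

    # Reading Shift-JIS bytes as if they were EUC-JP: the "unreadable file"
    # problem in miniature (undecodable bytes become U+FFFD).
    sjis_bytes = "日本".encode("shift_jis")
    print(sjis_bytes.decode("euc_jp", errors="replace"))
```

The EUC-JP bytes are the JIS bytes with the high bit set. ISO-2022-JP, the "JIS encoding" of Unix mail and news, keeps the 7-bit JIS bytes but switches into and out of the two-byte character set with escape sequences.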
The other three writing systems are phonetic. Hiragana and Katakana (about 50
characters each) were originally abbreviated forms of certain kanji. They
correspond one-to-one and have the same pronunciations, but look quite different.
Why are there two, you may ask? The reason is simple :-). In the old days (about
1,000 years ago), Katakana was used solely by men and Hiragana solely by women.
Also, after learning Western culture, the Japanese invented a system for writing
Japanese with the Roman alphabet (Romaji).
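The one-to-one correspondence shows up in the character codes themselves. In JIS X 0208, hiragana occupy row 4 and katakana row 5 at matching positions (あ is 0x2422, ア is 0x2522). Unicode keeps the same parallel layout, with each katakana exactly 0x60 code points above its hiragana. The minimal Python sketch below relies only on that Unicode offset. The function names are illustrative, and the few katakana with no hiragana partner (such as ヷ) are left untouched.

```python
# Hiragana and Katakana correspond one-to-one. In Unicode the two blocks are
# laid out in parallel, so each katakana is exactly 0x60 code points above
# the matching hiragana.
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096  # ぁ .. ゖ
KANA_OFFSET = 0x30A1 - 0x3041                   # ァ - ぁ == 0x60


def hiragana_to_katakana(text):
    """Convert every hiragana character in `text` to katakana; leave the rest alone."""
    return "".join(
        chr(ord(c) + KANA_OFFSET) if HIRAGANA_FIRST <= ord(c) <= HIRAGANA_LAST else c
        for c in text
    )


def katakana_to_hiragana(text):
    """Inverse mapping, for the katakana that have a hiragana counterpart."""
    lo, hi = HIRAGANA_FIRST + KANA_OFFSET, HIRAGANA_LAST + KANA_OFFSET
    return "".join(
        chr(ord(c) - KANA_OFFSET) if lo <= ord(c) <= hi else c
        for c in text
    )


if __name__ == "__main__":
    print(hiragana_to_katakana("にほん"))  # ニホン
    print(katakana_to_hiragana("ニホン"))  # にほん
```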
OK, I should hurry to answer Paul's question. Japanese PC keyboards (jp106
keyboards) have keys that each show both a Roman letter and a Hiragana character
on the key top, so we can enter both Roman letters and Hiragana directly. But
what is actually entered depends on each program (and, in particular, on its
customization). To convert alphabet or Hiragana input into actual Japanese
characters, we need an extra resident program (we call it an FEP, Front End
Processor) that translates the input into the Japanese characters the user
actually wants (in practice it makes clever, and sometimes stupid, guesses). To
use an FEP, an application must have built-in facilities to communicate with it
(and, of course, to display or print the result). The most difficult part of
using foreign software (free or commercial) in Japanese is getting such
facilities implemented in it. I am not completely certain, but this difficulty
seems to be becoming less important thanks to vector fonts (as GNU plotutils
uses). Sorry, I have not yet tested the MS Windows version of R.
I myself am used to typing alphabets (as Romaji) and then converting them word by
word (often by trial and error) with an FEP into what I actually need.
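The first stage of what an FEP does, turning typed Romaji into Hiragana, can be sketched as a greedy longest-match table lookup. This is a toy under stated assumptions. ROMAJI_TABLE is a small, hypothetical subset chosen for illustration. A real FEP knows every syllable, voiced sounds and double consonants. The second, much harder stage, guessing which kanji the user means, is not attempted.

```python
# Toy Romaji -> Hiragana conversion by greedy longest match.
# ROMAJI_TABLE is a small, hypothetical subset for illustration only;
# it omits many syllables, voiced forms (ga, za, ...) and small tsu (っ).
ROMAJI_TABLE = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "n": "ん",
}
LONGEST_KEY = max(len(key) for key in ROMAJI_TABLE)


def romaji_to_hiragana(text):
    """Convert lowercase Romaji to Hiragana, trying the longest match first."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first so "na" wins over "n" + "a".
        for size in range(min(LONGEST_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TABLE:
                out.append(ROMAJI_TABLE[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the letter through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(romaji_to_hiragana("nihon"))   # にほん
    print(romaji_to_hiragana("sensei"))  # せんせい
```

Longest match is a simple, predictable rule, but it is only a first approximation. Real FEPs also handle cases like "n" before a vowel ("kin'en" vs. "kinen") that a plain table cannot disambiguate.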
(It may interest you to know that these unavoidable complexities once happened to
serve as a natural iron fortress for the Japanese computer industry, guarding its
markets against American computer giants such as IBM. Now, however, they are
rather a curse to us.)
Therefore, my answer to Paul's question is, again, that (probably) the only way
to use the Hershey fonts from R, at least at present, is simply to write escape
sequence codes as in example(Japanese). This may be true even if I change my
basic encoding to JIS X0208. As I said before, this seems a minor inconvenience,
since no Japanese user wants to write R code itself in Japanese. And I am hopeful
that we will be able to use Japanese in R more flexibly in the future if some kind
Japanese experts Japanize the GNU plotutils program.
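Anyone writing such escape sequences by hand needs each character's four hexadecimal JIS X 0208 code. The minimal Python sketch below recovers that code through the euc_jp codec, since EUC-JP stores a JIS X 0208 character as its two JIS bytes with the high bit set. The helper names are hypothetical, and so is the exact escape syntax. We assume it is \#J followed by the four digits, as we believe example(Japanese) uses; check ?Hershey before relying on it.

```python
# Compute the JIS X 0208 code (four hex digits) of a character, the code the
# Hershey Japanese glyphs are indexed by.
# ASSUMPTION: the R escape is "\#J" + four hex digits; verify against ?Hershey.

def jis_x0208_code(ch):
    """Return the JIS X 0208 code of one character as 4 hex digits, e.g. '2422' for 'あ'."""
    raw = ch.encode("euc_jp")
    # A JIS X 0208 character is exactly two bytes in EUC-JP, the first >= 0xA1.
    # (Half-width kana start with 0x8E and JIS X 0212 with 0x8F, so they are rejected.)
    if len(raw) != 2 or raw[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    # Clearing the high bit of each EUC-JP byte gives back the JIS row/cell bytes.
    return f"{raw[0] & 0x7F:02X}{raw[1] & 0x7F:02X}"


def hershey_escape(text, prefix="\\#J"):
    """Hypothetical helper: join an assumed escape prefix + JIS code per character."""
    return "".join(prefix + jis_x0208_code(ch) for ch in text)


if __name__ == "__main__":
    print(jis_x0208_code("あ"))    # 2422
    print(hershey_escape("あい"))  # \#J2422\#J2424
```

Inside an R string literal the backslash must itself be doubled, so the first escape above would be typed as "\\#J2422" in R source.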
By the way, Japanese PC keyboards usually lack a backslash key; it is replaced
by the Japanese "yen" symbol. But this is not a problem: it normally works just
as if it were the backslash, although we are forced to see yen symbols all over
the display :-)
PS. Paul, I have just translated "Hershey.html" into Japanese and noticed that
there are several "<CODE></CODE>" in it. I guess these should be "<CODE>/</CODE>", right?
================================================================
Shigeru Mase <mase at is.titech.ac.jp>
Dept. of Math. and Comp. Sciences, Tokyo Institute of Technology.
Oh-Okayama, 2-12-1, Meguro-ku, Tokyo, 152-8552, Japan
=================================================================