R-alpha: Latin-1 characters / Locale etc.
Martin Maechler
Martin Maechler <maechler@stat.math.ethz.ch>
Thu, 27 Nov 1997 10:35:06 +0100
>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:
PD> Ross Ihaka <ihaka@stat.auckland.ac.nz> writes:
>> >> ------------------------
>> >> R & R, any comments?
>> >> ------------------------
>>
>> At present the parser makes the decision on what characters can go
>> into symbol names based on isalpha(c). If someone will send me a
>> function - say isidchar(c) which returns 1 for characters which can
>> be in identifiers and 0 otherwise, I will replace the current test
>> with that.
>>
>> Ross
Hmm, so we would follow the Unix locale philosophy.
I could live with it.
It has, however, a distinct drawback:
You can write R code which works with R compiled in one environment but
fails with --identical R source code-- compiled in a different environment.
While this is true for things like 'readline' and 'proc.time / system.time',
I don't like it so much for such a basic thing as symbol characters.
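To make the alternative concrete, here is a minimal sketch of a locale-independent isidchar(c) of the kind Ross asks for. The accepted set is only my assumption (ASCII letters, digits, '.', and the ISO 8859-1 letters), not anything the parser currently does; the point is merely that a fixed table gives the same answer in every environment.

```c
/* Hypothetical sketch of isidchar(c): returns 1 if c may appear in an
 * identifier and 0 otherwise.  The accepted set is an assumption:
 * ASCII letters, digits, '.', and the ISO 8859-1 letters 0xC0-0xFF
 * except the multiplication sign (0xD7) and division sign (0xF7).
 * The locale is never consulted, so the result does not depend on
 * how the machine is configured. */
int isidchar(int c)
{
    if (c < 0 || c > 0xFF) return 0;    /* EOF or out of range */
    if (c >= 'A' && c <= 'Z') return 1; /* ASCII upper case */
    if (c >= 'a' && c <= 'z') return 1; /* ASCII lower case */
    if (c >= '0' && c <= '9') return 1; /* digits */
    if (c == '.') return 1;             /* dot, as in proc.time */
    if (c >= 0xC0 && c != 0xD7 && c != 0xF7) return 1; /* Latin-1 letters */
    return 0;
}
```

A caller would pass the byte as an unsigned value, e.g. isidchar((unsigned char) ch), so that Latin-1 characters do not arrive as negative numbers. Note that this set deliberately leaves out 0xad (soft hyphen).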
PD> Ahaaa... So the "oscillatory behaviour" is just me shifting between
PD> machines with proper locale configuration and machines without it!
PD> I think that isalpha() is actually the way to go. People just have
PD> to get their locales right. Here's what's in isalpha(c)==1 for the
PD> da_DK locale:
PD> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz­
PD> ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
PD> The hyphen following 'z' is actually 0xad (soft hyphen).
In any case, I'd propose a new
	function 'alphachars()'
and/or a global variable
	Alphachars
or	.Symbolchars
(or something better)
which returns a vector of single characters (each with nchar() == 1)
giving the characters available for symbols.
In	../library/base/Alpha.Rd (the accompanying help page),
all this would then be explained to users.
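On the C side, what such an 'alphachars()' would have to compute could be sketched roughly as below. The name alpha_chars and its interface are hypothetical; it simply collects every byte value that isalpha() accepts in whatever LC_CTYPE locale is currently active, which is essentially how a list like Peter's could be produced.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: stores every byte value 1..255 for which
 * isalpha() is true in the current LC_CTYPE locale into 'out'
 * (which must have room for 255 bytes), in ascending order, and
 * returns how many values were stored. */
size_t alpha_chars(unsigned char *out)
{
    size_t n = 0;
    for (int c = 1; c <= 255; c++) {
        if (isalpha(c))
            out[n++] = (unsigned char) c;
    }
    return n;
}
```

A C program starts in the "C" locale, so the result only follows the user's environment if setlocale(LC_CTYPE, "") from <locale.h> has been called beforehand; without that call, only the 52 ASCII letters are reported.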
BTW, Peter D., do you have (an electronic form of) a document available
which nicely explains the locale stuff (for a user, not a C programmer ...)?
Kurt/Fritz/???: I think there are some nice pages available in Linux.something
----------------------------------------
I'm still wondering:
The only locale thing we have is (the environment variable)
	LC_CTYPE=iso_8859_1
But then I wonder why I saw the difference between ä and ü
that I reported ....
- Martin