R-alpha: Latin-1 characters / Locale etc.

Martin Maechler <maechler@stat.math.ethz.ch>
Thu, 27 Nov 1997 10:35:06 +0100


>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:

    PD> Ross Ihaka <ihaka@stat.auckland.ac.nz> writes:
    >>  >> ------------------------
    >>  >> R & R, any comments?
    >>  >> ------------------------
    >> 
    >> At present the parser makes the decision on what characters can go
    >> into symbol names based on isalpha(c).  If someone will send me a
    >> function - say isidchar(c) which returns 1 for characters which can
    >> be in identifiers and 0 otherwise, I will replace the current test
    >> with that.
    >> 
    >> Ross

Hmm, so we would follow the Unix locale philosophy.
I could live with it.

It has, however, a distinct drawback:

You can write R code which works with R compiled in one environment but
fails with --identical R source code-- compiled in a different environment.

While this is true for things like 'readline' and 'proc.time / system.time',
I don't like it so much for such a basic thing as symbol characters.
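
To make the drawback concrete, here is a minimal sketch in C of what such
an isidchar(c) might look like. It is an illustration only, not code from
the R sources: accepting digits and '.' in addition to isalpha() letters is
an assumption, and isidchar_in_locale() is a hypothetical helper whose only
purpose is to show that the same byte can be accepted under one LC_CTYPE
setting and rejected under another.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical isidchar(): returns 1 if the byte c may appear in a symbol
 * name, 0 otherwise.  Assumption: letters as classified by isalpha() in the
 * current LC_CTYPE locale, plus digits and '.', are allowed. */
int isidchar(int c)
{
    unsigned char uc = (unsigned char) c;   /* keep ctype input in 0..255 */
    return (isalpha(uc) || isdigit(uc) || uc == '.') ? 1 : 0;
}

/* Hypothetical helper: switch LC_CTYPE to the named locale and classify c.
 * Returns -1 if that locale is not installed.  Note that the process
 * locale stays switched after the call. */
int isidchar_in_locale(const char *name, int c)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    return isidchar(c);
}
```

Under the plain "C" locale, isidchar_in_locale("C", 0xE4) gives 0, while
under a Latin-1 locale (where one is installed) the same byte would be
expected to count as a letter. That difference is exactly the portability
problem described above.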


    PD> Ahaaa... So the "oscillatory behaviour" is just me shifting between
    PD> machines with proper locale configuration and machines without it!
    PD> I think that isalpha() is actually the way to go. People just have
    PD> to get their locales right. Here's what's in isalpha(c)==1 for the
    PD> da_DK locale:

    PD> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz­
    PD> ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

    PD> The hyphen following 'z' is actually 0xad (soft hyphen).


In any case, I'd propose a new
        function 'alphachars()'
and/or a global variable
        Alphachars
or      .Symbolchars
(or something better)
which returns a vector of single characters (nchar(.) == 1 each)
giving the characters available for use in symbols.

In      ../library/base/Alpha.Rd  (the accompanying help page),
all this would then be explained to users.
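
As a rough sketch of how such a list could be produced at the C level,
the loop below collects every byte that isalpha() accepts in the current
LC_CTYPE locale. The C function name alphachars() and its buffer interface
are assumptions made for illustration; nothing like this exists in the R
sources as far as this thread shows.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical C-level counterpart of the proposed alphachars():
 * stores every byte value 0..255 that isalpha() accepts in the current
 * LC_CTYPE locale into out[] (which must have room for 256 bytes) and
 * returns how many were stored. */
size_t alphachars(unsigned char out[256])
{
    size_t n = 0;
    for (int c = 0; c < 256; c++) {
        if (isalpha(c))
            out[n++] = (unsigned char) c;
    }
    return n;
}
```

In the default "C" locale this yields just the 52 ASCII letters; after a
successful setlocale(LC_CTYPE, ...) to a Latin-1 locale one would expect
a longer list along the lines of the one Peter posted.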

BTW, Peter D., do you have a document available (in electronic form)
which nicely explains the locale stuff (for a user, not a C programmer ...)?
 Kurt/Fritz/???: I think there are some nice pages available in Linux.something

----------------------------------------
I'm still wondering:
The  only locale thing we have is (the environment variable)

        LC_CTYPE=iso_8859_1

But then I wonder why I saw the difference between  ä and ü
that I reported ....

- Martin