[Rd] type.convert (PR#13646)

William Dunlap wdunlap at tibco.com
Sat Apr 11 01:36:11 CEST 2009


> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of wdunlap at tibco.com
> Sent: Friday, April 10, 2009 4:00 PM
> To: r-devel at stat.math.ethz.ch
> Cc: R-bugs at r-project.org
> Subject: Re: [Rd] type.convert (PR#13646)
> 
> Using the (unsigned int)(unsigned char) in isspace()
> resolved the problem in my Windows build.  

(int)(unsigned char) is the proper thing, since isspace
is declared as int isspace(int).

The (unsigned int)(unsigned char) will work because
C does the unsigned int -> int conversion automatically
when the prototype is present, and that conversion doesn't
change the value, since every unsigned char value fits in an int.
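A minimal sketch of that cast, wrapped in a hypothetical helper byte_isspace() (it is not part of R's API). It assumes only standard <ctype.h>; the (unsigned char) step is the part that removes the sign extension.

```c
#include <ctype.h>

/* Hypothetical helper (not part of R): test one byte of a string with
   isspace() without sign extension.  Where plain char is signed, the
   byte 0xA7 ("\247") arrives as -89; the (unsigned char) cast maps it
   to 167, a value isspace() is defined for, and the (int) cast matches
   the int isspace(int) prototype.  Returns 1 for white space, else 0. */
int byte_isspace(char c)
{
    return isspace((int)(unsigned char)c) != 0;
}
```

In the default "C" locale, byte_isspace(' ') is 1 and byte_isspace('\247') is 0, whether plain char is signed or unsigned on the platform.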

> I put some Rprintf
> statements into isBlankString and for type.convert("\247")
> it printed
>   *s=-89 (4294967207 if unsigned)
>     8=isspace(*s)
>     8=isspace((unsigned int)*s)
>     0=isspace((unsigned int)(unsigned char)*s)
> I think the 8 is the value of a random bit of memory.
> 
> When I converted S+ to use full 8-bit characters I ran
> into the same problem.  The is<class> macros in <ctype.h>
> all take an unsigned int argument and if char was signed you had
> to do the double cast to avoid sign extension.  Whoever
> designed the interface either didn't worry about 8-bit characters
> or had chars that were unsigned by default.
> 
> It doesn't look like any of the isspace calls in R do
> this double casting.
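The values in the Rprintf output above can be reproduced with a small sketch. It assumes, as on the i386 Windows build in this thread, an 8-bit signed char (so the byte 0xA7, "\247", reads as -89) and a 32-bit unsigned int; both function names are hypothetical.

```c
/* Sketch of the sign-extension problem, assuming an 8-bit signed char
   and a 32-bit unsigned int.  The byte 0xA7 is read as -89. */

/* One cast: -89 is converted modulo 2^32, giving 4294967207, far
   outside the range isspace() is defined for. */
unsigned int widen_direct(signed char c)
{
    return (unsigned int)c;
}

/* Double cast: -89 first becomes 167 (modulo 2^8), the intended byte
   value, and widening that to unsigned int keeps it at 167. */
unsigned int widen_via_uchar(signed char c)
{
    return (unsigned int)(unsigned char)c;
}
```

widen_direct(-89) returns 4294967207 and widen_via_uchar(-89) returns 167, matching the first and last lines of the output above.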
> 
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
> 
> > -----Original Message-----
> > From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> > Sent: Friday, April 10, 2009 2:50 PM
> > To: William Dunlap
> > Cc: R-bugs at r-project.org; Raberger, Stefan
> > Subject: Re: [Rd] type.convert (PR#13646)
> >
> > William Dunlap wrote:
> > > You may have to use
> > >   (unsigned int)(unsigned char)*s++
> > > instead of just
> > >   (unsigned int)*s++
> > > to avoid the sign extension.
> >
> > Thanks again,
> >
> > I probably won't be doing the change since I don't have a
> > Windows build environment around, and I'm a bit superstitious
> > about fixing bugs that I cannot see...
> >
> > Let me just filter this information into the bug repository for now.
> >
> > 	-pd
> >
> > >
> > >> -----Original Message-----
> > >> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> > >> Sent: Friday, April 10, 2009 1:41 PM
> > >> To: William Dunlap
> > >> Cc: r-devel at r-project.org
> > >> Subject: Re: [Rd] type.convert (PR#13646)
> > >>
> > >> William Dunlap wrote:
> > >>> I can reproduce the difference that Stefan saw, depending
> > >>> on whether or not I start Rgui with the flags
> > >>>     --no-environ --no-Rconsole
> > >>> I think it boils down to the isBlankString() function.
> > >>> For the string "\247" it returns 1 when those flags are
> > >>> not present and 0 when they are.  isBlankString does use
> > >>> some locale-specific functions:
> > >>> Rboolean isBlankString(const char *s)
> > >>> {
> > >>> #ifdef SUPPORT_MBCS
> > >>>     if(mbcslocale) {
> > >>>         wchar_t wc; int used; mbstate_t mb_st;
> > >>>         mbs_init(&mb_st);
> > >>>         while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
> > >>>             if(!iswspace(wc)) return FALSE;
> > >>>             s += used;
> > >>>         }
> > >>>     } else
> > >>> #endif
> > >>>         while (*s)
> > >>>             if (!isspace((int)*s++)) return FALSE;
> > >>>     return TRUE;
> > >>> }
> > >>>
> > >>> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
> > >>> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
> > >>> in both sessions.  'Process Explorer' shows that the 2 sessions
> > >>> have the same DLLs opened.
> > >> Thanks for that analysis Bill!
> > >>
> > >> Stefan was in "German_Austria.1252", which I don't think is
> > >> multibyte, so only the else-clause should be relevant, pointing the
> > >> finger rather squarely at isspace(). Googling indicates that others
> > >> have been caught out by signed/unsigned char issues there. Should
> > >> this possibly rather read
> > >>
> > >> if (!isspace((unsigned int)*s++)) return FALSE;
> > >>
> > >> ??
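A sketch of how the single-byte (else) branch of isBlankString() might look with the double cast instead. It is a stand-alone, hypothetical is_blank_string_sb() that returns int rather than R's Rboolean and leaves out the SUPPORT_MBCS branch; casting only to (unsigned int) would still sign-extend a negative char, so the byte goes through (unsigned char) first.

```c
#include <ctype.h>

/* Hypothetical stand-alone version of the non-MBCS branch of
   isBlankString(), using the double cast.  Returns 1 if s is empty or
   consists only of characters that isspace() classifies as white space
   in the current LC_CTYPE locale, else 0. */
int is_blank_string_sb(const char *s)
{
    while (*s)
        if (!isspace((int)(unsigned char)*s++))
            return 0;
    return 1;
}
```

In the "C" locale, is_blank_string_sb("\247") returns 0, so the section sign is no longer treated as blank; in a locale whose isspace() genuinely classifies byte 167 as a space it would return 1, which is then the correct answer for that locale.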
> > >>
> > >>>> sessionInfo()
> > >>> R version 2.8.1 (2008-12-22)
> > >>> i386-pc-mingw32
> > >>>
> > >>> locale:
> > >>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> > >>>
> > >>> attached base packages:
> > >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> > >>>
> > >>> I did the test with a dll compiled from
> > >>> #include <R.h>
> > >>> #include <R_ext/Utils.h>
> > >>>
> > >>> void test_isBlankString(char **s, int *res)
> > >>> {
> > >>>    *res = isBlankString(*s);
> > >>> }
> > >>>
> > >>> and called by .C("test_isBlankString","\247",-1L)
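A complementary check that needs no R at all is sketched below, assuming only the C standard library. The hypothetical count_high_space_bytes() sets LC_CTYPE and counts which of the bytes 0x80 to 0xFF isspace() classifies as white space. Passing "" uses the locale from the environment; a Windows locale name such as "German_Austria.1252" (the one reported in this thread) could be passed instead, assuming the C runtime accepts that form.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical diagnostic, independent of R: count how many of the
   high bytes 0x80..0xFF the given LC_CTYPE locale classifies as white
   space.  Returns -1 if setlocale() rejects the locale name. */
int count_high_space_bytes(const char *locale_name)
{
    int n = 0;
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    for (int b = 0x80; b <= 0xFF; b++)
        if (isspace(b))   /* b is already in the unsigned char range */
            n++;
    return n;
}
```

In the "C" locale the count is 0, since only the standard white-space characters are spaces there. Because the loop variable is already an int in the unsigned char range, no cast is needed here.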
> > >>>
> > >>> I don't see the difference while running a version of 2.9.0(devel)
> > >>> compiled locally on 11 March 2009 (from svn rev 48116).
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: r-devel-bounces at r-project.org
> > >>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
> > >>>> Sent: Friday, April 10, 2009 2:03 AM
> > >>>> To: Raberger, Stefan
> > >>>> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
> > >>>> Subject: Re: [Rd] type.convert (PR#13646)
> > >>>>
> > >>>> Raberger, Stefan wrote:
> > >>>>> Hi Peter,
> > >>>>>
> > >>>>> each of the four PCs actually has the same locale setting:
> > >>>>>
> > >>>>>> Sys.setlocale("LC_CTYPE")
> > >>>>> [1] "German_Austria.1252"
> > >>>>>
> > >>>>> (all the other settings returned by invoking Sys.getlocale() are
> > >>>>> identical as well).
> > >>>>> Just to be sure (because it's displayed incorrectly in my browser on
> > >>>>> the bug-tracking page): the character inside the type.convert
> > >>>>> function ought to be a "section" sign (HTML code &#167; or &sect;,
> > >>>>> in R "\247") and not a dot ".".
> > >>>>
> > >>>> I saw it correctly. It's "\302\247" in UTF8 locales, which is of
> > >>>> course the reason I suspected locale settings, but I can't seem to
> > >>>> trigger the NA behaviour.
> > >>>>
> > >>>> I'm at a loss here, but some ideas:
> > >>>>
> > >>>> In the cases where it returns NA, what type is it? (I.e.
> > >>>> storage.mode(type.convert(....)))
> > >>>>
> > >>>> What do you get from
> > >>>>
> > >>>>  > charToRaw("§")
> > >>>> [1] c2 a7
> > >>>>
> > >>>> (a7, presumably, but better check).
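The difference between "\247" and "\302\247" comes from the encoding: in Windows-1252 (and Latin-1) the section sign U+00A7 is the single byte 0xA7, octal 247, while UTF-8 stores code points 0x80 to 0x7FF as two bytes of the form 110xxxxx 10xxxxxx. A minimal sketch of that two-byte rule, with a hypothetical function name:

```c
/* Hypothetical helper: encode a code point in the range 0x80..0x7FF as
   two UTF-8 bytes (110xxxxx 10xxxxxx).  The caller must ensure cp is in
   that range; no validation is done here. */
void utf8_encode_2byte(unsigned int cp, unsigned char out[2])
{
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* lead byte: top 5 bits */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* continuation: low 6 bits */
}
```

For cp = 0xA7 this yields 0xC2 0xA7, octal 302 247, which is the "c2 a7" that charToRaw() shows in a UTF-8 locale; in a Windows-1252 locale only the single byte a7 is expected.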
> > >>>>
> > >>>> -p
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> > >>>>> Sent: Thursday, April 9, 2009 7:26 PM
> > >>>>> To: Raberger, Stefan
> > >>>>> Cc: r-devel at stat.math.ethz.ch; R-bugs at r-project.org
> > >>>>> Subject: Re: [Rd] type.convert (PR#13646)
> > >>>>>
> > >>>>> s.raberger at innovest.at wrote:
> > >>>>>> Full_Name: Stefan Raberger
> > >>>>>> Version: 2.8.1
> > >>>>>> OS: Windows XP
> > >>>>>> Submission from: (NULL) (213.185.163.242)
> > >>>>>>
> > >>>>>>
> > >>>>>> Hi there,
> > >>>>>>
> > >>>>>> I recently noticed some strange behaviour of the command
> > >>>>>> "type.convert", depending on the startup mode used. But there
> > >>>>>> also seems to be different behaviour on different PCs (all
> > >>>>>> running the same OS and the same version of R).
> > >>>>>> On PC1:
> > >>>>>> When I start R in SDI mode (RGui --no-save --no-restore
> > >>>>>> --no-site-file --no-init-file --no-environ) and try to convert,
> > >>>>>> the result is
> > >>>>>>
> > >>>>>>> type.convert("§")
> > >>>>>> [1] NA
> > >>>>>>
> > >>>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file
> > >>>>>> --no-init-file --no-environ --no-Rconsole) instead, the result is
> > >>>>>>
> > >>>>>>> type.convert("§")
> > >>>>>> [1] §
> > >>>>>> Levels: §
> > >>>>>>
> > >>>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on
> > >>>>>> PC3 the result is always NA, independent of the startup mode
> > >>>>>> used, and on PC4 it's always §.
> > >>>>>> What's the result I should expect R to return, and why is it
> > >>>>>> different in so many cases?
> > >>>>> Which locale does R think it is in in the four cases?
> > >>>>> (Sys.setlocale("LC_CTYPE"), I think).
> > >>>>>
> > >>>>> Might well not be a bug (so please don't file it as one).
> > >>>>>
> > >>>>>> Any help is much appreciated!
> > >>>>>> Regards, Stefan
> > >>>>>>
> > >>>>>> ______________________________________________
> > >>>>>> R-devel at r-project.org mailing list
> > >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> > >>>> --
> > >>>>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> > >>>>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> > >>>>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> > >>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
> > >>>>
> > >>>>
> > >>
> > >>
> >
> 
> 


