[Rd] type.convert (PR#13646)
p.dalgaard at biostat.ku.dk
Fri Apr 10 23:55:25 CEST 2009
William Dunlap wrote:
> You may have to use
> (unsigned int)(unsigned char)*s++
> instead of just
> (unsigned int)*s++
> to avoid the sign extension.
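To illustrate the point about sign extension, here is a minimal sketch comparing the three casts discussed in this thread. The function names are hypothetical and not part of the R sources. The sketch assumes an 8-bit char; whether plain char is signed depends on the platform, so the first two results vary while the third does not.

```c
#include <ctype.h>
#include <limits.h>

/* Cast used in the original loop: where plain char is signed,
   the byte 0xA7 ("\247") becomes a negative int. */
int as_int_naive(char c)
{
    return (int)c;
}

/* Cast straight to unsigned int: the char is first converted to a
   (possibly negative) int, so the result can be a huge value. */
unsigned int as_uint_direct(char c)
{
    return (unsigned int)c;
}

/* Cast through unsigned char first: always in 0..UCHAR_MAX,
   which is the range the <ctype.h> functions are defined for. */
unsigned int as_uint_via_uchar(char c)
{
    return (unsigned int)(unsigned char)c;
}

/* Hypothetical wrapper that makes isspace() safe for any byte. */
int is_space_byte(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

Per the C standard, passing isspace() a value that is neither EOF nor representable as unsigned char is undefined behaviour, which would be consistent with results that vary between sessions.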
Thanks again,
I probably won't be doing the change since I don't have a Windows build
environment around, and I'm a bit superstitious about fixing bugs that I
cannot see...
Let me just filter this information into the bug repository for now.
-pd
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
>> Sent: Friday, April 10, 2009 1:41 PM
>> To: William Dunlap
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] type.convert (PR#13646)
>>
>> William Dunlap wrote:
>>> I can reproduce the difference that Stefan saw, depending
>>> on whether or not I start Rgui with the flags
>>> --no-environ --no-Rconsole
>>> I think it boils down to the isBlankString() function.
>>> For the string "\247" it returns 1 when those flags are
>>> not present and 0 when they are. isBlankString does use
>>> some locale-specific functions:
>>> Rboolean isBlankString(const char *s)
>>> {
>>> #ifdef SUPPORT_MBCS
>>>     if(mbcslocale) {
>>>         wchar_t wc; int used; mbstate_t mb_st;
>>>         mbs_init(&mb_st);
>>>         while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
>>>             if(!iswspace(wc)) return FALSE;
>>>             s += used;
>>>         }
>>>     } else
>>> #endif
>>>         while (*s)
>>>             if (!isspace((int)*s++)) return FALSE;
>>>     return TRUE;
>>> }
>>>
>>> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
>>> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
>>> in both sessions. 'Process Explorer' shows that the 2 sessions
>>> have the same dll's opened.
>> Thanks for that analysis Bill!
>>
>> Stefan was in "German_Austria.1252" which I don't think is multibyte, so
>> only the else-clause should be relevant, pointing the finger rather
>> squarely at isspace(). Googling indicates that others have been caught
>> out by signed/unsigned char issues there. Should this possibly rather read
>>
>> if (!isspace((unsigned int)*s++)) return FALSE;
>>
>> ??
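As a sketch only, the single-byte branch of the loop could be written as below. The function name is hypothetical and this is not the code in the R sources. It casts through unsigned char, because a direct cast to unsigned int would still carry the sign extension on platforms where plain char is signed.

```c
#include <ctype.h>

/* Hypothetical stand-alone version of the non-multibyte branch:
   returns 1 if every byte of s is whitespace (or s is empty), else 0. */
int is_blank_string_sbcs(const char *s)
{
    while (*s)
        /* Convert to unsigned char so bytes >= 0x80 are never negative. */
        if (!isspace((unsigned char)*s++)) return 0;
    return 1;
}
```

With this cast the argument to isspace() always lies in 0..UCHAR_MAX, so the result for a byte such as "\247" depends only on the current LC_CTYPE locale.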
>>
>>> > sessionInfo()
>>> R version 2.8.1 (2008-12-22)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> I did the test with a dll compiled from
>>> #include <R.h>
>>> #include <R_ext/Utils.h>
>>>
>>> void test_isBlankString(char **s, int *res)
>>> {
>>> *res = isBlankString(*s) ;
>>> }
>>>
>>> and called by .C("test_isBlankString","\247",-1L)
>>>
>>> I don't see the difference while running a version of 2.9.0(devel)
>>> compiled locally on 11 March 2009 (from svn rev 48116).
>>>
>>> Bill Dunlap
>>> TIBCO Software Inc - Spotfire Division
>>> wdunlap tibco.com
>>>
>>>> -----Original Message-----
>>>> From: r-devel-bounces at r-project.org
>>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
>>>> Sent: Friday, April 10, 2009 2:03 AM
>>>> To: Raberger, Stefan
>>>> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
>>>> Subject: Re: [Rd] type.convert (PR#13646)
>>>>
>>>> Raberger, Stefan wrote:
>>>>> Hi Peter,
>>>>>
>>>>> each of the four PCs actually has the same locale setting:
>>>>>
>>>>>> Sys.setlocale("LC_CTYPE")
>>>>> [1] "German_Austria.1252"
>>>>>
>>>>> (all the other settings returned by invoking Sys.getlocale() are
>>>>> identical as well).
>>>>> Just to be sure (because it's displayed incorrectly in my browser on the
>>>>> bugtracking page): the character inside the type.convert function ought
>>>>> to be a "section"-sign (HTML Code &sect; or &#167;, in R "\247", and
>>>>> not a dot ".").
>>>>
>>>> I saw it correctly. It's "\302\247" in UTF8 locales, which is of course
>>>> the reason I suspected locale settings, but I can't seem to trigger the
>>>> NA behaviour.
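The relation between the single byte "\247" and the UTF-8 pair "\302\247" can be sketched as follows. The helper name is hypothetical. It assumes the input byte is a Latin-1 code point; Windows-1252 agrees with Latin-1 for bytes 0xA0..0xFF (which includes the section sign) but not for 0x80..0x9F.

```c
#include <stddef.h>

/* Encode one Latin-1 byte (code point 0..255) as UTF-8.
   Writes 1 or 2 bytes to out and returns the number written. */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        out[0] = c;                               /* ASCII is unchanged */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte: top 2 bits */
    out[1] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation: low 6 bits */
    return 2;
}
```

For the byte 0xA7 (octal 247) this produces 0xC2 0xA7 (octal 302 247), matching the two encodings mentioned above.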
>>>>
>>>> I'm at a loss here, but some ideas:
>>>>
>>>> In the cases where it returns NA, what type is it? (I.e.
>>>> storage.mode(type.convert(....)))
>>>>
>>>> What do you get from
>>>>
>>>> > charToRaw("§")
>>>> [1] c2 a7
>>>>
>>>> (a7, presumably, but better check).
>>>>
>>>> -p
>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
>>>>> Sent: Thursday, April 09, 2009 19:26
>>>>> To: Raberger, Stefan
>>>>> Cc: r-devel at stat.math.ethz.ch; R-bugs at r-project.org
>>>>> Subject: Re: [Rd] type.convert (PR#13646)
>>>>>
>>>>> s.raberger at innovest.at wrote:
>>>>>> Full_Name: Stefan Raberger
>>>>>> Version: 2.8.1
>>>>>> OS: Windows XP
>>>>>> Submission from: (NULL) (213.185.163.242)
>>>>>>
>>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I recently noticed some strange behaviour of the command "type.convert",
>>>>>> depending on the startup mode used. But there also seems to be different
>>>>>> behaviour on different PCs (all running the same OS and the same version
>>>>>> of R).
>>>>>> On PC1:
>>>>>> When I start R in SDI mode (RGui --no-save --no-restore --no-site-file
>>>>>> --no-init-file --no-environ) and try to convert, the result is
>>>>>>
>>>>>>> type.convert("§")
>>>>>> [1] NA
>>>>>>
>>>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file
>>>>>> --no-init-file --no-environ --no-Rconsole) instead, the result is
>>>>>>
>>>>>>> type.convert("§")
>>>>>> [1] §
>>>>>> Levels: §
>>>>>>
>>>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the
>>>>>> result is always NA, independent of the startup mode used, and on PC4
>>>>>> it's always §.
>>>>>> What's the result I should expect R to return, and why is it different
>>>>>> in so many cases?
>>>>> Which locale does R think it is in in the four cases?
>>>>> (Sys.setlocale("LC_CTYPE"), I think).
>>>>>
>>>>> Might well not be a bug (so please don't file it as one).
>>>>>
>>>>>> Any help is much appreciated!
>>>>>> Regards, Stefan
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907