[Rd] type.convert (PR#13646)
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Fri Apr 10 22:40:54 CEST 2009
William Dunlap wrote:
> I can reproduce the difference that Stefan saw, depending
> on whether or not I start Rgui with the flags
> --no-environ --no-Rconsole
> I think it boils down to the isBlankString() function.
> For the string "\247" it returns 1 when those flags are
> not present and 0 when they are. isBlankString does use
> some locale-specific functions:
> Rboolean isBlankString(const char *s)
> {
> #ifdef SUPPORT_MBCS
>     if(mbcslocale) {
>         wchar_t wc; int used; mbstate_t mb_st;
>         mbs_init(&mb_st);
>         while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
>             if(!iswspace(wc)) return FALSE;
>             s += used;
>         }
>     } else
> #endif
>         while (*s)
>             if (!isspace((int)*s++)) return FALSE;
>     return TRUE;
> }
>
> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
> in both sessions. 'Process Explorer' shows that the 2 sessions
> have the same dll's opened.
Thanks for that analysis Bill!
Stefan was in "German_Austria.1252" which I don't think is multibyte, so
only the else-clause should be relevant, pointing the finger rather
squarely at isspace(). Googling indicates that others have been caught
out by signed/unsigned char issues there. Should this possibly rather read
if (!isspace((unsigned int)*s++)) return FALSE;
??
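
For readers following the signed/unsigned point: the C standard requires
the argument of isspace() to be representable as unsigned char or equal to
EOF. Where plain char is signed, the byte "\247" becomes -89 after the (int)
cast, so the behaviour is undefined and could plausibly differ between
sessions. The usual idiom is a cast through unsigned char; a direct cast to
unsigned int would turn -89 into a very large value, which is out of range
as well. A minimal sketch of the single-byte branch follows, assuming a
non-multibyte locale. The names is_blank_sbcs, as_plain_int and
as_unsigned_char_int are hypothetical and used for illustration only; they
are not part of R.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper (not R's isBlankString): single-byte branch only.
   Each byte is converted to unsigned char before it reaches isspace(), so
   bytes >= 0x80 such as '\247' are passed as 167 rather than as a negative
   int on platforms where plain char is signed. */
bool is_blank_sbcs(const char *s)
{
    while (*s) {
        if (!isspace((unsigned char)*s++))
            return false;
    }
    return true;
}

/* The value an (int) cast, as in the quoted code, hands to isspace(). */
int as_plain_int(char c)
{
    return (int)c;
}

/* The value a cast through unsigned char hands to isspace(). */
int as_unsigned_char_int(char c)
{
    return (unsigned char)c;
}
```

With this form the argument always lies in 0..UCHAR_MAX, so the answer for
"\247" depends only on the locale's character classification and not on
what happens to sit at a negative table index.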
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> I did the test with a dll compiled from
> #include <R.h>
> #include <R_ext/Utils.h>
>
> void test_isBlankString(char **s, int *res)
> {
>     *res = isBlankString(*s);
> }
>
> and called by .C("test_isBlankString","\247",-1L)
>
> I don't see the difference while running a version of 2.9.0(devel)
> compiled locally on 11 March 2009 (from svn rev 48116).
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org
>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
>> Sent: Friday, April 10, 2009 2:03 AM
>> To: Raberger, Stefan
>> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
>> Subject: Re: [Rd] type.convert (PR#13646)
>>
>> Raberger, Stefan wrote:
>>> Hi Peter,
>>>
>>> each of the four PCs actually has the same locale setting:
>>>
>>>> Sys.setlocale("LC_CTYPE")
>>> [1] "German_Austria.1252"
>>>
>>> (all the other settings returned by invoking Sys.getlocale() are
>>> identical as well).
>>> Just to be sure (because it's displayed incorrectly in my browser on
>>> the bugtracking page): the character inside the type.convert function
>>> ought to be a "section"-sign (HTML code &sect; or &#167;, in R "\247",
>>> and not a dot ".").
>>
>> I saw it correctly. It's "\302\247" in UTF-8 locales, which is of course
>> the reason I suspected locale settings, but I can't seem to trigger the
>> NA behaviour.
>>
>> I'm at a loss here, but some ideas:
>>
>> In the cases where it returns NA, what type is it? (I.e.
>> storage.mode(type.convert(....)))
>>
>> What do you get from
>>
>> > charToRaw("§")
>> [1] c2 a7
>>
>> (a7, presumably, but better check).
>>
>> -p
>>
>>> -----Original Message-----
>>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
>>> Sent: Thursday, 09 April 2009 19:26
>>> To: Raberger, Stefan
>>> Cc: r-devel at stat.math.ethz.ch; R-bugs at r-project.org
>>> Subject: Re: [Rd] type.convert (PR#13646)
>>>
>>> s.raberger at innovest.at wrote:
>>>> Full_Name: Stefan Raberger
>>>> Version: 2.8.1
>>>> OS: Windows XP
>>>> Submission from: (NULL) (213.185.163.242)
>>>>
>>>>
>>>> Hi there,
>>>>
>>>> I recently noticed some strange behaviour of the command "type.convert",
>>>> depending on the startup mode used. But there also seems to be different
>>>> behaviour on different PCs (all running the same OS and the same version
>>>> of R).
>>>> On PC1:
>>>> When I start R in SDI mode (RGui --no-save --no-restore --no-site-file
>>>> --no-init-file --no-environ) and try to convert, the result is
>>>>
>>>>> type.convert("§")
>>>> [1] NA
>>>>
>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file
>>>> --no-init-file --no-environ --no-Rconsole) instead, the result is
>>>>
>>>>> type.convert("§")
>>>> [1] §
>>>> Levels: §
>>>>
>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the
>>>> result is always NA, independent of the startup mode used, and on PC4
>>>> it's always §.
>>>> What's the result I should expect R to return, and why is it different
>>>> in so many cases?
>>> Which locale does R think it is in in the four cases?
>>> (Sys.setlocale("LC_CTYPE"), I think).
>>>
>>> Might well not be a bug (so please don't file it as one).
>>>
>>>> Any help is much appreciated!
>>>> Regards, Stefan
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> --
>> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907