[Rd] Using unicode from C interface of R

Wed Jan 22 06:14:48 CET 2014

On 22/01/2014 00:08, Duncan Murdoch wrote:
> On 14-01-21 5:41 PM, Sandip Nandi wrote:
>> Hi,
>>
>> I am using the C interface of R. If a Unicode string is read, in what
>> format can I pass it back to R?
>> I was trying to use the following:
>>
>>   tpStr = ( char *)val;
>>   SET_STRING_ELT(innerList  , 0, mkChar(tpStr));
>>
>> It does not work.
>>
>> If I pass it back to R in RAW format, what package is there to read
>> it? I mean a package for interpreting RAW data.
>
> There are a number of encodings for Unicode.  Most Unix systems use
> UTF-8, Windows uses UTF-16 for some things, etc.
>
> If your string is known to be in UTF-8 that's easiest:  just use
> mkCharCE instead of mkChar, as described in Writing R Extensions.  If it
> is in UTF-16 you might have more trouble because of possible embedded 0
> bytes.  Translate to UTF-8 first using C facilities like
> WideCharToMultiByte.

Which is Windows-only (and what 'wide char' means differs by platform,
including whether it is known to be any Unicode encoding at all). All
platforms have Riconv: see 'Writing R Extensions'. C11 has other ways to
do this, but they are not widely implemented.
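
As a rough sketch of the conversion step: Riconv_open, Riconv and
Riconv_close (declared in R_ext/Riconv.h) follow the iconv calling
convention, so the example below uses the system iconv in order to compile
without the R headers. The helper name utf16le_to_utf8, the UTF-16LE source
encoding and the caller-supplied output buffer are assumptions made for
illustration, not details from the original post. The input length is
passed in bytes because UTF-16 text can contain embedded 0 bytes, so strlen
cannot be used on it.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` bytes of UTF-16LE text into a
 * NUL-terminated UTF-8 string stored in `out` (capacity `outsize` bytes).
 * Returns 0 on success, -1 on any failure (unsupported encoding,
 * malformed or truncated input, or output buffer too small). */
static int utf16le_to_utf8(const char *in, size_t inlen,
                           char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    /* Argument order is (tocode, fromcode), as for Riconv_open. */
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t) -1)
        return -1;

    /* POSIX iconv takes char ** for the input; the cast only drops const. */
    char *inp = (char *) in;
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outsize - 1;   /* keep one byte for the terminator */

    size_t res = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (res == (size_t) -1)
        return -1;

    *outp = '\0';
    return 0;
}
```

Inside a package one would swap in the Riconv calls and then hand the
UTF-8 result to R with something like
SET_STRING_ELT(innerList, 0, mkCharCE(buf, CE_UTF8)), where buf is the
converted string.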

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595