[Rd] Using unicode from C interface of R
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jan 22 06:14:48 CET 2014
On 22/01/2014 00:08, Duncan Murdoch wrote:
> On 14-01-21 5:41 PM, Sandip Nandi wrote:
>> Hi,
>>
>> I am using the C interface of R. If a Unicode string is read, in what
>> format can I pass it back to R?
>> I was trying to use the following:
>>
>> tpStr = ( char *)val;
>> SET_STRING_ELT(innerList , 0, mkChar(tpStr));
>>
>> It does not work.
>>
>> If I pass it back to R in RAW format, what package is there to read
>> it? I mean a package for interpreting RAW data.
>
> There are a number of encodings for Unicode. Most Unix systems use
> UTF-8, Windows uses UTF-16 for some things, etc.
>
> If your string is known to be in UTF-8 that's easiest: just use
> mkCharCE instead of mkChar, as described in Writing R Extensions. If it
> is in UTF-16 you might have more trouble because of possible embedded 0
> bytes. Translate to UTF-8 first using C facilities like
> WideCharToMultiByte.
Which is Windows-only (and what 'wide char' means differs by platform,
including whether it is known to be any Unicode encoding). All platforms
have Riconv: see 'Writing R Extensions'. C11 has other ways to do this,
but they are not widely implemented.
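
To make the UTF-16 point concrete, here is a minimal sketch in plain C of
converting UTF-16 code units to a NUL-terminated UTF-8 buffer. The function
name utf16_to_utf8 and its return convention are hypothetical and used only
for illustration. It assumes the input is already in host byte order with no
BOM. In a package, Riconv (declared in R_ext/Riconv.h) is the portable route
and handles far more encodings than this hand-rolled loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n UTF-16 code units (host byte order, no
 * BOM) into a NUL-terminated UTF-8 string in out, which holds cap bytes.
 * Returns the number of bytes written (excluding the NUL), or -1 on
 * malformed input, an embedded U+0000, or insufficient space. */
static long utf16_to_utf8(const uint16_t *in, size_t n, char *out, size_t cap)
{
    size_t o = 0;
    if (cap == 0) return -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp == 0) return -1;            /* a CHARSXP cannot hold embedded NULs */
        if (cp >= 0xD800 && cp <= 0xDBFF) { /* high surrogate: needs a partner */
            if (i + 1 >= n) return -1;
            uint32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                     /* stray low surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need + 1 > cap) return -1; /* keep room for the final NUL */
        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (long)o;
}

/* Inside R code the resulting buffer could then be handed over with
 *   SET_STRING_ELT(innerList, 0, mkCharCE(buf, CE_UTF8));
 * where buf is the (assumed) output buffer filled above. */
```

Unlike the raw UTF-16 bytes, the UTF-8 result of a successful conversion
contains no 0 bytes before the terminator, so it is safe to treat as an
ordinary C string.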
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595