[Rd] wchar and wstring. (followup question)

James Bullard bullard at berkeley.edu
Tue Aug 30 02:08:46 CEST 2005


Thanks for all of the help with my previous posts. This question might
expose much of my ignorance when it comes to R's memory management and
the responsibilities of the programmer, but I thought I had better ask
rather than continue in my ignorance.

I use the following code to create a multi-byte string in R from my wide
character string in C.

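size_t str_length;
char* cstr;

// With a NULL destination, wcstombs returns the number of bytes needed
// (excluding the terminating null). This can exceed the number of wide
// characters, so size() alone is not a safe buffer length.
str_length = wcstombs(NULL, cel.GetAlg().c_str(), 0);
if (str_length == (size_t) -1)
    error("cannot convert wide string in this locale");
cstr = Calloc(str_length + 1, char);   // +1 for the terminating null
wcstombs(cstr, cel.GetAlg().c_str(), str_length + 1);
SET_STRING_ELT(names, i, mkChar("Algorithm"));
SET_VECTOR_ELT(vals, i++, mkString(cstr));
Free(cstr);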

My first question is: do I need the Free? I looked at some of the
examples in main/character.c, but I could not decide whether or not I
needed it. I imagined (I could not find the source for this function)
that mkString made a copy so I thought I would clean up my copy, but if
this is not the case then I would assume the Free would be wrong.
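
The conversion step on its own can be sketched in standard C++, without
the R API. This is a minimal sketch, not code from the R sources: the
helper name wide_to_multibyte is hypothetical, and it assumes the common
POSIX behaviour that wcstombs with a null destination returns the
required byte count. mkChar (and mkString, which wraps it) copies the
bytes it is given into R-managed memory, so the caller's temporary
buffer can be released afterwards; here a std::vector makes that release
automatic.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of R): convert a wide string to a
// multi-byte string in the current C locale.
std::string wide_to_multibyte(const std::wstring& ws)
{
    // Assumption: with a null destination, wcstombs returns the number of
    // bytes the result needs, excluding the terminating null byte.
    std::size_t needed = std::wcstombs(nullptr, ws.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("wide string not representable in this locale");

    // A multi-byte character can occupy several bytes, so the buffer is
    // sized in bytes, plus one for the terminator that wcstombs writes.
    std::vector<char> buf(needed + 1);
    std::wcstombs(buf.data(), ws.c_str(), buf.size());
    return std::string(buf.data(), needed);
}
```

The resulting std::string could then be passed to mkChar or mkString
through c_str(), in the same way as the narrow-string header fields.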

My second question is: It was pointed out to me that it would be more
natural to use this code:

SET_STRING_ELT(vals, i++, mkChar(header.GetHeader().c_str()));

instead of:

SET_VECTOR_ELT(vals, i++, mkString(header.GetHeader().c_str()));

However, the first line creates the following list element in R:

<CHARSXP: "Percentile">

Whereas, I want it to create as the list element:

"Percentile"

Which the second example does correctly. I had previously posted about
this problem and I believe that I was advised to use the second syntax,
but maybe there is a different problem in my code. I am trying to
construct a named list in R: the SET_STRING_ELT call on names sets the
name of each list element, and the second call sets its value, which
can be an INTEGER, a STRING, or whatever.

My third question is simply: why is wcrtomb preferred? The example I
based my code on in main/character.c used wcstombs.
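
For background on this question: wctomb converts one character at a
time and keeps any shift state in a hidden static object, while wcrtomb
takes the state as an explicit mbstate_t argument, which makes it
restartable and usable from several call sites at once. wcstombs is a
different function that converts a whole string. The sketch below shows
the wcrtomb pattern; the helper name convert_with_wcrtomb is
hypothetical and the error handling is deliberately minimal.

```cpp
#include <climits>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string one character at a time with
// wcrtomb, keeping the conversion state in a local object.
std::string convert_with_wcrtomb(const std::wstring& ws)
{
    std::mbstate_t state = std::mbstate_t();  // initial state, owned by the caller
    std::string out;
    char buf[MB_LEN_MAX];                     // large enough for any one character

    for (wchar_t wc : ws) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("character not representable in this locale");
        out.append(buf, n);                   // append only the bytes produced
    }
    return out;
}
```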


Thanks again for all of the help.

jim


Prof Brian Ripley wrote:

> On Fri, 26 Aug 2005, James Bullard wrote:
>
>> Hello all, I am writing an R interface to some C++ files which make use
>> of std::wstring classes for internationalization. Previously (when I
>> wanted to make R strings from C++ std::strings), I would do something
>> like this to construct a string in R from the results of the parse.
>>
>> SET_VECTOR_ELT(vals, i++, mkString(header.GetHeader().c_str()));
>
>
> That creates a list of one-element character vectors.  It would be more
> usual to do
>
>   SET_STRING_ELT(vals, i++, mkChar(header.GetHeader().c_str()));
>
>> However, now the call header.GetHeader().c_str() returns a pointer to
>> an array of wchar_t's. I was going to use wcstombs() to convert the
>> wchar_t* to char*, but I wanted to see if there was a similar
>> function in R for the mkString function which I had initially used
>> which deals with wchar_ts as opposed to chars.
>
>
> No (nor an analogue of mkChar).  R uses MBCS and not wchar_t
> internally (and Unix-alike systems do externally).  There is no
> wchar_t internal R type (a much-debated design decision at the time).
>
>> Also, since I have no experience with the wctombs() function I wanted
>> to ask if anyone knew if this will handle the internationalization
>> issues from within R.
>
>
> Did you mean wcstombs or wctomb (if the latter, wcrtomb is preferred)?
> There are tens of examples in the R sources for you to consult.
>
> Note that not all R platforms support wchar_t, hence this code is
> surrounded by #ifdef SUPPORT_MBCS macros (exported in Rconfig.h for
> package writers).
>


