[R] Multibyte strings
fisher at plessthan.com
Sat Sep 26 14:16:20 CEST 2015
Thanks for the explanation. One further comment — you wrote:
> I don't think the FDA "requests" XPT files
In fact, they do make such a request. Here is the actual language received this week (and repeatedly in the past):
> Program/script files should be submitted using text files (*.TXT) and the data should be submitted using SAS transport files (*.XPT).
Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
> On Sep 26, 2015, at 5:52 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> The invalid multibyte issue is almost certainly a symptom of being in a UTF-8 locale and trying to handle strings that aren't in UTF-8. (UTF-8 uses particular 8-bit patterns to say that the following k bytes contain a Unicode value outside ASCII; other "8-bit ASCII" encodings, like Latin-1, just use the extra 128 character codes for special characters.) Treating the latter as the former causes errors; the other way around just looks weird.
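A small R sketch of the byte-level point above, assuming R >= 3.3.0 for validUTF8() (a later release than the R 3.2.0 used in this thread). It builds "é" from raw bytes so the result does not depend on how the script itself is encoded: the single Latin-1 byte 0xE9 is not valid UTF-8, while reading UTF-8 bytes as Latin-1 does not fail but produces mojibake.

```r
# "é" built from raw bytes, so the example does not depend on the
# encoding this script is saved in
latin1_e <- rawToChar(as.raw(0xE9))           # Latin-1: one byte, 0xE9
utf8_e   <- rawToChar(as.raw(c(0xC3, 0xA9)))  # UTF-8: two bytes, 0xC3 0xA9

# 0xE9 is a UTF-8 lead byte announcing a 3-byte sequence that never
# arrives, so the Latin-1 string is invalid UTF-8 (validUTF8() needs R >= 3.3.0)
validUTF8(latin1_e)  # FALSE
validUTF8(utf8_e)    # TRUE

# The other direction: reading UTF-8 bytes as Latin-1 succeeds but gives
# two characters (U+00C3, U+00A9) that display as "Ã©" in a UTF-8 locale
mojibake <- iconv(utf8_e, from = "latin1", to = "UTF-8")
charToRaw(mojibake)  # c3 83 c2 a9
```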
> So perhaps you should try diddling your locale settings and/or look for encoding arguments for the functions that you use. Then again, the XPT format may not be happy with non-ASCII characters, whatever the encoding, in which case you may need to massage the input data sets and change variable names and factor labels (iconv() should be your friend).
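One way to follow the iconv() suggestion, sketched under the assumption that the offending strings are Latin-1: the hypothetical helper to_ascii() below re-encodes variable names, character columns and factor levels to ASCII, replacing each byte that has no ASCII equivalent with "?". The `from` encoding and the replacement character are assumptions to adjust to the actual data.

```r
# Hypothetical helper: re-encode names, character columns and factor levels
# from `from` (assumed Latin-1 here) to ASCII; each byte that cannot be
# converted is replaced by `sub`
to_ascii <- function(df, from = "latin1", sub = "?") {
  conv <- function(x) iconv(x, from = from, to = "ASCII", sub = sub)
  names(df) <- conv(names(df))
  for (i in seq_along(df)) {
    col <- df[[i]]
    if (is.factor(col)) {
      levels(col) <- conv(levels(col))   # relabel levels, codes unchanged
    } else if (is.character(col)) {
      col <- conv(col)
    }
    df[[i]] <- col
  }
  df
}

# Example data: "café" as Latin-1 bytes, used in a name, a factor and a string
cafe <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))
Encoding(cafe) <- "latin1"
df <- data.frame(drug = factor(c(cafe, "tea"), levels = c(cafe, "tea")),
                 note = c(cafe, "ok"), stringsAsFactors = FALSE)
names(df)[1] <- cafe

out <- to_ascii(df)
names(out)         # "caf?" "note"
levels(out[[1]])   # "caf?" "tea"
out$note           # "caf?" "ok"
```

Using `to = "ASCII//TRANSLIT"` instead would keep "é" closer to "e", but its output differs between iconv implementations, so a fixed `sub` was chosen here for predictable results.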
> By the way, I don't think the FDA "requests" XPT files. As far as I recall, they say somewhere that they _accept_ them (possibly defending themselves against the platform-specific SAS files that once abounded), but I think even Excel will do for submissions; the important thing is that they can get at the actual data reasonably easily. I can see the attraction of taking the well-trodden path, though.
>> On 25 Sep 2015, at 23:23 , Dennis Fisher <fisher at plessthan.com> wrote:
>> R 3.2.0
>> OS X
>> Earlier today, I initiated a series of emails regarding SASxport (which was removed from CRAN). David Winsemius proposed downloading the source code and installing with the following command:
>> install.packages('~/Downloads/SASxport_1.5.0.tar.gz', repos = NULL, type = "source")
>> That works and I am grateful to David for his recommendation. However, the package fails on some of the many objects that I attempted to write with:
>> The error message was:
>> Error in nchar(var) : invalid multibyte string 3157
>> One work-around would be to edit out multibyte strings. Is there a simple way to find and replace them? Or is there some other clever approach that bypasses the problem?
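For the "find and replace" question, a minimal sketch using hypothetical helpers (not part of SASxport) that work on raw bytes, so they never call nchar() on a string that is invalid in the current locale: find_non_ascii() lists the column and row of every value containing a byte above 0x7F, and strip_non_ascii() drops those bytes. Dropping characters is lossy; re-encoding with iconv() is gentler when the source encoding is known.

```r
# Hypothetical helpers, not part of SASxport. Both inspect raw bytes, so
# they never call nchar() on a string that is invalid in the current locale.

# TRUE for each element containing at least one byte above 0x7F
has_non_ascii <- function(x) {
  x <- as.character(x)   # factors are checked by their labels
  vapply(x, function(s) !is.na(s) && any(as.integer(charToRaw(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# Column name and row number of every non-ASCII value in a data frame
find_non_ascii <- function(df) {
  hits <- lapply(df, function(col) which(has_non_ascii(col)))
  hits <- hits[lengths(hits) > 0]
  data.frame(column = rep(names(hits), lengths(hits)),
             row = as.integer(unlist(hits, use.names = FALSE)),
             stringsAsFactors = FALSE)
}

# Drop every non-ASCII byte; NA stays NA. Lossy: "café" becomes "caf"
strip_non_ascii <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    b <- charToRaw(s)
    rawToChar(b[as.integer(b) < 128L])
  }, character(1), USE.NAMES = FALSE)
}

cafe <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))  # "café" in Latin-1
df <- data.frame(id = 1:3, drug = c("aspirin", cafe, NA),
                 stringsAsFactors = FALSE)

find_non_ascii(df)          # one hit: column "drug", row 2
has_non_ascii(names(df))    # variable names can be checked the same way
df$drug <- strip_non_ascii(df$drug)
```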
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com