[R] Multibyte strings

Dennis Fisher fisher at plessthan.com
Sat Sep 26 14:16:20 CEST 2015


Peter

Thanks for the explanation.  One further comment — you wrote:
> I don't think the FDA "requests" XPT files 

In fact, they do make such a request.  Here is the actual language received this week (and repeatedly in the past):
> Program/script files should be submitted using text files (*.TXT) and the data should be submitted using SAS transport files (*.XPT).

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com



> On Sep 26, 2015, at 5:52 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> 
> Dennis,
> 
> The invalid multibyte issue is almost certainly a symptom of being in a UTF-8 locale and trying to handle strings that aren't in UTF-8. (UTF-8 uses particular 8-bit patterns to say that the following k bytes contain a Unicode value outside ASCII; other "8-bit ASCII" encodings, like Latin-1, just use the extra 128 character codes for special characters. Treating the latter as the former causes errors; the other way around just looks weird.)
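A minimal sketch of the mismatch described above, assuming a UTF-8 locale. The sample vector `x` is made up for illustration, and validUTF8() needs R >= 3.3.0, which is newer than the R 3.2.0 mentioned in the original post.

```r
# In Latin-1, "é" is the single byte 0xE9. On its own that byte is not
# valid UTF-8, because UTF-8 expects continuation bytes to follow it.
x <- c("plain", rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))))  # "caf" + 0xE9

# FALSE marks the strings that a UTF-8 locale cannot interpret
valid <- validUTF8(x)
print(valid)

# Counting bytes never depends on the encoding, so it always works
n_bytes <- nchar(x, type = "bytes")
print(n_bytes)

# Counting characters is where an "invalid multibyte string" error can arise
# in a UTF-8 locale; tryCatch keeps the sketch running either way.
res <- tryCatch(nchar(x, type = "chars"),
                error = function(e) conditionMessage(e))
print(res)
```

Checking validity first gives the positions of the offending strings without depending on how any particular function reacts to them.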
> 
> So perhaps you should try diddling your locale settings and/or look for encoding arguments for the functions that you use. Then again, the XPT format may not be happy with non-ASCII characters, whatever the encoding, in which case you may need to massage the input data sets and change variable names and factor labels (iconv() should be your friend).
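A rough sketch of the iconv() route suggested above. It assumes the offending strings are really Latin-1, which the thread does not confirm; the sample vector `x` is hypothetical.

```r
# Hypothetical input: the Latin-1 bytes for "café" and "naïve"
x <- c(rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))),
       rawToChar(as.raw(c(0x6e, 0x61, 0xef, 0x76, 0x65))))

# Option 1: state the assumed source encoding and convert to UTF-8,
# which keeps the accented characters intact
utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# Option 2: force plain ASCII, substituting "?" for anything that cannot
# be represented; this may be the safer choice if the target format
# turns out to reject non-ASCII text
ascii <- iconv(x, from = "latin1", to = "ASCII", sub = "?")

print(validUTF8(utf8))
print(ascii)
```

The same calls can be applied to names(df) and to levels(f) for a factor f, since variable names and factor labels may carry the same bytes.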
> 
> By the way, I don't think the FDA "requests" XPT files. As far as I recall, they say somewhere that they _accept_ them (possibly defending themselves against the platform-specific SAS files that once abounded), but I think even Excel goes for submissions - the important thing is that they can get at the actual data reasonably easily. I can see the attraction of taking the well-trodden path, though.
> 
> -pd
> 
>> On 25 Sep 2015, at 23:23 , Dennis Fisher <fisher at plessthan.com> wrote:
>> 
>> R 3.2.0
>> OS X
>> 
>> Colleagues,
>> 
>> Earlier today, I initiated a series of emails regarding SASxport (which was removed from CRAN).  David Winsemius proposed downloading the source code and installing with the following command:
>> 	install.packages('~/Downloads/SASxport_1.5.0.tar.gz', repos = NULL, type = "source")
>> 
>> That works and I am grateful to David for his recommendation.  However, the package fails on some of the many objects that I attempted to write with:
>> 	write.xport
>> 
>> The error message was:
>> 	Error in nchar(var) : invalid multibyte string 3157
>> 
>> One work-around would be to edit out multibyte strings.  Is there a simple way to find and replace them?  Or is there some other clever approach that bypasses the problem?
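One possible way to locate the offending strings, sketched on a made-up data frame. The names `df` and `non_ascii` are hypothetical, and the check flags every non-ASCII byte, which is broader than "invalid multibyte" but errs on the safe side.

```r
# Hypothetical data: the second city name ends in the raw Latin-1 byte 0xF8 ("ø")
df <- data.frame(
  id   = 1:3,
  city = c("Oslo",
           rawToChar(as.raw(c(0x54, 0x72, 0x6f, 0x6d, 0x73, 0xf8))),
           "Bergen"),
  stringsAsFactors = FALSE
)

# TRUE for each element containing a byte above 0x7F, i.e. anything
# that is not plain ASCII, whatever its declared encoding
non_ascii <- function(x) {
  vapply(as.character(x),
         function(s) any(as.integer(charToRaw(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# Only character and factor columns can hold such strings
is_text <- vapply(df, function(col) is.character(col) || is.factor(col),
                  logical(1))

# Named list: column name -> row numbers that need attention
hits <- lapply(df[is_text], function(col) which(non_ascii(col)))
hits <- hits[lengths(hits) > 0]
print(hits)
```

Working on raw bytes avoids calling any function that might itself stop with the multibyte error while searching.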
>> 
>> Dennis
>> 
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
> 