[R] Multibyte strings
David Winsemius
dwinsemius at comcast.net
Sat Sep 26 00:20:27 CEST 2015
On Sep 25, 2015, at 2:23 PM, Dennis Fisher wrote:
> R 3.2.0
> OS X
>
> Colleagues,
>
> Earlier today, I initiated a series of emails regarding SASxport (which was removed from CRAN). David Winsemius proposed downloading the source code and installing with the following command:
> install.packages('~/Downloads/SASxport_1.5.0.tar.gz', repos = NULL, type = "source")
>
> That works and I am grateful to David for his recommendation. However, the package fails on some of the many objects that I attempted to write with:
> write.xport
>
> The error message was:
> Error in nchar(var) : invalid multibyte string 3157
Have you considered running traceback() right after the error to see which section of code is actually reporting it?
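At the console, traceback() prints the call stack after an uncaught error. When the failing call runs inside a script, a minimal sketch like the one below records the same stack. calls_at_error(), outer_fn() and inner_fn() are hypothetical names; inner_fn() stands in for whichever package function makes the failing nchar() call.

```r
# Sketch: record the call stack at the moment an error is signalled,
# then let tryCatch() swallow the error so the script can continue.
calls_at_error <- function(expr) {
  stack <- NULL
  tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) stack <<- as.list(sys.calls())
    ),
    error = function(e) invisible(NULL)
  )
  stack
}

# Hypothetical stand-ins for a package's outer and inner functions
outer_fn <- function(x) inner_fn(x)
inner_fn <- function(x) stop("simulated failure in inner_fn")

stack <- calls_at_error(outer_fn("DIARRH"))
# Print each call on one line, outermost first
for (cl in stack) cat(deparse(cl, nlines = 1L), "\n")
```

The innermost entries before the handler show where the error was raised, which is the same information traceback() would give interactively.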
The error reported in your earlier message pointed to a particular word beginning with DIARRH and ending in æéñåºA. When I paste that word unquoted into the R console, I get:
> DIARRH¸æéñåºA
Error: unexpected input in "DIARRH¬"
My word processor tells me that little comma-like glyph is a cedilla.
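An editor shows how it renders the bytes, not what they are; charToRaw() shows the bytes themselves. The sketch below assumes, hypothetically, that the imported word arrived as Latin-1 bytes (in Latin-1, 0xB8 is the cedilla and 0xE6 is æ). If that assumption holds, converting the string to UTF-8 keeps the letters instead of deleting them.

```r
# Hypothetical imported value: ASCII letters followed by bare Latin-1 bytes
word <- "DIARRH\xb8\xe6A"
charToRaw(word)
# bytes: 44 49 41 52 52 48 b8 e6 41

# Assumption: the source encoding is Latin-1; re-encode it as UTF-8
fixed <- iconv(word, from = "latin1", to = "UTF-8")
charToRaw(fixed)
# bytes: 44 49 41 52 52 48 c2 b8 c3 a6 41
```

If the data came from elsewhere, a different from= value may be needed; iconvlist() lists the encodings the platform supports.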
However, I'm not sure this is a reproducible problem, since I am unable to produce a similar error with the toy data frame built by the example code on the write.xport help page:
> abc <- data.frame( x=c(1, 2, NA, NA ), y=c('a', 'DIARRH¸æéñåºA', NA, '*' ) )
> abc
x y
1 1 a
2 2 DIARRH¸æéñåºA
3 NA <NA>
4 NA *
> SASformat(abc$x) <- 'date7.'
> label(abc$y) <- 'character variable'
> label(abc) <- 'Simple example'
> SAStype(abc) <- 'MYTYPE'
> str(abc)
'data.frame': 4 obs. of 2 variables:
$ x: atomic 1 2 NA NA
..- attr(*, "SASformat")= chr "date7."
$ y: Factor w/ 3 levels "*","a","DIARRH¸æéñåºA": 2 3 NA 1
..- attr(*, "label")= chr "character variable"
- attr(*, "label")= chr "Simple example"
- attr(*, "SAStype")= chr "MYTYPE"
> write.xport( abc, file="xxx.dat" )
Putting the same word into the variable label also writes without error:
> abc <- data.frame( x=c(1, 2, NA, NA ), y=c('a', 'DIARRH¸æéñåºA', NA, '*' ) )
> abc
x y
1 1 a
2 2 DIARRH¸æéñåºA
3 NA <NA>
4 NA *
> SASformat(abc$x) <- 'date7.'
> label(abc$y) <- '"DIARRH¸æéñåºA"'
> label(abc) <- 'Simple example'
> SAStype(abc) <- 'MYTYPE'
> str(abc)
'data.frame': 4 obs. of 2 variables:
$ x: atomic 1 2 NA NA
..- attr(*, "SASformat")= chr "date7."
$ y: Factor w/ 3 levels "*","a","DIARRH¸æéñåºA": 2 3 NA 1
..- attr(*, "label")= chr "\"DIARRH¸æéñåºA\""
- attr(*, "label")= chr "Simple example"
- attr(*, "SAStype")= chr "MYTYPE"
> write.xport( abc, file="xxx.dat" )
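One possible reason the toy example does not reproduce the error (an assumption, not something the transcript shows): text typed into a UTF-8 console is stored as valid UTF-8, while data read from another system may carry bare single-byte characters that are invalid in a UTF-8 locale. A sketch comparing the two, using validUTF8(), which is available from R 3.3.0 (newer than the R 3.2.0 in the original report):

```r
typed    <- "DIARRH\u00b8\u00e6A"  # \u escapes give a valid UTF-8 string
imported <- "DIARRH\xb8\xe6A"      # same letters as bare Latin-1 bytes

validUTF8(typed)      # TRUE
validUTF8(imported)   # FALSE

# Counting bytes never fails. Counting characters of the invalid string
# errors in a UTF-8 locale unless allowNA = TRUE, which returns NA instead.
nchar(imported, type = "bytes")   # 9
nchar(imported, allowNA = TRUE)
```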
>
> One work-around would be to edit out multibyte strings. Is there a simple way to find and replace them?
On a Mac, I have used the Zap Gremlins option in TextWrangler.app for that. Be aware that it would change the spelling of words originally constructed with ligature characters.
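Within R, a rough equivalent can be sketched with iconv(): converting to ASCII without sub flags the affected strings (the result is NA), and converting with sub = "" drops every byte that is not plain ASCII. zap_non_ascii(), is_non_ascii() and strip_non_ascii() are hypothetical helpers, not functions from SASxport or Hmisc, and the sketch assumes the strings are in the session's native encoding. Like Zap Gremlins, it deletes accented letters rather than transliterating them.

```r
# Hypothetical helpers, not part of SASxport or Hmisc.
# TRUE where a non-missing string cannot be converted to ASCII
is_non_ascii <- function(x) {
  !is.na(x) & is.na(iconv(x, from = "", to = "ASCII"))
}

# sub = "" deletes each offending byte (invalid or non-ASCII)
strip_non_ascii <- function(x) {
  iconv(x, from = "", to = "ASCII", sub = "")
}

# Report and strip non-ASCII text in character/factor columns and in
# any "label" attribute attached to a column
zap_non_ascii <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) {
      bad <- is_non_ascii(levels(col))
      if (any(bad)) message(nm, ": ", sum(bad), " non-ASCII level(s)")
      levels(col) <- strip_non_ascii(levels(col))
    } else if (is.character(col)) {
      bad <- is_non_ascii(col)
      if (any(bad)) message(nm, ": ", sum(bad), " non-ASCII value(s)")
      col[] <- strip_non_ascii(col)   # col[] keeps the column's attributes
    }
    lab <- attr(col, "label")
    if (is.character(lab)) attr(col, "label") <- strip_non_ascii(lab)
    df[[nm]] <- col
  }
  df
}
```

Passing the cleaned data frame to write.xport() would then show whether non-ASCII text was the only obstacle.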
Best of luck;
David.
> Or is there some other clever approach that bypasses the problem?
>
> Dennis
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
David Winsemius
Alameda, CA, USA