[R] encoding accents and tildes in R Mac OS X

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 11 09:31:25 CEST 2008


On Mon, 11 Aug 2008, Kenneth Roy Cabrera Torres wrote:

> Hi Carlos:
>
> I think you have an encoding problem.
> Maybe it is easier to convert it.
>
> I don't know how to convert in Mac OS, but
> in Linux you can use "iconv", which converts between
> many encodings.

Well, R has an iconv() command even on Mac OS X, and my iMac has 'iconv' 
as a command-line program.  But you need to know what to convert from and 
to.

> Is the original file from a Windows system?
> Maybe the encoding is windows-1256 and you need
> to convert to a compatible Mac encoding.

Hmm, in latin1 as Windows uses it (CP1252, the most plausible Windows 
encoding) \x92 is a quote and \x96 is an en dash.  1256 is Arabic.
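One way to see this in R is to decode the reported bytes as Windows-1252 and look at what comes out. This is a minimal sketch: it assumes the system iconv knows the name "CP1252" (glibc and GNU libiconv normally do), and the vector x simply reproduces the two strings from the original question.

```r
# The strings as reported, with the raw bytes 0x96 and 0x92 embedded.
x <- c("Flore\x96a", "Rancher\x92a")

# Decode them as Windows-1252: 0x96 becomes an en dash (U+2013) and 0x92 a
# right single quote (U+2019), i.e. punctuation rather than letters, which
# is why a Windows origin looks implausible for these words.
as_windows <- iconv(x, from = "CP1252", to = "UTF-8")
as_windows
```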

I think this is a MAC encoding, an obsolete one (Mac OS X in the main uses 
UTF-8).  Try encoding="macroman".
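As a quick check of the Mac Roman hypothesis, the same bytes can be converted with R's iconv(). A sketch under one labelled assumption: "macroman" is the name suggested here (and accepted by libiconv on Mac OS X), but glibc on Linux may only know this encoding as "macintosh", so the hypothetical helper variable mac_enc falls back to that name if the first one is rejected.

```r
# The strings as reported: in Mac Roman, 0x96 is "ñ" and 0x92 is "í".
x <- c("Flore\x96a", "Rancher\x92a")

# Assumption: try "macroman" first; iconv() signals an error for an
# unsupported conversion, in which case use the glibc name "macintosh".
mac_enc <- if (is.na(tryCatch(iconv("a", "macroman", "UTF-8"),
                              error = function(e) NA))) "macintosh" else "macroman"

# Re-encode from Mac Roman to UTF-8; the letters come back as "ñ" and "í".
fixed <- iconv(x, from = mac_enc, to = "UTF-8")
fixed
```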

However, if you read ?read.table, you will see that *its* encoding 
argument does not re-encode.  You want

con <- file(<filename>, open = "r", encoding = "macroman")  # re-encodes as it reads
tmp <- read.table(con, ...)  # read.table leaves an already-open connection open
close(con)

There's an example on ?file (as 'encoding' in ?read.table says).
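A self-contained sketch of the connection approach, end to end. Everything here is illustrative: a temporary file with hypothetical columns name and count stands in for 'Sectiondic.txt', and the mac_enc fallback to "macintosh" is an assumption for systems (such as glibc on Linux) whose iconv may not accept "macroman". The text is re-encoded to the session's native encoding, so the accented letters display correctly in a UTF-8 locale.

```r
# Hypothetical sample file: a tab-separated header and two rows whose names
# contain raw Mac Roman bytes (0x96 = "ñ", 0x92 = "í").
path <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("name\tcount\n"),
           charToRaw("Flore"), as.raw(0x96), charToRaw("a\t1\n"),
           charToRaw("Rancher"), as.raw(0x92), charToRaw("a\t2\n")),
         path)

# Assumption: prefer "macroman"; fall back to "macintosh" if iconv rejects it.
mac_enc <- if (is.na(tryCatch(iconv("a", "macroman", "UTF-8"),
                              error = function(e) NA))) "macintosh" else "macroman"

# Open the connection ourselves with the declared encoding. read.table()
# does not close a connection that is already open, so close() is valid.
con <- file(path, open = "r", encoding = mac_enc)
section <- read.table(con, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
close(con)

section$name
```

In later versions of R, read.table() also has a fileEncoding argument that re-encodes a named file directly, e.g. read.table('Sectiondic.txt', sep = '\t', header = TRUE, fileEncoding = "macroman").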



>
> Hope this helps.
>
> Kenneth
> On Sun, 10-08-2008 at 22:14 -0700, Carlos Cuartas wrote:
>> Hello,

>> In R under Mac OS X 10.5.4 I've had problems when trying to read a
>> data.frame with characters including tildes and accents. For instance,
>> Floreña is changed to Flore\x96a and Ranchería is changed to Rancher\x92a.
>> In the code:
>>
>> section <- read.table('Sectiondic.txt', sep='\t', header=TRUE,
>>                       stringsAsFactors=FALSE, encoding=" ")
>>
>> I've changed the "encoding" argument but have not been able to find a
>> solution.

>> Any suggestion?
>>
>> Thanks a lot
>>
>> Carlos Cuartas

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

