[R] changing names with different character sets

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Feb 19 14:40:35 CET 2012


On 19/02/2012 12:43, peter dalgaard wrote:
>
> On Feb 19, 2012, at 08:49 , Prof Brian Ripley wrote:
>
>> On 19/02/2012 07:30, Erin Hodgess wrote:
>>> Dear R People:
>>>
>>> I'm trying to replicate something that I saw on an R blog.
>>>
>>> The first step is to load in the .rda file, which is fine.
>>>
>>> However, some of the names of the columns in the data frame have
>>> special characters, accents, and such.
>>
>> Most of the world thinks characters with accents are normal, not special.  What matters to R is whether they count as alphanumeric in the current locale.
>>
>>> How do I get around this on a basic keyboard, please?
>>
>> Copy-and-paste from names(dataframe) may work.  But without an example or knowing your OS or your locale (but I remember you are in the US) it is hard to tell.
>>
>> The main issue is that what R regards as a valid name (aka symbol) depends on the locale, so strictly speaking no non-ASCII characters are valid in names in a US locale.  In practice US locales tend to be set up either for a Western European character set (Latin-1, cp1252) or so that all alphanumeric Unicode characters in a human language are regarded as alphanumeric.
>
>
> You could consider a strategy like this:
>
>> d<- data.frame(Æblefløde=1:2, Blåbærgrød=3:4)
>> d
>    Æblefløde Blåbærgrød
> 1         1          3
> 2         2          4
>> names(d)
> [1] "Æblefløde"  "Blåbærgrød"
>> iconv(names(d),to="ASCII//TRANSLIT")
> [1] "AEbleflode"  "Blabaergrod"
>> names(d)<- iconv(names(d),to="ASCII//TRANSLIT")
>> d
>    AEbleflode Blabaergrod
> 1          1           3
> 2          2           4
>
> (If the characters don't display correctly to begin with, you may
> need to figure out the appropriate from= argument to iconv() as well.)

And for some languages transliteration does not work (and it is not 
supported at all under some versions of iconv).
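
A minimal sketch of guarding against those failures, assuming the names are encoded in UTF-8. The helper to_ascii_names() is hypothetical, not part of R. It tries transliteration first, treats an NA or a "?" in the result as a failure, and for those names falls back to replacing each non-convertible byte with "_" before passing everything through make.names():

```r
# Hypothetical helper (not part of base R): ASCII-only syntactic names.
to_ascii_names <- function(x, from = "UTF-8") {
  # Some iconv builds do not support //TRANSLIT at all and signal an error;
  # treat that case as "every name failed".
  out <- tryCatch(
    iconv(x, from = from, to = "ASCII//TRANSLIT"),
    error = function(e) rep(NA_character_, length(x))
  )
  # iconv() returns NA for strings it cannot convert; some implementations
  # emit "?" instead. (Assumption: the original names contain no "?".)
  failed <- is.na(out) | grepl("?", out, fixed = TRUE)
  # Fallback: substitute "_" for every non-convertible byte.
  out[failed] <- iconv(x[failed], from = from, to = "ASCII", sub = "_")
  # Ensure the results are syntactically valid and unique.
  make.names(out, unique = TRUE)
}

d <- data.frame(1:2, 3:4)
names(d) <- c("\u00c6blefl\u00f8de", "Bl\u00e5b\u00e6rgr\u00f8d")
names(d) <- to_ascii_names(names(d))
names(d)
```

The exact replacement names depend on the platform's iconv, so check names(d) before relying on them in a script.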

We are all guessing, but the comment about a 'basic' keyboard suggested 
to me that the column names were used in some script.  If so, getting R 
to work with the original names may be the simplest alternative.
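
As an illustration only, here is one way the original names might be used from a keyboard without the accented keys. The data frame is the made-up example from earlier in the thread, with the names written as \u escapes. Columns are selected by escaped string, by position, or by an ASCII-only pattern on names():

```r
# Rebuild the example data frame using only ASCII keystrokes.
d <- data.frame(1:2, 3:4)
names(d) <- c("\u00c6blefl\u00f8de", "Bl\u00e5b\u00e6rgr\u00f8d")

# 1. Select by name, spelling the non-ASCII characters as \u escapes.
x1 <- d[["\u00c6blefl\u00f8de"]]

# 2. Select by position, so the name never has to be typed.
x2 <- d[[2]]

# 3. Select by an ASCII-only pattern matched against the names.
x3 <- d[[grep("^Bl", names(d))]]

# 4. Reuse the stored names rather than retyping them.
nm <- names(d)
x4 <- d[[nm[1]]]
```

Whether the names print correctly still depends on the locale, but selection by position or through names(d) does not require typing them.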


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


