[R] Using a Unicode symbol gives unexpected results in levels of a factor object

peter dalgaard pdalgd at gmail.com
Thu Aug 9 11:02:49 CEST 2012


On Aug 9, 2012, at 06:53 , Wyatt, Kristin M wrote:

> Dear all,
> 
> When I use a Unicode symbol in the labels for a factor object, the corresponding level does not display as expected. However, using levels() on the factor returns the desired output. I noticed the discrepancy when the legend labels from a call to ggplot() did not display the desired symbol, but an explicitly built legend using the same labels did.
> 
> Example (I am trying to get the less-than-or-equal-to symbol):
> 
>> .df <- data.frame(afp = c(0,0,1,1), time=c(0,2,0,1), surv=c(1, 0.5, 1, 0.4))
>> afpLabels <- c("AFP \u2264 16", "AFP > 16")
>> afpStrata <- factor(.df$afp, labels=afpLabels)
>> afpStrata
> [1] AFP ≤ 16 AFP ≤ 16 AFP > 16 AFP > 16
> Levels: AFP = 16 AFP > 16
> 
> The first level is reported as "AFP = 16".
> 
>> levels(afpStrata)
> [1] "AFP ≤ 16" "AFP > 16"
>> 
> 
> The desired result is produced with levels().
> 
> 
> The code below shows the issue in context, through calls to ggplot(), if you don't mind loading all the libraries.
> 
>> library(ggplot2)
>> library(gridExtra)
>> library(plyr)
>> 
>> ggplot(.df, aes(time, surv)) + geom_step(aes(color = afpStrata), size = 1.0)  
>> 
>> ggplot(.df, aes(time, surv)) + geom_step(aes(color = afpStrata), size = 1.0)  +  
> + scale_colour_hue(breaks=afpLabels, labels=afpLabels)
>> 
> 
> I am running a pre-compiled version of R on Windows 7 (64-bit).
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)

For whatever it is worth, this works fine (both examples) under OS X Snow Leopard.

Looking at the code for print.factor, I would strongly suspect that the culprit is the line 

        n <- length(lev <- encodeString(levels(x), quote = ifelse(quote, 
            "\"", "")))

which figures, since you are in a .1252 locale (Windows code page 1252), not .utf8 (or UTF-8 or ...).
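This suspicion can be checked with a few lines of base R. The sketch below rebuilds the factor from the original example without the data frame. It then inspects the session's character-type locale and compares the raw levels with what encodeString() produces from them.

Code page 1252 has no less-than-or-equal-to sign, so a difference between the last two printed lines would be consistent with the diagnosis above. The exact output depends on the platform, console, and locale, so none is shown here.

```r
# Rebuild the factor from the original example (base R only).
afpLabels <- c("AFP \u2264 16", "AFP > 16")
afpStrata <- factor(c(0, 0, 1, 1), labels = afpLabels)

# Which character set is the session using? On Windows, a locale name
# ending in ".1252" means code page 1252, which has no U+2264 character.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])  # TRUE only in a UTF-8 locale

# The \u escape marks the first label as "UTF-8";
# the pure-ASCII second label stays "unknown".
print(Encoding(levels(afpStrata)))

# print.factor() builds its "Levels:" line from encodeString(levels(x), ...),
# which prepares the strings for display in the current locale.
# Compare that with the raw levels that levels() returns.
print(encodeString(levels(afpStrata), quote = ""))
print(levels(afpStrata))
```

If the encodeString() line differs from the levels() line only in a non-UTF-8 session, the problem lies in the display step rather than in the factor itself.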

Over to the Windows/locale/charset experts...

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list