[Rd] as.numeric and as.character with locale using comma as separator

Claudia Beleites claudia.beleites at ipht-jena.de
Tue Aug 14 19:00:11 CEST 2012


Dear all,

Summary:

Loading qtbase with library (qtbase) changes my LC_NUMERIC from C to de_DE.
[Judging by the warning R gives when I set it back manually, this should
not happen.]
I have posted an issue at their GitHub repository, but the behaviour may
be of more general interest.

Once LC_NUMERIC is changed, as.character () uses the decimal separator
that belongs to LC_NUMERIC (and not options ()$OutDec, as I had
assumed).
as.double () (= as.numeric ()) does not, though.

That breaks constructs like
as.numeric (as.character (x))
which then return NA.

Long version:

as.character seems to take into account my locale (de_DE), which uses a
comma as the decimal separator:

> x <- rnorm (3)
> x
[1] -0,004238328 -0,919358537 -1,654543297
> as.character(x)
[1] "-0,00423832753479965" "-0,919358536523751"   "-1,65454329680873"

whereas as.numeric () doesn't:

> as.numeric (as.character(x))
[1] NA NA NA
Warning message:
NAs introduced by coercion

> as.numeric (gsub (",", ".", as.character(x)))
[1] -0,004238328 -0,919358537 -1,654543297
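
The gsub workaround above can be wrapped in a small helper. This is only a minimal sketch: the name parse_decimal is hypothetical (it is not part of R), and it assumes that the strings use the decimal mark reported by Sys.localeconv () and contain no grouping marks.

```r
# Hypothetical helper, not part of base R: converts strings written with a
# locale-specific decimal mark back to numeric.
# Assumption: `dec` defaults to the decimal mark of the current LC_NUMERIC.
parse_decimal <- function(s, dec = Sys.localeconv()[["decimal_point"]]) {
  # Replace the first occurrence of the decimal mark with ".", the form
  # that as.numeric() accepted in the session shown above.
  as.numeric(sub(dec, ".", s, fixed = TRUE))
}

# Usage with the decimal mark given explicitly:
parse_decimal(c("-0,5", "1,25"), dec = ",")
```

Passing dec explicitly keeps the conversion independent of whatever the session locale happens to be at that moment.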


Neither the help page of as.numeric nor that of as.character mentions
this behaviour.

Note also the output of example (as.character):
> example (as.character)

as.chr> form <- y ~ a + b + c

as.chr> as.character(form)  ## length 3
[1] "~"         "y"         "a + b + c"

as.chr> deparse(form)       ## like the input
[1] "y ~ a + b + c"

as.chr> a0 <- 11/999          # has a repeating decimal representation

as.chr> (a1 <- as.character(a0))
[1] "0,011011011011011"

as.chr> format(a0, digits=16) # shows one more digit
[1] "0,01101101101101101"

as.chr> a2 <- as.numeric(a1)

as.chr> a2 - a0               # normally around -1e-17
[1] NA

as.chr> as.character(a2)      # normally different from a1
[1] NA

as.chr> print(c(a0, a2), digits = 16)
[1] 0,01101101101101101                  NA
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion

*session info*
> sessionInfo ()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] de_DE.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Hmisc_3.9-3      survival_2.36-14 plumbr_0.6.6     cranvas_0.8
[5] maps_2.2-6       scales_0.2.1     qtpaint_0.9.0    qtbase_1.0.5
[9] idendro_1.0

loaded via a namespace (and not attached):
 [1] cluster_1.14.2         colorspace_1.1-1       dichromat_1.2-4
 [4] grid_2.15.1            labeling_0.1           lattice_0.20-6
 [7] munsell_0.3            objectProperties_0.6.5 objectSignals_0.10.2
[10] plyr_1.7.1             RColorBrewer_1.0-5     SearchTrees_0.5.1
[13] stringr_0.6            tools_2.15.1           tourr_0.5.2


Note that

> options ()$OutDec
[1] "."
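
To see the mismatch at a glance, the settings involved can be collected in one place. A minimal sketch, where the function name decimal_settings is hypothetical; it relies only on getOption (), Sys.getlocale () and Sys.localeconv () from base R.

```r
# Hypothetical helper: gather the settings that can influence the
# decimal mark, so a mismatch between them is easy to spot.
decimal_settings <- function() {
  c(OutDec        = getOption("OutDec"),
    LC_NUMERIC    = Sys.getlocale("LC_NUMERIC"),
    decimal_point = Sys.localeconv()[["decimal_point"]])
}

decimal_settings()
```

In the situation described here, one would expect OutDec to be "." while LC_NUMERIC reports the de_DE locale.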

In fresh R sessions I have

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

It seems qtbase is the culprit:

> x
[1] -0.2290188 -0.1884703  0.2507179
> library (qtbase)
> x
[1] -0,2290188 -0,1884703  0,2507179



After setting the numeric locale back to C:
> Sys.setlocale ("LC_NUMERIC", "C")
[1] "C"
Warning message:
In Sys.setlocale("LC_NUMERIC", "C") :
  setting 'LC_NUMERIC' may cause R to function strangely

as.numeric (as.character (x)) works as expected (and the output has
decimal points again).
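
Until the package is fixed, the reset could be automated. The following is only a sketch under assumptions: the wrapper name load_preserving_lc_numeric is hypothetical, and it takes a function that attaches the package, so that LC_NUMERIC can be compared before and after the call.

```r
# Hypothetical wrapper: run `loader` (a function that attaches a package)
# and put LC_NUMERIC back if the call changed it.
load_preserving_lc_numeric <- function(loader) {
  before <- Sys.getlocale("LC_NUMERIC")
  loader()
  after <- Sys.getlocale("LC_NUMERIC")
  changed <- !identical(before, after)
  if (changed) {
    # Sys.setlocale() warns when LC_NUMERIC is set; the warning is
    # suppressed here because only the earlier value is restored.
    suppressWarnings(Sys.setlocale("LC_NUMERIC", before))
  }
  invisible(changed)
}

# Usage (commented out because it needs qtbase installed):
# load_preserving_lc_numeric(function() library(qtbase))
```

The return value tells the caller whether the locale had been altered, which may be useful when reporting the problem.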


Best,

Claudia





-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399


