[Rd] as.numeric and as.character with locale using comma as separator
Claudia Beleites
claudia.beleites at ipht-jena.de
Tue Aug 14 19:00:11 CEST 2012
Dear all,
summary:
Loading qtbase via library (qtbase) changes my LC_NUMERIC from C to de_DE
[which should not happen, judging by the warning R issues when I set it
back manually].
I posted an issue at their GitHub repository, but the behaviour may be
of more general interest.
Once LC_NUMERIC is changed, as.character () uses the decimal
separator that belongs to LC_NUMERIC (and not options ()$OutDec, as I
had assumed).
as.double () (= as.numeric ()) does not, though: it still expects a decimal point.
That breaks constructs like
as.numeric (as.character (x))
long version:
as.character seems to take into account my locale (de_DE) which uses
comma as decimal separator:
> x <- rnorm (3)
> x
[1] -0,004238328 -0,919358537 -1,654543297
> as.character(x)
[1] "-0,00423832753479965" "-0,919358536523751" "-1,65454329680873"
whereas as.numeric () doesn't:
> as.numeric (as.character(x))
[1] NA NA NA
Warning message:
NAs introduced by coercion
> as.numeric (gsub (",", ".", as.character(x)))
[1] -0,004238328 -0,919358537 -1,654543297
I did not find any mention of this in the help pages of as.numeric or
as.character.
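The gsub workaround shown above can be wrapped in a small helper. This is only a sketch: the name parse_decimal and its dec argument are made up for illustration, and it assumes the strings contain no thousands separator and at most one decimal mark.

```r
# Hypothetical helper (not part of base R): convert strings that use
# `dec` as the decimal mark into numbers.
parse_decimal <- function(s, dec = ",") {
  # fixed = TRUE treats `dec` literally, so dec = "." would also be safe;
  # sub() replaces only the first occurrence in each string.
  as.numeric(sub(dec, ".", s, fixed = TRUE))
}

s <- c("-0,00423832753479965", "-0,919358536523751", "-1,65454329680873")
x <- parse_decimal(s)
```

Strings that already use a decimal point pass through unchanged, so the helper can be applied whether or not LC_NUMERIC has been switched.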
Note also the output of example (as.character):
> example (as.character)
as.chr> form <- y ~ a + b + c
as.chr> as.character(form) ## length 3
[1] "~" "y" "a + b + c"
as.chr> deparse(form) ## like the input
[1] "y ~ a + b + c"
as.chr> a0 <- 11/999 # has a repeating decimal representation
as.chr> (a1 <- as.character(a0))
[1] "0,011011011011011"
as.chr> format(a0, digits=16) # shows one more digit
[1] "0,01101101101101101"
as.chr> a2 <- as.numeric(a1)
as.chr> a2 - a0 # normally around -1e-17
[1] NA
as.chr> as.character(a2) # normally different from a1
[1] NA
as.chr> print(c(a0, a2), digits = 16)
[1] 0,01101101101101101 NA
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
*session info*
> sessionInfo ()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] de_DE.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Hmisc_3.9-3 survival_2.36-14 plumbr_0.6.6 cranvas_0.8
[5] maps_2.2-6 scales_0.2.1 qtpaint_0.9.0 qtbase_1.0.5
[9] idendro_1.0
loaded via a namespace (and not attached):
[1] cluster_1.14.2 colorspace_1.1-1 dichromat_1.2-4
[4] grid_2.15.1 labeling_0.1 lattice_0.20-6
[7] munsell_0.3 objectProperties_0.6.5 objectSignals_0.10.2
[10] plyr_1.7.1 RColorBrewer_1.0-5 SearchTrees_0.5.1
[13] stringr_0.6 tools_2.15.1 tourr_0.5.2
Note that
> options ()$OutDec
[1] "."
In fresh R sessions I have
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
It seems qtbase is the culprit:
> x
[1] -0.2290188 -0.1884703 0.2507179
> library (qtbase)
> x
[1] -0,2290188 -0,1884703 0,2507179
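To check whether a given piece of code alters the numeric locale, LC_NUMERIC can be compared before and after running it. The following is a minimal sketch; the function name numeric_locale_changed is hypothetical, and library (stats) is used only as a harmless stand-in for the package under suspicion.

```r
# Hypothetical diagnostic: report whether evaluating `code` changes LC_NUMERIC.
numeric_locale_changed <- function(code) {
  before <- Sys.getlocale("LC_NUMERIC")
  force(code)  # the argument is evaluated here, after `before` was recorded
  after <- Sys.getlocale("LC_NUMERIC")
  list(before = before, after = after, changed = !identical(before, after))
}

# stats is already attached in a normal session, so this should report
# no change; substitute the package to be tested.
res <- numeric_locale_changed(library(stats))
```

Calling it with library (qtbase) in a fresh session should then show before = "C" and a different value for after if the package is indeed responsible.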
After setting the numeric locale back to C:
> Sys.setlocale ("LC_NUMERIC", "C")
[1] "C"
Warning message:
In Sys.setlocale("LC_NUMERIC", "C") :
setting 'LC_NUMERIC' may cause R to function strangely
as.numeric (as.character (x)) works as expected (and the output has decimal
points again).
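If resetting the locale for the whole session is not wanted, the switch to C can be limited to a single expression. This is a sketch under the assumption that temporarily changing LC_NUMERIC is acceptable; with_c_numeric is a made-up name, and the warning shown above is suppressed on purpose.

```r
# Hypothetical wrapper: evaluate `expr` with LC_NUMERIC set to "C",
# then restore whatever value was active before.
with_c_numeric <- function(expr) {
  old <- Sys.getlocale("LC_NUMERIC")
  # Restore the previous setting even if `expr` fails.
  on.exit(suppressWarnings(Sys.setlocale("LC_NUMERIC", old)), add = TRUE)
  suppressWarnings(Sys.setlocale("LC_NUMERIC", "C"))
  force(expr)  # lazy evaluation: `expr` is only evaluated at this point
}

x <- c(-0.2290188, -0.1884703, 0.2507179)
y <- with_c_numeric(as.numeric(as.character(x)))
```

Because R evaluates arguments lazily, the conversion runs only after the locale has been switched, so the round trip sees a decimal point on both sides.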
Best,
Claudia
--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany
email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399