[Rd] as.numeric(levels(factor(x))) may be a decreasing sequence
Martin Maechler
maechler at stat.math.ethz.ch
Wed May 27 22:51:38 CEST 2009
>>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
>>>>> on Sat, 23 May 2009 09:44:54 +0200 writes:
PS> Function factor() in the current development version
PS> (2009-05-22) guarantees that levels are different
PS> character strings. However, they may represent the same
PS> decimal number. The following example is derived from a
PS> posting by Stavros Macrakis in thread "Match .3 in a
PS> sequence" in March
PS> nums <- 0.3 + 2e-16 * c(-2,-1,1,2) f <- factor(nums)
PS> levels(f) # [1] "0.300000000000000" "0.3"
PS> The levels differ in trailing zeros, but represent the
PS> same decimal number. Besides that this is not really
PS> meaningful, it may cause a problem, when using
PS> as.numeric(levels(f)).
PS> In the above case, as.numeric() works fine and maps the
PS> two levels to the same number. However, there are cases,
PS> where the difference in trailing zeros implies different
PS> values in as.numeric(levels(f)) and these values may
PS> even form a decreasing sequence although levels were
PS> constructed from an increasing sequence of numbers.
PS> Examples are platform dependent, but may be found by the
PS> following code. Tested on Intel under Linux (both with
PS> and without SSE) and also under Windows with an older
PS> version of R.
PS> for (i in 1:100000) {
PS> x <- 10^(floor(runif(1, 61, 63)) + runif(1)/2)
PS> x <- as.numeric(sprintf("%.14g", x))
PS> eps <- 2^(floor(log2(x)) - 52)
PS> k <- round(x * c(5e-16, 1e-15) / eps)
PS> if (x > 1e62) { k <- rev( - k) }
PS> y <- x + k[1]:k[2] * eps
PS> ind <- which(diff(as.numeric(as.character(y))) < 0)
PS> for (j in ind) {
PS> u1 <- y[c(j, j+1)]
PS> u2 <- factor(u1)
PS> print(levels(u2))
PS> print(diff(as.numeric(levels(u2))))
PS> aux <- readline("next")
PS> }
PS> }
PS> An example of the output is
PS> [1] "1.2296427920313e+61" "1.22964279203130e+61"
PS> [1] -1.427248e+45
PS> next
PS> [1] "1.82328862326830e+62" "1.8232886232683e+62"
PS> [1] -2.283596e+46
PS> next
PS> The negative number in diff(as.numeric(levels(u2))) demonstrates cases,
PS> when as.numeric(levels(u2)) is decreasing. We can also see that the reason
PS> is that the two strings in levels(u2) differ in the trailing zeros.
PS> I did quite intensive search for such examples for all possible exponents
PS> (not only 61 and 62 and a week of CPU on three processors) and all the obtained
PS> examples were caused by a difference in trailing zeros. So, i believe that
PS> removing trailing zeros from the output of as.character(x) solves the problem
PS> with the reversed order in as.numeric(levels(factor(x))) entirely.
PS> A patch against R-devel_2009-05-22, which eliminates trailing zeros
PS> from as.character(x), but makes no other changes to as.character(x),
PS> is in an attachment. Using the patch, we obtain a better result also
PS> in the following.
PS> nums <- 0.3 + 2e-16 * c(-2,-1,1,2)
PS> factor(nums)
PS> # [1] 0.3 0.3 0.3 0.3
PS> # Levels: 0.3
yes, of course.
PS> Petr.
[.....]
I have very slightly modified the changes (to get rid of -Wall
warnings) and also exported the function as Rf_dropTrailing0(),
and tested the result with 'make check-all' .
As the change seems reasonable and consequent, and as
it seems not to produce any problems in our tests,
I'm hereby proposing to commit it (my version of it),
[to R-devel only] within a few days,
unless someone speaks up.
Martin Maechler, ETH Zurich
PS> --- R-devel/src/main/coerce.c 2009-04-17 17:53:35.000000000 +0200
PS> +++ R-devel-elim-trailing/src/main/coerce.c 2009-05-23 08:39:03.914774176 +0200
PS> @@ -294,12 +294,33 @@
PS> else return mkChar(EncodeInteger(x, w));
PS> }
PS> +const char *elim_trailing(const char *s, char cdec)
PS> +{
PS> + const char *p;
PS> + char *replace;
PS> + for (p = s; *p; p++) {
PS> + if (*p == cdec) {
PS> + replace = (char *) p++;
PS> + while ('0' <= *p & *p <= '9') {
PS> + if (*(p++) != '0') {
PS> + replace = (char *) p;
PS> + }
PS> + }
PS> + while (*(replace++) = *(p++)) {
PS> + ;
PS> + }
PS> + break;
PS> + }
PS> + }
PS> + return s;
PS> +}
PS> +
PS> SEXP attribute_hidden StringFromReal(double x, int *warn)
PS> {
PS> int w, d, e;
PS> formatReal(&x, 1, &w, &d, &e, 0);
PS> if (ISNA(x)) return NA_STRING;
PS> - else return mkChar(EncodeReal(x, w, d, e, OutDec));
PS> + else return mkChar(elim_trailing(EncodeReal(x, w, d, e, OutDec), OutDec));
PS> }
PS> SEXP attribute_hidden StringFromComplex(Rcomplex x, int *warn)
More information about the R-devel
mailing list