# [Rd] as.numeric(levels(factor(x))) may be a decreasing sequence

Martin Maechler maechler at stat.math.ethz.ch
Wed May 27 22:51:38 CEST 2009

```>>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
>>>>>     on Sat, 23 May 2009 09:44:54 +0200 writes:

PS> Function factor() in the current development version
PS> (2009-05-22) guarantees that levels are different
PS> character strings. However, they may represent the same
PS> decimal number. The following example is derived from a
PS> posting by Stavros Macrakis in thread "Match .3 in a
PS> sequence" in March

PS>   nums <- 0.3 + 2e-16 * c(-2,-1,1,2) f <- factor(nums)
PS> levels(f) #  "0.300000000000000" "0.3"

PS> The levels differ in trailing zeros, but represent the
PS> same decimal number.  Besides that this is not really
PS> meaningful, it may cause a problem, when using
PS> as.numeric(levels(f)).

PS> In the above case, as.numeric() works fine and maps the
PS> two levels to the same number. However, there are cases,
PS> where the difference in trailing zeros implies different
PS> values in as.numeric(levels(f)) and these values may
PS> even form a decreasing sequence although levels were
PS> constructed from an increasing sequence of numbers.

PS> Examples are platform dependent, but may be found by the
PS> following code.  Tested on Intel under Linux (both with
PS> and without SSE) and also under Windows with an older
PS> version of R.

PS> for (i in 1:100000) {
PS> x <- 10^(floor(runif(1, 61, 63)) + runif(1)/2)
PS> x <- as.numeric(sprintf("%.14g", x))
PS> eps <- 2^(floor(log2(x)) - 52)
PS> k <- round(x * c(5e-16, 1e-15) / eps)
PS> if (x > 1e62) { k <- rev( - k) }
PS> y <- x + k:k * eps
PS> ind <- which(diff(as.numeric(as.character(y))) < 0)
PS> for (j in ind) {
PS> u1 <- y[c(j, j+1)]
PS> u2 <- factor(u1)
PS> print(levels(u2))
PS> print(diff(as.numeric(levels(u2))))
PS> aux <- readline("next")
PS> }
PS> }

PS> An example of the output is

PS>  "1.2296427920313e+61"  "1.22964279203130e+61"
PS>  -1.427248e+45
PS> next
PS>  "1.82328862326830e+62" "1.8232886232683e+62"
PS>  -2.283596e+46
PS> next

PS> The negative number in diff(as.numeric(levels(u2))) demonstrates cases,
PS> when as.numeric(levels(u2)) is decreasing. We can also see that the reason
PS> is that the two strings in levels(u2) differ in the trailing zeros.

PS> I did quite intensive search for such examples for all possible exponents
PS> (not only 61 and 62 and a week of CPU on three processors) and all the obtained
PS> examples were caused by a difference in trailing zeros. So, i believe that
PS> removing trailing zeros from the output of as.character(x) solves the problem
PS> with the reversed order in as.numeric(levels(factor(x))) entirely.

PS> A patch against R-devel_2009-05-22, which eliminates trailing zeros
PS> from as.character(x), but makes no other changes to as.character(x),
PS> is in an attachment. Using the patch, we obtain a better result also
PS> in the following.

PS> nums <- 0.3 + 2e-16 * c(-2,-1,1,2)
PS> factor(nums)
PS> #  0.3 0.3 0.3 0.3
PS> # Levels: 0.3

yes, of course.

PS> Petr.

[.....]

I have very slightly  modified the changes (to get rid of -Wall
warnings) and also exported the function as Rf_dropTrailing0(),
and tested the result with 'make check-all' .
As the change seems reasonable and consequent, and as
it seems not to produce any problems in our tests,
I'm hereby proposing to commit it (my version of it),
[to R-devel only] within a few days,
unless someone speaks up.

Martin Maechler, ETH Zurich

PS> --- R-devel/src/main/coerce.c	2009-04-17 17:53:35.000000000 +0200
PS> +++ R-devel-elim-trailing/src/main/coerce.c	2009-05-23 08:39:03.914774176 +0200
PS> @@ -294,12 +294,33 @@
PS> else return mkChar(EncodeInteger(x, w));
PS> }

PS> +const char *elim_trailing(const char *s, char cdec)
PS> +{
PS> +    const char *p;
PS> +    char *replace;
PS> +    for (p = s; *p; p++) {
PS> +        if (*p == cdec) {
PS> +            replace = (char *) p++;
PS> +            while ('0' <= *p & *p <= '9') {
PS> +                if (*(p++) != '0') {
PS> +                    replace = (char *) p;
PS> +                }
PS> +            }
PS> +            while (*(replace++) = *(p++)) {
PS> +                ;
PS> +            }
PS> +            break;
PS> +        }
PS> +    }
PS> +    return s;
PS> +}
PS> +
PS> SEXP attribute_hidden StringFromReal(double x, int *warn)
PS> {
PS> int w, d, e;
PS> formatReal(&x, 1, &w, &d, &e, 0);
PS> if (ISNA(x)) return NA_STRING;
PS> -    else return mkChar(EncodeReal(x, w, d, e, OutDec));
PS> +    else return mkChar(elim_trailing(EncodeReal(x, w, d, e, OutDec), OutDec));
PS> }

PS> SEXP attribute_hidden StringFromComplex(Rcomplex x, int *warn)

```