[Rd] suggestion for extending ?as.factor
Petr Savicky
savicky at cs.cas.cz
Thu May 7 09:20:26 CEST 2009
On Wed, May 06, 2009 at 10:41:58AM +0200, Martin Maechler wrote:
> PD> I think that the real issue is that we actually do want almost-equal
> PD> numbers to be folded together.
>
> yes, this now (revision 48469) will happen by default, using signif(x, 15)
> where '15' is the default for the new optional argument 'digitsLabels'
On some platforms, the function factor() in the current R 2.10.0
(2009-05-06 r48478) may produce duplicated levels. Such examples are,
in general, platform dependent. The following one produces duplicated
(in fact, triplicated) levels both with the default Intel arithmetic
and on Intel with SSE.
x <- 9.7738826945424 + c(-1, 0, 1) * 1e-14
x <- signif(x, 15)
factor(x)
# [1] 9.7738826945424 9.7738826945424 9.7738826945424
# Levels: 9.7738826945424 9.7738826945424 9.7738826945424
# Warning message:
# In `levels<-`(`*tmp*`, value = c("9.7738826945424", "9.7738826945424", :
# duplicated levels will not be allowed in factors anymore
The reason is that the three numbers remain distinct after signif(x, 15),
but as.character(x) maps all of them to the same string.
length(unique(x)) # [1] 3
length(unique(as.character(x))) # [1] 1
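The comparison above can be wrapped into a small check. This is only a minimal sketch: the helper name label_collisions() is hypothetical and not part of R. It returns the strings that as.character() assigns to more than one distinct number. Whether the vector x above triggers it depends on the platform and the R version, so the usage line takes 1 and 1 + 2^-52 instead, which differ as doubles but cannot be told apart with 15 significant digits.

```r
# Hypothetical helper (not part of R): report the labels that as.character()
# assigns to more than one distinct numeric value.
label_collisions <- function(x) {
  ux  <- unique(x)                 # distinct numeric values
  lab <- as.character(ux)          # their string labels
  unique(lab[duplicated(lab)])     # labels used by more than one value
}

x <- c(1, 1 + 2^-52, 2)            # the first two differ only in the last bit
label_collisions(x)
```

An empty result means that every distinct number has its own label, so factor() cannot produce duplicated levels for that input.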
Further examples may be found using
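x <- as.character(9 + runif(5000))
x <- as.numeric(x[nchar(x) == 15]) # select numbers with 14 significant digits
# perturb each number by -1e-14, 0, +1e-14 and round to 15 significant digits
x <- signif(cbind(x - 1e-14, x, x + 1e-14), 15)
y <- array(as.character(x), dim = dim(x))
# keep the rows in which the two outer values get the same string
x <- x[which(y[,1] == y[,3]), , drop = FALSE]
factor(x[1,]) # assumes that at least one such row was found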
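One way to avoid duplicated levels, in the spirit of folding almost-equal numbers together, is to take the levels from the string labels themselves. The following is only a sketch under that assumption. It is not the code in r48469 or any later revision, and fold_factor() is a hypothetical name.

```r
# Hypothetical sketch (not R's implementation): derive the levels from the
# string labels, so that values sharing a label also share a level.
fold_factor <- function(x, digits = 15) {
  lab <- as.character(signif(x, digits))
  # order the labels by numeric value, drop NAs, keep each label once
  lev <- unique(lab[order(x, na.last = NA)])
  factor(lab, levels = lev)
}

f <- fold_factor(c(2, 1, 1 + 2^-52))
levels(f)                          # unique by construction
```

Because the levels are built with unique(), they cannot contain duplicates, whatever strings as.character() produces on a given platform.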
Petr.