[Rd] suggestion for extending ?as.factor
Martin Maechler
maechler at stat.math.ethz.ch
Sat May 9 22:55:17 CEST 2009
>>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
>>>>> on Fri, 8 May 2009 18:10:56 +0200 writes:
PS> On Fri, May 08, 2009 at 05:14:48PM +0200, Petr Savicky wrote:
>> Let me suggest to consider the following modification, where match() is done
>> on the strings, not on the original values.
>> levels <- unique(as.character(sort(unique(x))))
>> x <- as.character(x)
>> f <- match(x, levels)
PS> An alternative solution is
PS> ind <- order(x)
PS> x <- as.character(x) # or any other conversion to character
PS> levels <- unique(x[ind]) # get unique levels ordered by the original values
PS> f <- match(x, levels)
(slightly but not much more complicated though).
Yes, indeed that brings us back to (something like) the original
"use factor(format(x)) ..." suggestion which would have been
fine if there hadn't been the issue of ordering,
exactly what you've addressed before.
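
For illustration, the alternative quoted above can be wrapped in a small function. This is only a sketch: the name factor_by_value and the conv argument are hypothetical, not part of R, and NA handling is not addressed.

```r
# Hypothetical helper (illustrative name, not an R function):
# levels are the string forms of x, ordered by the original values.
factor_by_value <- function(x, conv = as.character) {
  ind <- order(x)            # ordering of the original values
  s <- conv(x)               # string conversion is applied only once
  levels <- unique(s[ind])   # unique strings, in value order
  f <- match(s, levels)      # integer codes into 'levels'
  structure(f, levels = levels, class = "factor")
}

x <- c(10, 9, 100, 9)
f <- factor_by_value(x)
levels(f)       # "9" "10" "100": value order, not string order
as.integer(f)   # 2 1 3 1
```

Passing a different conv (for example, function(v) format(v, digits = 3)) would correspond to the "other conversion to character" mentioned in the quoted code.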
PS> The advantage of this over the suggestion from my previous email is that
PS> the string conversion is applied only once. The conversion need not be only
PS> as.character(). There may be other choices specified by a parameter. I have
PS> strong objections against the existing implementation of as.character(),
PS> but still I think that as.character() should be the default for factor()
PS> for the sake of consistency of the R language.
The biggest advantage to reverting to something simple like
that, would be that it is really simple.
My first tests with (a variation of) the above indicate
favorable results. More on this on Monday.
If we'd revert to such a solution,
we'd have to get back to Peter's point about the issue that
he'd think table(.) should be more tolerant than as.character()
about "almost equality".
For compatibility reasons, we could also return to the
reasoning that useR should use {something like}
table(signif(x, 14))
instead of
table(x)
for numeric x in "typical" cases.
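
A small sketch of the "almost equality" issue, using made-up data. What table(x) itself returns depends on how factor() converts numbers to strings in the R version at hand, so no output is claimed for it here.

```r
# 0.1 + 0.2 and 0.3 are different doubles, although they print alike
x <- c(0.1 + 0.2, 0.3, 0.3)
n_distinct <- length(unique(x))   # 2: the values are not exactly equal

# Rounding to 14 significant digits first merges the near-equal values
t_rounded <- table(signif(x, 14))
length(t_rounded)                 # number of distinct cells after rounding

# For comparison; the result depends on the string conversion in factor()
t_raw <- table(x)
```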
Martin