[Rd] suggestion for extending ?as.factor
Martin Maechler
maechler at stat.math.ethz.ch
Fri May 8 18:48:40 CEST 2009
>>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
>>>>> on Fri, 8 May 2009 18:10:56 +0200 writes:
PS> On Fri, May 08, 2009 at 05:14:48PM +0200, Petr Savicky wrote:
>> Let me suggest considering the following modification, where match() is done
>> on the strings, not on the original values.
>> levels <- unique(as.character(sort(unique(x))))
>> x <- as.character(x)
>> f <- match(x, levels)
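The three lines above can be wrapped into a small function to see their effect. This is a minimal sketch: `factor_by_string` is a hypothetical name, not part of base R, and the sketch assumes `x` is numeric with no NA values. Sorting happens on the original values, so the levels come out in numeric order. Matching happens on the strings, so two doubles that print identically (here `0.3` and `0.1 + 0.2`) fall into the same level.

```r
# Hypothetical helper (not base R): sort on the original values,
# then match on their character representations.
# Assumes x is numeric and contains no NA values.
factor_by_string <- function(x) {
  levels <- unique(as.character(sort(unique(x))))  # numeric order, string form
  codes <- match(as.character(x), levels)          # match strings, not values
  structure(codes, levels = levels, class = "factor")
}

f <- factor_by_string(c(10, 2, 1, 0.1 + 0.2, 0.3))
print(levels(f))
print(as.integer(f))
```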
PS> An alternative solution is
> ind <- order(x)
> x <- as.character(x) # or any other conversion to character
> levels <- unique(x[ind]) # get unique levels ordered by the original values
> f <- match(x, levels)
Yes, that's an interesting, quite different, and simple approach.
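Petr's alternative can be sketched the same way. In this sketch `factor_by_order` and its `to_char` argument are hypothetical names. `to_char` only illustrates the "other choices specified by a parameter" idea discussed below and is not an existing argument of factor(). The conversion to character is applied once, and order() on the original values decides the order of the levels. As before, NA values are not handled.

```r
# Hypothetical helper (not base R): convert to character once, and
# order the unique strings by the original values via order().
# `to_char` is an illustrative parameter; assumes x has no NA values.
factor_by_order <- function(x, to_char = as.character) {
  ind <- order(x)              # permutation that sorts the original values
  s <- to_char(x)              # single conversion to character
  levels <- unique(s[ind])     # unique strings, ordered by original values
  structure(match(s, levels), levels = levels, class = "factor")
}

f1 <- factor_by_order(c(3, 20, 3, 100))
print(levels(f1))              # "3" "20" "100": numeric, not string, order
f2 <- factor_by_order(c(0.123, 0.121, 0.5),
                      to_char = function(v) sprintf("%.2f", v))
print(levels(f2))
```

With the `sprintf()` conversion, 0.123 and 0.121 both become `"0.12"` and therefore share a level, which shows how the choice of conversion determines which values are merged.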
PS> The advantage of this over the suggestion from my previous email is that
PS> the string conversion is applied only once. The conversion need not be
PS> as.character(); other choices could be specified by a parameter. I have
PS> strong objections to the existing implementation of as.character(),
{(because it is not *accurate* enough, right?)}
PS> but I still think that as.character() should be the default for factor()
PS> for the sake of consistency of the R language.
Hmm... Peter Dalgaard remarked very early in this thread
that, at least for its use in table(.),
factor() should not be extremely accurate, and that's what
R-devel's factor() has been doing recently.
But then, table(.) could be changed to explicitly call
factor(signif(x, 15), ...)
for the case of numeric x.
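A minimal sketch of that idea, assuming a hypothetical wrapper `table_signif` rather than a change to table() itself: numeric input is rounded to `digits` significant digits before the factor is built. The values `0.3` and `0.1 + 0.2` are different doubles but are counted together after rounding.

```r
# Hypothetical wrapper (not base R's table()): for numeric input,
# round to `digits` significant digits before building the factor.
table_signif <- function(x, digits = 15) {
  if (is.numeric(x)) x <- signif(x, digits)
  table(factor(x))
}

x <- c(0.3, 0.1 + 0.2, 0.7)
length(unique(x))        # 3: 0.3 and 0.1 + 0.2 are different doubles
tab <- table_signif(x)
print(tab)
```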
BTW: I found that practically all the remaining border cases you
had are "solved" by using 14 instead of 15.
I'm currently testing a version of factor() that uses 14 significant digits
*and* adds an extra final level test that removes duplicated levels.
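The following is only a guess at what such a version might do. It is not the code being tested. `factor14` is a hypothetical name, and the sketch assumes numeric input with no NA values. Distinct values are sorted, converted at 14 significant digits, and any level strings that collide after rounding are dropped as a final check.

```r
# Hypothetical sketch (not the R-devel factor() under test): convert with
# 14 significant digits, then drop duplicated level strings as a final check.
# Assumes x is numeric and contains no NA values.
factor14 <- function(x) {
  y <- sort(unique(x))                      # distinct values in numeric order
  levels <- as.character(signif(y, 14))     # string form at 14 significant digits
  levels <- levels[!duplicated(levels)]     # final level test: drop duplicates
  codes <- match(as.character(signif(x, 14)), levels)
  structure(codes, levels = levels, class = "factor")
}

f <- factor14(c(2, 1 + 1e-15, 1))
print(levels(f))
print(as.integer(f))
```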
Martin