[Rd] suggestion for extending ?as.factor
Petr Savicky
savicky at cs.cas.cz
Mon May 11 21:03:52 CEST 2009
On Mon, May 11, 2009 at 05:06:38PM +0200, Martin Maechler wrote:
[...]
> The version I have committed a few hours ago is indeed a much
> re-simplified version, using as.character(.) explicitly
> and consequently no longer providing the extra optional
> arguments that we have had for a couple of days.
>
> Keeping such a basic function factor() as simple as possible
> seems a good strategy to me.
OK. I understand the argument of simplicity. So, factor(x) is just
a compressed encoding of as.character(x), where each value is stored
only once. This sounds good to me.
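A small sketch of this reading, with a made-up vector x (any numeric vector
without NA should behave the same way): the levels hold each distinct string
once, and decoding the factor gives back as.character(x).

```r
# Hypothetical input, chosen only for illustration.
x <- c(0.25, 1.5, 0.25, 3, 1.5)
f <- factor(x)

levels(f)      # the distinct strings, each stored once
as.integer(f)  # integer codes indexing into levels(f)

# Decoding the factor reproduces as.character(x) exactly.
stopifnot(identical(as.character(f), as.character(x)))
```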
Let me go back to the original purpose of this thread: suggestion for
extending ?as.factor
I think that somewhere in the help page, we could have something like
Applying factor() to a numeric vector should be done with caution. The
information in x is preserved only to the extent to which it is preserved
in as.character(x). If this leads to too many distinct levels due to minor
differences among the input numbers, it is suggested to use something like
factor(signif(x, digits)) or factor(round(x, digits)), where the number of
digits should be chosen as appropriate for the given application.
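To illustrate the kind of situation I have in mind, here is a sketch with
invented measurements that scatter slightly around 1 and 2. The data and the
choice of digits are assumptions for the example only.

```r
# Hypothetical noisy measurements around 1 and 2.
x <- c(1.0001, 1.0002, 0.9999, 2.0001, 1.9998)

nlevels(factor(x))          # every slightly different value is its own level

f_round  <- factor(round(x, 2))   # keep 2 decimal places
f_signif <- factor(signif(x, 3))  # keep 3 significant digits

levels(f_round)
levels(f_signif)
```

Which of round() and signif() is appropriate depends on whether the noise is
absolute or relative to the magnitude of the values.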
Let me point out that the following sentence from the Warning section is not
exactly correct as it stands in svn at the moment. So, I suggest adding the
word "approximately" at the place marked with square brackets, and adding one
more sentence of explanation, also marked by square brackets.
To transform a factor \code{f} to [approximately]
its original numeric values, \code{as.numeric(levels(f))[f]} is
recommended and slightly more efficient than
\code{as.numeric(as.character(f))}.
[Note that the original values may be extracted only to the precision
used in as.character(x), which is typically 15 significant digits.]
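A sketch of why "approximately" is needed, using values that do not have an
exact 15-digit decimal representation. The vector x is an arbitrary choice
for illustration.

```r
# Values that need more than 15 significant digits to round-trip.
x <- c(pi, 1/3, pi)
f <- factor(x)

# The two recommended ways of converting back agree with each other ...
y1 <- as.numeric(levels(f))[f]
y2 <- as.numeric(as.character(f))
stopifnot(identical(y1, y2))

# ... but they recover x only up to the precision of as.character(x).
identical(y1, x)     # not bitwise identical to the original
max(abs(y1 - x))     # the discrepancy is tiny, but not zero
isTRUE(all.equal(y1, x))
```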
Petr.