[Rd] suggestion for extending ?as.factor

Petr Savicky savicky at cs.cas.cz
Mon May 11 21:03:52 CEST 2009


On Mon, May 11, 2009 at 05:06:38PM +0200, Martin Maechler wrote:
[...]
> The version I have committed a few hours ago is indeed a much
> re-simplified version, using  as.character(.) explicitly
> and consequently no longer providing the extra optional
> arguments that we have had for a couple of days.
> 
> Keeping such a basic function   factor()  as simple as possible 
> seems a good strategy to me.

OK. I understand the argument of simplicity. So, factor(x) is just
a compressed encoding of as.character(x), where each value is stored
only once. This sounds good to me.
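
As a minimal sketch of this point (the vector x below is made up for
illustration, and the check assumes the simplified factor() described above):

```r
# Hypothetical input vector, chosen only for illustration.
x <- c(0.5, 2, 0.5, 10)
f <- factor(x)

levels(f)      # each distinct string from as.character(x), stored once
as.integer(f)  # integer codes pointing into levels(f)

# Decoding the factor gives back as.character(x) exactly.
decoded <- levels(f)[f]
stopifnot(identical(decoded, as.character(x)))
```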

Let me go back to the original purpose of this thread: suggestion for
extending ?as.factor

I think that somewhere in the help page, we could have something like

  Applying factor() to a numeric vector should be done with caution. The
  information in x is preserved only to the extent to which it is preserved
  in as.character(x). If this leads to too many different levels due to minor
  differences among the input numbers, it is suggested to use something like
  factor(signif(x, digits)) or factor(round(x, digits)), where digits is
  chosen as appropriate for the given application.
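
For illustration, a small sketch of the situation this paragraph describes.
The data are made up, and the choices digits = 2 for round() and digits = 3
for signif() are assumptions suited only to this example:

```r
# Made-up measurements: two underlying values plus small noise.
x <- c(1.2301, 1.2302, 1.2299, 2.5001, 2.4998)

f_raw     <- factor(x)             # one level per distinct string
f_rounded <- factor(round(x, 2))   # round to 2 decimal places
f_signif  <- factor(signif(x, 3))  # keep 3 significant digits

nlevels(f_raw)     # every noisy value becomes its own level
levels(f_rounded)  # noise removed, only the two underlying values remain
levels(f_signif)
```

Note that round() counts decimal places while signif() counts significant
digits, so the two agree here only because the values are of similar magnitude.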

Let me point out that the following sentence from the Warning section is not
exactly correct as it currently stands in svn. So, I suggest adding the word
"approximately" at the place marked with square brackets, and adding one more
sentence of explanation, also marked with square brackets.

  To transform a factor \code{f} to [approximately]
  its original numeric values, \code{as.numeric(levels(f))[f]} is
  recommended and slightly more efficient than
  \code{as.numeric(as.character(f))}.
  [Note that the original values may be extracted only to the precision
  used in as.character(x), which is typically 15 decimal digits.]
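
A short sketch of why "approximately" is needed. The values of x are chosen
for illustration because they cannot be written exactly with 15 significant
digits; the sketch assumes as.character() uses 15 significant digits, as is
typical:

```r
# Values that need more than 15 significant digits to be reproduced exactly.
x <- c(1/3, 2/3, 1/3)
f <- factor(x)

y1 <- as.numeric(levels(f))[f]     # the recommended idiom
y2 <- as.numeric(as.character(f))  # the slightly less efficient one

identical(y1, y2)         # both idioms give the same result
identical(y1, x)          # the original values are not recovered exactly
isTRUE(all.equal(y1, x))  # but they agree within all.equal()'s tolerance
max(abs(y1 - x))          # size of the round-trip error
```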

Petr.


