as.numeric(<factor>) [Difference R/S]
Martin Maechler
Martin Maechler <maechler@stat.math.ethz.ch>
Tue, 20 Jan 1998 09:37:53 +0100
From R-core; this should interest most R-devel'ers (to some extent):
Since 0.60, the semantics of as.numeric(<factor>) has changed,
e.g.
R> as.integer(factor(c("A","BB")))
[1] NA NA
R> as.integer(factor(c(100,40,100)))
[1] 100 40 100
whereas older versions of R, and S, give:
S> as.integer(factor(c("A","BB")))
[1] 1 2
S> as.integer(factor(c(100,40,100)))
[1] 2 1 2
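
The two result sets above correspond to two different conversions, and
both can be written out explicitly. The following is only a minimal
sketch: it assumes nothing about what as.numeric(<factor>) itself returns
in a given version of R or S, and it uses match() against levels() as a
stand-in for the code extraction (the variable names are illustrative).

```r
# Two explicit conversions of a factor; neither depends on what
# as.numeric(<factor>) itself returns in a given R or S version.
f_chr <- factor(c("A", "BB"))
f_num <- factor(c(100, 40, 100))

# (a) Convert the level labels, i.e. as.numeric(as.character(x)).
#     Non-numeric labels become NA (the coercion warning is silenced here).
by_label_chr <- suppressWarnings(as.numeric(as.character(f_chr)))
by_label_num <- as.numeric(as.character(f_num))

# (b) Extract the codes in 1:M, i.e. the position of each element's
#     label among the levels of the factor.
by_code_chr <- match(as.character(f_chr), levels(f_chr))
by_code_num <- match(as.character(f_num), levels(f_num))

print(by_label_chr)  # NA NA
print(by_label_num)  # 100 40 100
print(by_code_chr)   # 1 2
print(by_code_num)   # 2 1 2
```

Variant (a) reproduces the new R results and variant (b) the S results
shown above.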
-------------------------------------
as explained by Ross, below :
>>>>> "KH" == Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> writes:
>>>>> Ross Ihaka writes:
KH>> From hornik@ci.tuwien.ac.at Mon Jan 19 22:52 NZD 1998 Subject:
KH>> Difference R/S
KH>>
KH>> Andreas just pointed me to the following:
KH>>
KH>> v <- as.factor(c("Age","Number","Age"))
KH>> as.numeric(v)
KH>>
KH>> gives
KH>>
KH>> [1] 1 2 1
KH>>
KH>> in S+ and, in R,
KH>>
KH>> [1] NA NA NA
KH>>
KH>> Bug/feature/intentional?
KH>>
KH>> Of course, R makes more sense because as.numeric("Age") gives NA in
KH>> both R and S+ ...
KH>>
KH>> Or, should we have as.numeric() return the codes on a non-numeric
KH>> factor?
Ross> At present R (implicitly) computes as.numeric(x) for x a factor as
Ross> as.numeric(as.character(x))
Ross> and S computes
Ross> codes(x)
Ross> I mistakenly thought that S does what I have implemented for R.
Ross> Thomas first objected to the difference and then said he quite liked
Ross> it.
Ross> I quite like the present semantics, but it is easy to change if
Ross> others have different preferences.
KH> I personally think that the current R approach makes more sense,
KH> too. If we all agree on it, I would like to add the difference to
KH> the FAQ, so that it is (well) documented.
Hmm, I had first advocated your view above myself.
Later, I started to discover how much S code uses
as.numeric(ff)
just to extract the factor codes (in {1:M}) from a factor.
This led me (and Peter Dalgaard, I think) to the conclusion that
- yes, the present R behavior may be ``cleaner'' than S's
- no, it is a pain to keep it, because it breaks S code too often.
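
To illustrate the kind of S code in question, and one way such code could
be written so that it does not depend on which semantics is chosen, here
is a rough sketch. The helper names factor_codes() and factor_values()
are hypothetical (they are not part of R or S), and the numbers in
group_means are made up for the example.

```r
# Hypothetical helpers; the names are illustrative only.

# Codes in 1:M: the position of each element's label among the levels.
factor_codes <- function(x) {
  stopifnot(is.factor(x))
  match(as.character(x), levels(x))
}

# Numeric value of each label; NA where the label is not a number.
factor_values <- function(x) {
  stopifnot(is.factor(x))
  suppressWarnings(as.numeric(as.character(x)))
}

ff <- factor(c("Age", "Number", "Age"))

# Typical use of the codes: index a per-level table by observation.
group_means <- c(Age = 41.5, Number = 3)   # made-up values, one per level
per_obs <- group_means[factor_codes(ff)]

print(factor_codes(ff))    # 1 2 1
print(factor_values(ff))   # NA NA NA
print(per_obs)
```

Code that indexes by factor_codes(ff) in this way would give the same
answer under either meaning of as.numeric(ff).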
However, as you see, we haven't agreed yet on the topic.
I think we should agree ASAP, since it involves code in several places
(outside R base).
Martin