as.numeric(<factor>) [Difference R/S]
Kurt Hornik
Kurt.Hornik@ci.tuwien.ac.at
Wed, 21 Jan 1998 09:59:27 +0100
>>>>> Peter Dalgaard BSA writes:
> Martin Maechler <maechler@stat.math.ethz.ch> writes:
>>
>> From R-core; this should interest most R-devel'ers (to some extent):
>>
>> Since 0.60, the semantics of as.numeric(<factor>) has changed,
>> e.g.
>>
>>   R> as.integer(factor(c("A","BB")))
>>   [1] NA NA
>>   R> as.integer(factor(c(100,40,100)))
>>   [1] 100 40 100
>>
>> whereas older R and S:
>>
>>   S> as.integer(factor(c("A","BB")))
>>   [1] 1 2
>>   S> as.integer(factor(c(100,40,100)))
>>   [1] 2 1 2
>>
> ...
>>
>> Hmm, I first had advocated your view above, myself.
>>
>> Later, I started to discover how much S code uses
>>   as.numeric(ff)
>> just to extract the factor codes (in {1:M}) from a factor.
>>
>> This led me (and Peter Dalgaard, I think) to the conclusion that
>> - yes, the present R behavior may be ``cleaner'' than S's
>> - no, it is a pain to keep it, because it breaks S code too often.
>>
>> However, as you see, we haven't agreed yet on the topic.
>> I think we should agree ASAP, since it involves code in several places
>> (outside R base).
> Actually, I'm even more strongly in favour of the S semantics. In addition
> to the above
> - you can always get current behaviour with
> as.numeric(as.character(f)) or as.numeric(levels(f))[f]
> - one should avoid generating NA's unless absolutely necessary
Right :-)
> - when a factor is used for subscripting, you mean the codes,
> not the levels. Currently, we have
>   > (1:5)[factor(1:5,labels=5:1)]
>   [1] 1 2 3 4 5
> but
>   > as.numeric(factor(1:5,labels=5:1))
>   [1] 5 4 3 2 1
> I.e. *sometimes* when a factor is coerced to numeric you get
> something different. (And if you change the index semantics,
> code for trend tests and the like is likely to break!).
But that is really a matter of how subscripting treats factors, and not
necessarily what coercion does.
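
To make the distinction between subscripting and coercion concrete, here is a minimal sketch in R. It assumes an R version in which subscripting by a factor uses the integer codes, as in the example quoted above; the names f, by_codes and by_levels are illustrative only.

```r
# A factor whose labels run opposite to its codes:
# codes are 1..5, labels are "5".."1".
f <- factor(1:5, labels = 5:1)

# Subscripting by a factor uses the internal integer codes.
by_codes <- (1:5)[f]

# Indexing by the labels has to be requested explicitly,
# by going through the character representation.
by_levels <- (1:5)[as.numeric(as.character(f))]

print(by_codes)
print(by_levels)
```

Whatever as.numeric(f) is defined to return, the two index vectors above stay well defined, which is why the subscripting question can be settled separately from the coercion question.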
As much as I am in favor of compatibility (remember I do a lot of
porting), two points speak for the current R behavior:
* Suppose f is a factor with numeric levels other than 1 to n. Then
as.numeric(f) returning the codes rather than the levels is strange.
* Coercing a character vector to numeric already produces NA's for
non-numeric strings, so NA's from coercion are nothing unusual.
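
A small sketch of both points, in R. It assumes that a factor is stored as integer codes plus a "levels" attribute and that subscripting by a factor uses the codes; the variable names are illustrative only. The codes are extracted via unclass() so that the result does not depend on which semantics as.numeric(<factor>) has in a given version.

```r
# A factor with numeric levels other than 1..n.
f <- factor(c(100, 40, 100))

# The internal codes, independent of how as.numeric(<factor>) is defined.
codes_of_f <- as.integer(unclass(f))

# Two ways to recover the numeric values of the levels.
via_char   <- as.numeric(as.character(f))
via_levels <- as.numeric(levels(f))[f]   # relies on subscripting by codes

# Coercing non-numeric strings gives NA (with a warning, suppressed here).
na_example <- suppressWarnings(as.numeric(c("A", "BB")))

print(codes_of_f)
print(via_char)
print(via_levels)
print(na_example)
```

The first two idioms are the ones Peter mentions above; either one makes the intent (levels, not codes) explicit in the calling code.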
Btw, take
    x <- factor(c(10, 5, 6, 7))
Then levels(x) gives the CHARACTER vector c("5", "6", "7", "10") [in
both R and S+]. Why is that?
And:
R> codes(x)
[1] 4 1 2 3
S> codes(x)
[1] 1 2 3 4
???
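
One way to look at the first question, sketched in R under the assumption that factor() behaves as in current R: the levels are stored as character strings, but they are built from the sorted distinct values, so their order is numeric rather than alphabetical. Here as.integer(unclass(x)) stands in for codes(x), which may not be available in a given R installation; the variable names are illustrative only.

```r
x <- factor(c(10, 5, 6, 7))

# The levels are character strings ...
lev <- levels(x)

# ... in the order of the sorted distinct numeric values.
numeric_order <- as.character(sort(unique(c(10, 5, 6, 7))))

# Internal codes: the position of each value within the levels.
codes_of_x <- as.integer(unclass(x))

print(lev)
print(numeric_order)
print(codes_of_x)
```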