[Rd] What to do with a inconsistency in rank() that's in S+ and R ever since?

Andrew Piskorski atp at piskorski.com
Fri Oct 27 17:38:32 CEST 2006


On Fri, Oct 27, 2006 at 11:14:25AM +0200, Jens Oehlschlägel wrote:

> rather, but in fact NAs seem to be always treated ties.method =
> "first". I have no idea in which situation one could desire
> e.g. ties.method = "average" except for NAs!?

Interesting.  I was aware of the S-Plus vs. R difference, but I hadn't
realized that it seems to arise because R's rank() ignores
ties.method="average" for NA values.

> I am aware that the prototype behaves like this and R ever since
> behaves like this, however to me this appears very unfortunate. In
> order not to 'break' existing code, what about adding ties.methods

If you only care about ranking integers and floating point numbers,
it's pretty straightforward to take the S-Plus implementation of
rank(), call it my.rank(), and use it in both R and S-Plus.  (Since
the R rank() makes calls to .Internal(), you can't re-use its
implementation in S-Plus.)

Note though that the S-Plus-style my.rank() will still sort strings
differently in R than in S-Plus.  I never looked into why.
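
For the numeric case, one way to get the tied-NA behavior without
porting any source is a small wrapper that ranks the non-missing values
and then gives every NA the same average rank.  The following is only a
sketch of that idea, not the S-Plus implementation: the name
rankNaTied() is made up for illustration, it has been reasoned through
for R only, and it assumes that rank() on NA-free numeric input gives
the same answer in both systems.

```r
# Hypothetical helper (not the S-Plus source): rank a numeric vector so that
# all NAs share one average rank placed after the non-missing values.
rankNaTied <- function(x) {
  miss <- is.na(x)
  n.ok <- sum(!miss)                # number of non-missing values
  n.na <- sum(miss)                 # number of NAs
  r <- numeric(length(x))
  r[!miss] <- rank(x[!miss])        # ordinary ranks 1..n.ok, ties averaged
  r[miss] <- n.ok + (n.na + 1) / 2  # mean of positions n.ok+1 .. n.ok+n.na
  r
}

rankNaTied(c(1, 2, NA, 4, NA))  # 1.0 2.0 4.5 3.0 4.5
rankNaTied(c(NA, NA, 3))        # 2.5 2.5 1.0
```

Because the wrapper only calls rank() on values that contain no NAs, it
sidesteps the NA handling where the two systems disagree.  It does
nothing about the string ordering difference.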

Some old notes I have on this issue:

  R and S-Plus rank() treat NAs differently (which can magnify other
  floating point differences):

  # S-Plus 6.2.1:            # R 2.1.0:
  > rank(1:5)                > rank(1:5)
  [1] 1 2 3 4 5              [1] 1 2 3 4 5
  > rank(c(1,2,NA,4,NA))     > rank(c(1,2,NA,4,NA))
  [1] 1.0 2.0 4.5 3.0 4.5    [1] 1 2 4 3 5
  > rank(c(1,NA,3,4,NA))     > rank(c(1,NA,3,4,NA))
  [1] 1.0 4.5 2.0 3.0 4.5    [1] 1 4 2 3 5
  > rank(c(1,NA,3))          > rank(c(1,NA,3))
  [1] 1 3 2                  [1] 1 3 2
  > rank(c(NA,NA,3))         > rank(c(NA,NA,3))
  [1] 2.5 2.5 1.0            [1] 2 3 1
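
The R column above can be reproduced, and the NA placement varied, with
the na.last argument of R's rank().  This is a short illustration
against current R rather than part of the original notes.  None of the
settings makes the NAs tie with each other, which is the S-Plus result.

```r
x <- c(1, 2, NA, 4, NA)

rank(x)                    # 1 2 4 3 5    NAs last, distinct ranks in order of appearance
rank(x, na.last = FALSE)   # 3 4 1 5 2    NAs first, still distinct ranks
rank(x, na.last = "keep")  # 1 2 NA 3 NA  NAs are given rank NA
rank(x, na.last = NA)      # 1 2 3        NAs dropped before ranking
```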

-- 
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/



