[Rd] rank(*) with NAs -- new option "keep" desired
Martin Maechler
maechler at stat.math.ethz.ch
Thu Sep 11 17:57:22 MEST 2003
Thank you, Gabor!
>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at volcanomail.com>
>>>>> on Thu, 11 Sep 2003 07:09:42 -0700 (PDT) writes:
Gabor> I could have used this also. Currently I do this:
Gabor> z <- ifelse(is.na(y),NA,rank(y))
Gabor> names(z) <- names(y)
Gabor> The following also works but is less transparent:
Gabor> z <- y
Gabor> z[z==z] <- rank(y)
The next version of R should definitely have a built-in version;
as a matter of fact, today's (or tomorrow's) R-alpha version
already has one.
My point was rather to ask how the "API" should be set up
(argument name, semantics, ...).
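
As a rough sketch of the behaviour under discussion, assuming an R version in which rank() accepts na.last = "keep" (as in the proposal quoted further down), the built-in option should agree with the ifelse() workaround. The helper name rank_keep is hypothetical and used only for illustration:

```r
# Hypothetical helper wrapping the ifelse() workaround quoted above.
rank_keep <- function(y) {
  # rank(y) uses na.last = TRUE by default, so it has the same length as y;
  # the positions that were NA in y are then blanked out again.
  z <- ifelse(is.na(y), NA, rank(y))
  names(z) <- names(y)
  z
}

y <- c(a = 2, b = 1, c = NA, d = 0)

r_workaround <- rank_keep(y)
# Assumption: this R version supports na.last = "keep".
r_builtin <- rank(y, na.last = "keep")

print(r_workaround)
print(r_builtin)
```

Both results should carry the names of y and have an NA in position "c".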
Gabor> Another extension of rank that could be useful would be to have
Gabor> an argument that causes it NOT to do tie averaging.
Gabor> This is useful when you are using rank(x) in the
Gabor> sense of an inverse permutation. Currently I do
Gabor> this:
Gabor> z <- order(order(y))
Gabor> names(z) <- names(y)
Good point! However, help(rank) already mentions this and recommends
sort.list() instead of order()
(though it does forget about the names()!).
[ For efficiency reasons, sort.list() is preferable to order() for
simple ordering on a single argument. ]
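
A minimal sketch of the inverse-permutation idiom, written with sort.list() as help(rank) suggests and with the names carried over by hand. The helper name rank_no_ties is hypothetical, and the input is assumed to contain no NAs:

```r
# Hypothetical helper: ranks without tie averaging, via a double sort.list().
rank_no_ties <- function(y) {
  z <- sort.list(sort.list(y))  # inverse of the ordering permutation
  names(z) <- names(y)          # sort.list() does not carry names over
  z
}

y <- c(a = 10, b = 30, c = 10, d = 20)

r_avg <- rank(y)            # ties averaged: a and c share rank 1.5
r_perm <- rank_no_ties(y)   # ties broken by position: a = 1, c = 2

print(r_avg)
print(r_perm)
```

Unlike rank(y), the result is always a permutation of 1:length(y), which is what is wanted when the ranks are used as an inverse permutation.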
Gabor> --- Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> In some contexts, I find the current behavior of rank() very
>> `suboptimal'.
>>
>> We have the argument na.last = {TRUE | FALSE | NA }
>> where the first two cases treat NAs (almost) as if they were
>> == +Inf or == -Inf, whereas the 3rd case just drops NAs.
>> For the typical ``Rank Transformation'' that is recommended in
>> EDA in several contexts, I would however want something else,
>> namely keep the NAs !
>>
>> An example -- including the new option as I'm proposing it --
>> makes things clearer:
>>
>> > y <- c(2:1,NA,0)
>> > rank(y, na.last = TRUE)## ==== rank(y)
>> [1] 3 2 4 1
>> > rank(y, na.last = FALSE)
>> [1] 4 3 1 2
>> > rank(y, na.last = NA)
>> [1] 3 2 1
>> > rank(y, na.last = "keep") ### <<<<< NEW >>
>> [1] 3 2 NA 1
>> >
>> ---
>>
>> Alternatively to extending the possible values of `na.last' I
>> first thought of a new (boolean) argument, but found the current
>> solution less ugly.
>>
>> Feedback welcome!
>>
>> Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
>> Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
>> ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
>> phone: x-41-1-632-3408 fax: ...-1228 <><
>>
>> PS: Stumbled over this while implementing cor.test()'s
>> method = c("pearson", "spearman", "kendall") for cor()
>> itself.