[Rd] rank(*) with NAs -- new option "keep" desired

Thu Sep 11 17:57:22 MEST 2003

Thank you, Gabor!

>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at volcanomail.com>
>>>>>     on Thu, 11 Sep 2003 07:09:42 -0700 (PDT) writes:

    Gabor> I could have used this also.  Currently I do this:
    Gabor> z <- ifelse(is.na(y),NA,rank(y))
    Gabor> names(z) <- names(y)

    Gabor> The following also works but is less transparent:

    Gabor> z <- y
    Gabor> z[which(z == z)] <- rank(y, na.last = NA)

The next version of R should definitely have a built-in version;
in fact, today's (or tomorrow's) R-alpha version
already has one.
My point was rather to ask how the "API" should be set up
(argument name, semantics, ...).
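Gabor's ifelse() workaround can be packaged as a small helper. This is a minimal sketch that assumes a hypothetical function name, rank_keep(), which is not part of base R. It ranks only the observed values, leaves the NA positions as NA and keeps the names, which matches the behaviour proposed in the quoted message as na.last = "keep".

```r
# Hypothetical helper (not in base R): rank the non-NA entries of x,
# leave NA positions as NA and keep the names of x.
rank_keep <- function(x, ...) {
  r <- rep(NA_real_, length(x))   # start with all positions NA
  ok <- !is.na(x)                 # positions that actually get a rank
  r[ok] <- rank(x[ok], ...)       # rank only the observed values
  names(r) <- names(x)            # carry the names over
  r
}

y <- c(a = 2, b = 1, c = NA, d = 0)
rank_keep(y)   # ranks 3, 2, NA, 1 with names a, b, c, d
```

Extra arguments such as ties.method are passed through to rank(). Current R versions accept rank(y, na.last = "keep") directly, so a helper like this is only needed where that option is unavailable.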

    Gabor> Another extension of rank that could be useful would be to have
    Gabor> an argument that causes it NOT to do tie averaging.  
    Gabor> This is useful when you are using rank(x) in the
    Gabor> sense of an inverse permutation.  Currently I do
    Gabor> this:

    Gabor> z <- order(order(y))
    Gabor> names(z) <- names(y)

Good point!  However, help(rank) already mentions this and recommends
sort.list() instead of order()
(though it forgets about the names()!).

[ For efficiency reasons, sort.list() is preferable to order() when
 ordering by a single argument. ]
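To illustrate the inverse-permutation use without tie averaging, here is a small sketch using sort.list() as recommended in help(rank), restoring the names by hand. The data vector is made up for illustration. Ties are broken by order of appearance, so the two equal values get distinct ranks.

```r
y <- c(a = 3.1, b = 1.5, c = 3.1, d = 0.2)

# Inverse permutation: where each element lands in the sorted order.
# Ties (a and c) are broken by position, not averaged.
inv <- sort.list(sort.list(y))
names(inv) <- names(y)   # sort.list() returns no names, so restore them
inv                      # 3, 2, 4, 1 with names a, b, c, d

rank(y)                  # tie-averaged: 3.5, 2, 3.5, 1
```

Later R versions added a ties.method argument to rank(); rank(y, ties.method = "first") gives the same result as inv and keeps the names.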

    Gabor> --- Martin Maechler <maechler at stat.math.ethz.ch> wrote:
    >> In some contexts, I find the current behavior of rank() very
    >> `suboptimal'.
    >> 
    >> We have the argument na.last = {TRUE | FALSE | NA }
    >> where the first two cases treat NAs (almost) as if they were
    >> == +Inf or == -Inf, respectively, whereas the third case simply drops them.
    >> For the typical ``Rank Transformation'' that is recommended in
    >> EDA in several contexts, I would however want something else,
    >> namely to keep the NAs!
    >> 
    >> An example (including the new option as I'm proposing it)
    >> makes things clearer:
    >> 
    >>> y <- c(2:1,NA,0)
    >>> rank(y, na.last = TRUE)## ==== rank(y)
    >> [1] 3 2 4 1
    >>> rank(y, na.last = FALSE)
    >> [1] 4 3 1 2
    >>> rank(y, na.last = NA)
    >> [1] 3 2 1
    >>> rank(y, na.last = "keep") ### <<<<< NEW >>
    >> [1]  3  2 NA  1
    >>> 
    >> ---
    >> 
    >> As an alternative to extending the possible values of `na.last', I
    >> first thought of a new (boolean) argument, but found the current
    >> solution less ugly.
    >> 
    >> Feedback welcome!
    >> 
    >> Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
    >> Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
    >> ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
    >> phone: x-41-1-632-3408		fax: ...-1228			<><
    >> 
    >> PS: I stumbled over this while implementing cor.test()'s
    >> method = c("pearson", "spearman", "kendall") for cor()
    >> itself.
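The PS connects rank() to cor(): Spearman's rho is Pearson's correlation applied to the ranks, so how rank() treats NAs matters there. A short sketch with simulated, complete data (no NAs) showing that equivalence in current R:

```r
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)

# Spearman's rho = Pearson correlation of the (tie-averaged) ranks
r_spear <- cor(x, y, method = "spearman")
r_ranks <- cor(rank(x), rank(y))
all.equal(r_spear, r_ranks)   # TRUE
```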