[Rd] Native implementation of rowMedians()

Martin Maechler maechler at stat.math.ethz.ch
Mon May 14 14:31:14 CEST 2007


>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
>>>>>     on Mon, 14 May 2007 11:39:18 +0100 (BST) writes:

    BDR> On Mon, 14 May 2007, Henrik Bengtsson wrote:
    >> On 5/14/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
    >>> 
    >>> > Hi Henrik,
    >>> >>>>>> "HenrikB" == Henrik Bengtsson <hb at stat.berkeley.edu>
    >>> >>>>>>     on Sun, 13 May 2007 21:14:24 -0700 writes:
    >>> >
    >>> >    HenrikB> Hi,
    >>> >    HenrikB> I've got a version of rowMedians(x, na.rm=FALSE) for matrices that
    >>> >    HenrikB> handles missing values implemented in C.  It has been

    BDR> [...]

    >>> Also, the 'a version of rowMedians' made me wonder what other version
    >>> there was, and it seems there is one in Biobase which looks a more
    >>> natural home.
    >> 
    >> The rowMedians() in Biobase utilizes rowQ() in ditto.  I actually
    >> started off by adding support for missing values to rowQ() resulting in
    >> the method rowQuantiles(), for which there are also internal functions
    >> for both integer and double matrices.  rowQuantiles() is in R.native
    >> too, but since it has much less CPU mileage I wanted to wait with that.
    >> The rowMedians() is developed from my rowQuantiles() optimized for
    >> the 50% quantile.
    >> 
    >> Why do you think it is more natural to host rowMedians() in Biobase
    >> than in one of the core R packages?  Biobase comes with a lot of
    >> overhead for people not in the Bio-world.

    BDR> Because that is where there seems to be a need for it, and having multiple 
    BDR> functions of the same name in different packages is not ideal (and even 
    BDR> with namespaces can cause confusion).

That's correct, of course.
However, I still think that quantiles (and statistics derived
from them) in general and medians in particular are under-used
by many user groups. For some useRs, speed can be an important
reason for this, and so I had made a big effort to provide runmed()
in R; I think it would be worthwhile to provide fast rowwise
medians and quantiles here as well.

Also, BTW, I think it will be worthwhile to provide (R<->C) API
versions of median() and quantile() {with fewer options than the
R functions, most probably!!},
such that we'd hopefully see less re-invention of the wheel
in every package that needs such quantiles in its C code.
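
To make the idea concrete, here is a minimal sketch of what a
reduced-option C-level median, plus a rowwise wrapper built on it,
might look like. It is an illustration only: the names sketch_median()
and sketch_row_medians() are hypothetical and not part of R's C API,
NaN stands in for R's NA, and a plain qsort() is used where a real
implementation would more likely use a partial sort.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparison callback for qsort(); the values passed in are NaN-free. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper (not part of R's C API): median of x[0..n-1].
 * NaN stands in for R's NA.  With na_rm == 0 any NaN makes the result
 * NaN; with na_rm != 0 NaNs are dropped before the median is taken.
 * Returns NaN if no values remain or if allocation fails. */
double sketch_median(const double *x, size_t n, int na_rm)
{
    assert(x != NULL || n == 0);
    double *buf = malloc((n ? n : 1) * sizeof *buf);
    if (buf == NULL)
        return NAN;

    size_t m = 0;                       /* number of non-missing values */
    for (size_t i = 0; i < n; i++) {
        if (isnan(x[i])) {
            if (!na_rm) {
                free(buf);
                return NAN;
            }
        } else {
            buf[m++] = x[i];
        }
    }

    double med = NAN;
    if (m > 0) {
        qsort(buf, m, sizeof *buf, cmp_double);
        /* Odd count: middle value; even count: mean of the two middle values. */
        med = (m % 2) ? buf[m / 2] : 0.5 * (buf[m / 2 - 1] + buf[m / 2]);
    }
    free(buf);
    return med;
}

/* Hypothetical rowwise wrapper: x is an nrow x ncol matrix stored
 * column-major (as R stores matrices); out needs room for nrow doubles.
 * Returns 0 on success, -1 if the row buffer cannot be allocated. */
int sketch_row_medians(const double *x, size_t nrow, size_t ncol,
                       int na_rm, double *out)
{
    double *row = malloc((ncol ? ncol : 1) * sizeof *row);
    if (row == NULL)
        return -1;

    for (size_t i = 0; i < nrow; i++) {
        for (size_t j = 0; j < ncol; j++)
            row[j] = x[i + j * nrow];   /* gather row i from the columns */
        out[i] = sketch_median(row, ncol, na_rm);
    }
    free(row);
    return 0;
}
```

The sketch copies each row and allocates inside sketch_median() on
every call, which keeps it short but is not what one would do for
speed; a shared work buffer and a selection algorithm instead of a
full sort would be the obvious next steps.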

Biobase is quite actively maintained, and I'd assume its
maintainers will remove rowMedians() from there (or first
replace it with a wrapper in order to deal with the namespace
issue you mentioned) as soon as R has its own function
with the same (or better) functionality.
In order to facilitate the transition, we'd have to make sure
that such a 'stats' function does behave " >= " to the Biobase
one, i.e., offers at least the same functionality.

Martin


