[Rd] Native implementation of rowMedians()
Martin Maechler
maechler at stat.math.ethz.ch
Mon May 14 14:31:14 CEST 2007
>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
>>>>> on Mon, 14 May 2007 11:39:18 +0100 (BST) writes:
BDR> On Mon, 14 May 2007, Henrik Bengtsson wrote:
>> On 5/14/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>>>
>>> > Hi Henrik,
>>> >>>>>> "HenrikB" == Henrik Bengtsson <hb at stat.berkeley.edu>
>>> >>>>>> on Sun, 13 May 2007 21:14:24 -0700 writes:
>>> >
>>> > HenrikB> Hi,
>>> > HenrikB> I've got a version of rowMedians(x, na.rm=FALSE) for
>>> > HenrikB> matrices that handles missing values implemented in C. It has been
BDR> [...]
>>> Also, the 'a version of rowMedians' made me wonder what other version
>>> there was, and it seems there is one in Biobase which looks a more
>>> natural home.
>>
>> The rowMedians() in Biobase utilizes rowQ(), also in Biobase. I actually
>> started off by adding support for missing values to rowQ(), resulting in
>> the method rowQuantiles(), for which there are also internal functions
>> for both integer and double matrices. rowQuantiles() is in R.native
>> too, but since it has seen much less CPU mileage, I wanted to hold off on it.
>> The rowMedians() was developed from my rowQuantiles(), optimized for
>> the 50% quantile.
>>
>> Why do you think it is more natural to host rowMedians() in Biobase
>> than in one of the core R packages? Biobase comes with a lot of
>> overhead for people not in the Bio-world.
BDR> Because that is where there seems to be a need for it, and having multiple
BDR> functions of the same name in different packages is not ideal (and even
BDR> with namespaces can cause confusion).
That's correct, of course.
However, I still think that quantiles (and statistics derived
from them) in general, and medians in particular, are under-used
by many user groups. For some useRs, speed is an important
consideration, which is why I made a big effort to provide runmed()
in R, and I think it would be worthwhile to provide fast row-wise
medians and quantiles here as well.
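A minimal sketch of what a fast row-wise median with missing-value handling might look like in C, assuming a plain column-major double matrix (the way R stores matrices) and using C's NAN/isnan() as a stand-in for R's NA. The names row_medians() and select_kth() and their signatures are hypothetical; this is not the code in Biobase or R.native. Instead of fully sorting each row, it uses quickselect, which needs only expected linear time per row.

```c
#include <math.h>    /* NAN, isnan */
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical helper (quickselect): reorders v[0..n-1] so that v[k] holds
   the value it would have after a full ascending sort, with v[i] <= v[k]
   for i < k and v[i] >= v[k] for i > k. Assumes no NaNs in v. */
static void select_kth(double *v, int n, int k)
{
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        /* Lomuto partition of v[lo..hi] around the middle element,
           which is first swapped to the end of the range. */
        int mid = lo + (hi - lo) / 2;
        double pivot = v[mid];
        v[mid] = v[hi];
        v[hi] = pivot;
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (v[i] < pivot) {
                double tmp = v[i];
                v[i] = v[store];
                v[store] = tmp;
                store++;
            }
        }
        v[hi] = v[store];
        v[store] = pivot;   /* pivot is now in its final sorted position */
        if (store == k)
            return;
        if (k < store)
            hi = store - 1;   /* k lies in the left part */
        else
            lo = store + 1;   /* k lies in the right part */
    }
}

/* Hypothetical row_medians(): x is an nrow-by-ncol matrix stored
   column-major, so element (i, j) is x[i + j*nrow]. NaN stands in for NA.
   If na_rm is 0, a row containing NaN gets NaN; otherwise NaNs are dropped.
   A row with no usable values gets NaN, like median(numeric(0)) in R.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int row_medians(const double *x, int nrow, int ncol, int na_rm, double *ans)
{
    double *buf = malloc((size_t)(ncol > 0 ? ncol : 1) * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (int i = 0; i < nrow; i++) {
        int n = 0, has_na = 0;
        /* Copy row i into the scratch buffer, skipping or flagging NaNs. */
        for (int j = 0; j < ncol; j++) {
            double v = x[i + (size_t)j * nrow];
            if (isnan(v)) {
                has_na = 1;
                if (!na_rm)
                    break;   /* result is NA; no need to read further */
            } else {
                buf[n++] = v;
            }
        }
        if ((has_na && !na_rm) || n == 0) {
            ans[i] = NAN;
            continue;
        }
        int half = n / 2;
        select_kth(buf, n, half);   /* buf[half] is the upper middle value */
        if (n % 2 == 1) {
            ans[i] = buf[half];
        } else {
            /* The lower middle value is the largest of buf[0..half-1]. */
            double lower = buf[0];
            for (int k = 1; k < half; k++)
                if (buf[k] > lower)
                    lower = buf[k];
            ans[i] = (lower + buf[half]) / 2.0;
        }
    }
    free(buf);
    return 0;
}
```

For a 2x3 matrix whose second row is (4, NA, 10), row_medians() gives NaN for that row with na_rm = 0 and 7 with na_rm = 1. Inside R's own C code, the partial sort rPsort() declared in R_ext/Utils.h could likely take the place of the hand-written select_kth().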
Also, BTW, I think it would be worthwhile to provide (R<->C) API
versions of median() and quantile() {with fewer options than the
R functions, most probably!!},
so that we'd hopefully see less re-invention of the wheel
in every package that needs such quantiles in its C code.
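To make the idea of a C-level quantile() with fewer options concrete, here is a sketch under assumed names: quantile_type7() is hypothetical, not an existing R API. It computes only R's default definition (type = 7), where with h = (n - 1) * p the result interpolates linearly between the order statistics at 0-based positions floor(h) and floor(h) + 1. It does no NA handling and sorts a copy with qsort() for simplicity.

```c
#include <math.h>    /* NAN */
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* malloc, free, qsort */
#include <string.h>  /* memcpy */

/* Ascending comparison for qsort(); assumes no NaNs. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Hypothetical quantile_type7(): sample quantile of x[0..n-1] at
   probability p, using R's default (type 7) definition.
   Returns NAN for n == 0, p outside [0, 1], or allocation failure.
   x itself is left unchanged. */
double quantile_type7(const double *x, size_t n, double p)
{
    if (n == 0 || !(p >= 0.0 && p <= 1.0))
        return NAN;
    double *s = malloc(n * sizeof *s);
    if (s == NULL)
        return NAN;
    memcpy(s, x, n * sizeof *s);
    qsort(s, n, sizeof *s, cmp_double);
    double h = (double)(n - 1) * p;
    size_t lo = (size_t)h;                    /* floor(h), since h >= 0 */
    size_t hi = (lo + 1 < n) ? lo + 1 : lo;   /* stay in range when p == 1 */
    double q = s[lo] + (h - (double)lo) * (s[hi] - s[lo]);
    free(s);
    return q;
}
```

For the data (4, 1, 3, 2) and p = 0.25 this gives 1.75, matching quantile(c(4, 1, 3, 2), 0.25) in R. A real API would probably also want a variant taking several probabilities, so the copy is sorted only once.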
Biobase is quite actively maintained, and I'd assume its
maintainers will remove rowMedians() from there (or first
replace it with a wrapper in order to deal with the namespace
issue you mentioned) as soon as R has its own function
with the same (or better) functionality.
To facilitate the transition, we'd have to make sure
that such a 'stats' function behaves at least as well as (" >= ")
the Biobase one.
Martin