[Rd] cwilcox - new version

Andrew Robbins
Wed Jan 17 16:02:08 CET 2024


Hi All,

Figured I'd put my two cents in here, as the welch-lab's LIGER package 
currently uses Mann-Whitney on datasets much larger than m = 200. Our 
current version uses a modified PRESTO 
(https://github.com/immunogenomics/presto) implementation instead of the 
built-in tests because the latter do not scale. I stumbled into this 
thread while working on some improvements for it and would like to make 
it known that there is absolutely an audience for the high-member use case.

Best,

-Andrew Robbins

On 1/17/2024 5:55 AM, Andreas Löffler wrote:
>>
>> Performance statistics are interesting. If we assume the two populations
>> have a total of `m` members, then this implementation runs slightly slower
>> for m < 20, and much slower for 50 < m < 100. However, this implementation
>> works significantly *faster* for m > 200. The breakpoint is precisely when
>> each population has a size of 50; `qwilcox(0.5,50,50)` runs in 8
>> microseconds in the current version, but `qwilcox(0.5, 50, 51)` runs in 5
>> milliseconds. The new version runs in roughly 1 millisecond for both. This
>> is probably because of internal logic that requires many more `free/calloc`
>> calls if either population is larger than `WILCOX_MAX`, which is set to 50.
>>
> Also because cwilcox_sigma has to be evaluated, and this is slightly more
> demanding since it uses k%d.
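
For readers who have not seen the formula, here is a minimal Python sketch of a divisor-sum recurrence of this kind. It is reconstructed from the standard generating function for the U statistic, prod_{i=1..n} (1 - q^(m+i)) / (1 - q^i), and is not the proposed C patch; the names `sigma` and `wilcox_counts` are made up for illustration. The `k % d` test inside `sigma` is where the extra per-term cost comes from.

```python
def sigma(k, m, n):
    """Signed divisor sum of k: +d for divisors d in 1..n,
    -d for divisors d in m+1..m+n (both rules apply if the ranges overlap)."""
    total = 0
    for d in range(1, min(k, m + n) + 1):
        if k % d == 0:
            if d <= n:
                total += d
            if d > m:
                total -= d
    return total


def wilcox_counts(m, n):
    """Return c[0..m*n], where c[k] is the number of arrangements of the
    two samples (sizes m and n) that give U == k.

    Uses k * c[k] = sum_{j=1..k} sigma(j) * c[k - j], which follows from
    the logarithmic derivative of the generating function.
    """
    top = m * n
    sig = [0] + [sigma(j, m, n) for j in range(1, top + 1)]  # hypothetical helper table
    c = [0] * (top + 1)
    c[0] = 1
    for k in range(1, top + 1):
        s = sum(sig[j] * c[k - j] for j in range(1, k + 1))
        c[k] = s // k  # the sum is always divisible by k
    return c


if __name__ == "__main__":
    print(wilcox_counts(3, 2))
```

Only one array of length m*n + 1 is kept, at the price of a double loop over k.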
>
> There is a tradeoff here between memory usage and execution time. I am
> not a heavy user of the U test, but I think the typical use case does not
> involve several hundred tests in a session, so execution time (my 2
> cents) is less important. But if R crashes, even a single execution is
> already problematic.
>
> But the takeaway is probably: we should implement both approaches in the
> code and leave it to the user which one she prefers. If time is important,
> memory is not an issue, and m, n are low, go for the "traditional
> approach". Otherwise, use my formula?
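
To make the memory side of that tradeoff concrete, here is a minimal Python sketch of a memoized recursion in the spirit of the "traditional approach". It is an illustration only, not R's C source; `count_u` and `distribution_with_cache_size` are hypothetical names. It reports how many intermediate results the cache retains while building one distribution.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_u(k, m, n):
    """Number of arrangements of samples of sizes m and n with U == k.

    Equivalent to counting partitions of k into at most m parts, each at
    most n: either one part equals n (drop it) or every part is <= n - 1.
    """
    if k < 0 or k > m * n:
        return 0
    if m == 0 or n == 0:
        return 1  # only k == 0 can reach this branch
    return count_u(k - n, m - 1, n) + count_u(k, m, n - 1)


def distribution_with_cache_size(m, n):
    """Build the full count vector and report how many results were cached."""
    count_u.cache_clear()
    dist = [count_u(k, m, n) for k in range(m * n + 1)]
    return dist, count_u.cache_info().currsize


if __name__ == "__main__":
    for m, n in [(5, 5), (10, 10), (20, 20)]:
        dist, cached = distribution_with_cache_size(m, n)
        print(f"m={m} n={n}: {len(dist)} counts, {cached} cached entries")
```

The cache holds results for every smaller (m, n) pair visited along the way, so it grows much faster than the length of the final count vector.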
>
> PS (@Aidan): I applied for a Bugzilla account two days ago and have not
> heard back from them. My spam folder is also empty. Is that OK, or should
> I do something?
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Andrew Robbins
Systems Analyst, Welch Lab <https://welch-lab.github.io>
University of Michigan
Department of Computational Medicine and Bioinformatics

