[R] any other fast method for median calculation
Thomas Lumley
tlumley at u.washington.edu
Tue Apr 14 17:18:13 CEST 2009
On Tue, 14 Apr 2009, S Ellison wrote:
> Sorting with an appropriate algorithm is nlog(n), so it's very hard to
> get the 'exact' median any faster.
There actually are linear-time algorithms for the median, but n has to be very large before they are worth using, and by then you have to start considering locality of reference and other issues.
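As a rough illustration of selection without a full sort, base R's `sort(x, partial = k)` only guarantees that the k-th element ends up in its sorted position, which is typically cheaper than ordering the whole vector. The helper name `median_partial` below is made up for this sketch, and it assumes a numeric vector with no NAs.

```r
# Median via partial sorting (selection) instead of a full sort.
# `median_partial` is a hypothetical name; assumes numeric input without NAs.
median_partial <- function(x) {
  n <- length(x)
  if (n == 0L) return(NA_real_)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # Odd length: only the middle element needs to be in place.
    sort(x, partial = half)[half]
  } else {
    # Even length: place the two middle elements and average them.
    idx <- c(half, half + 1L)
    mean(sort(x, partial = idx)[idx])
  }
}

set.seed(1)
x_odd  <- rnorm(1001)
x_even <- rnorm(1000)

# Both should agree with the built-in median().
all.equal(median_partial(x_odd),  median(x_odd))
all.equal(median_partial(x_even), median(x_even))
```

Whether this beats a plain `median()` call on real data is something to measure rather than assume.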
> In any case, it looks like you are not constrained by the median
> algorithm, but by the number of calls. You might do a lot better with
> apply, though
>> apply(df,2,median)
> On my system 200k columns were processed in negligible time by apply
> and I'm still waiting for mapply.
I'd also note that this is the sort of problem where the profiler is useful: you can see on a smaller subset whether R is spending most of its time in median() or somewhere else.
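A minimal sketch of that kind of profiling run, using `Rprof()` and `summaryRprof()` from base R. The matrix `m` is simulated stand-in data, not the poster's data frame, and its size is an arbitrary choice meant to give the sampler enough time to collect a few samples.

```r
# Profile column medians on a simulated subset.
# `m` is hypothetical stand-in data; adjust the size to resemble the real problem.
set.seed(42)
m <- matrix(rnorm(1000 * 5000), nrow = 1000)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start the sampling profiler
res <- apply(m, 2, median)
Rprof(NULL)                         # stop profiling

# Time attributed to each function itself; a very short run may
# collect too few samples for this summary to be meaningful.
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

If `median.default` or the sorting routines dominate the `by.self` table, the median itself is the bottleneck; if the time is spread over `apply`, `FUN`, or subsetting, the call overhead is.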
I wouldn't be surprised if a while() loop were even faster than apply() in this setting, but probably not by enough to care about.
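One way to check this is to time both forms on the same data. The sketch below uses a simulated matrix `m` (an assumption; the original post used a data frame `df`), and the timings will vary by machine and R version, so no particular result is implied.

```r
# Compare an explicit while() loop with apply() for column medians.
# `m` is simulated stand-in data with many short columns.
set.seed(1)
m <- matrix(rnorm(100 * 2000), nrow = 100)

loop_time <- system.time({
  med_loop <- numeric(ncol(m))   # preallocate the result
  j <- 1L
  while (j <= ncol(m)) {
    med_loop[j] <- median(m[, j])
    j <- j + 1L
  }
})

apply_time <- system.time(med_apply <- apply(m, 2, median))

# The two approaches should give identical medians.
all.equal(med_loop, med_apply)

# Elapsed times in seconds; compare on your own system.
rbind(loop = loop_time, apply = apply_time)[, "elapsed"]
```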
Thomas Lumley, Assoc. Professor, Biostatistics
tlumley at u.washington.edu, University of Washington, Seattle