[R] any other fast method for median calculation
Thomas Lumley
tlumley at u.washington.edu
Tue Apr 14 17:18:13 CEST 2009
On Tue, 14 Apr 2009, S Ellison wrote:
> Sorting with an appropriate algorithm is O(n log n), so it's very hard to
> get the 'exact' median any faster.
There actually are linear-time algorithms for the median, but n has to be very large before they are worth using, and by then you have to start considering locality of reference and other issues.
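As a rough illustration of the selection idea (not a recommendation to replace median()): base R's sort() takes a partial argument that puts only the requested order statistics in their final positions, without fully sorting the rest. The helper name median_partial below is hypothetical, and the sketch assumes a non-empty numeric vector with no NAs. As far as I know median() already does something very similar internally, so this is unlikely to be a speedup in itself.

```r
# Hypothetical helper: median via partial sorting (selection) instead of a full sort.
# Assumes x is a non-empty numeric vector with no NA values.
median_partial <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # odd length: take the single middle order statistic
    sort(x, partial = half)[half]
  } else {
    # even length: average the two middle order statistics
    idx <- c(half, half + 1L)
    mean(sort(x, partial = idx)[idx])
  }
}

# Check against the built-in on made-up data
set.seed(1)
x <- rnorm(1e5)
stopifnot(isTRUE(all.equal(median_partial(x), median(x))))
```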
> In any case, it looks like you are not constrained by the median
> algorithm, but by the number of calls. You might do a lot better with
> apply, though
>> apply(df,2,median)
>
> On my system 200k columns were processed in negligible time by apply
> and I'm still waiting for mapply.
I'd also note that this is the sort of problem where the profiler is useful: you can see on a smaller subset whether R is spending most of its time in median() or somewhere else.
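A minimal sketch of what that might look like, using Rprof() and summaryRprof() from base R. The data frame small is made up for illustration; in practice it would be a subset of the real data, and the mapply() call stands in for whatever code is actually slow.

```r
# Sketch: profile per-column medians on a small, made-up subset.
set.seed(1)
small <- as.data.frame(matrix(rnorm(50 * 20000), nrow = 50))

prof_file <- tempfile()
Rprof(prof_file, interval = 0.01)   # start the sampling profiler
res <- mapply(median, small)        # the code under investigation
Rprof(NULL)                         # stop profiling

# Summarise time by function. This can fail if the run was too short
# to collect any samples, hence the tryCatch().
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.total))
```

If median() and the functions it calls account for only a small share of the total time, the overhead is in the looping construct rather than in the median computation.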
I wouldn't be surprised if a while() loop were even faster than apply() in this setting, but probably not by enough to care about.
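One way to check this on a given machine is to time both. This is only a sketch on a made-up matrix m (the original data was a data frame, which apply() would convert to a matrix first), and the timings will vary by system and R version.

```r
set.seed(1)
m <- matrix(rnorm(100 * 5000), nrow = 100)   # made-up data: 5000 columns

# apply() over columns
t_apply <- system.time(med_apply <- apply(m, 2, median))

# explicit while() loop with a preallocated result vector
t_while <- system.time({
  med_while <- numeric(ncol(m))
  j <- 1L
  while (j <= ncol(m)) {
    med_while[j] <- median(m[, j])
    j <- j + 1L
  }
})

# Both approaches should give the same medians
stopifnot(isTRUE(all.equal(med_apply, med_while)))

# Side-by-side timings
rbind(apply = t_apply, while_loop = t_while)
```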
-thomas
Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu      University of Washington, Seattle