[Rd] ecdf with lots of ties is inefficient (PR#7292)

martin at gsc.riken.jp
Sun Oct 17 08:50:23 CEST 2004


Full_Name: Martin Frith
Version: R-2.0.0
OS: linux-gnu
Submission from: (NULL) (134.160.83.73)


I have large vectors containing 100,000 to 20,000,000 numbers, but they hold
only a few hundred *distinct* values (e.g. positive integers < 200). When I
call ecdf(v), it either runs out of memory or it succeeds, in which case
plotting the result to a postscript device gives unnecessarily bloated output
because the same lines get redrawn many times. The time and memory cost of
ecdf, and the size of the plot, should depend on the number of distinct values,
not on the total number of values.
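
A minimal sketch of the behaviour requested above, assuming a numeric vector with many ties. The helper name ecdf_ties is hypothetical and not part of R; it builds the step function from the distinct values and their counts using only base/stats functions (unique, match, tabulate, stepfun). It still scans the full vector once to count ties, but the returned object stores one knot per distinct value, so evaluating or plotting it no longer scales with the total length.

```r
# Hypothetical helper (not part of R): empirical CDF built from the
# distinct values of x and their multiplicities.
ecdf_ties <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  n <- length(x)
  if (n < 1) stop("'x' must have at least one non-missing value")
  vals <- sort(unique(x))                # one knot per distinct value
  counts <- tabulate(match(x, vals), nbins = length(vals))
  # Right-continuous step function: 0 before the first knot,
  # cumulative proportion at and after each knot.
  stepfun(vals, c(0, cumsum(counts) / n), right = FALSE)
}

# Small example with heavy ties: 5 ones, 3 twos, 2 threes.
v <- rep(c(1, 2, 3), times = c(5, 3, 2))
Fn <- ecdf_ties(v)
Fn(c(0, 1, 2, 3))   # 0.0 0.5 0.8 1.0
length(knots(Fn))   # 3
```

Because the result is an ordinary "stepfun" object, the usual methods such as plot() and knots() apply to it, and a plot draws one segment per distinct value.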

This is my first bug report, so forgive me if I've done something stupid!


