[Rd] ecdf with lots of ties is inefficient (PR#7292)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Oct 17 09:30:38 CEST 2004
This is easy: x <- sort(x) should be first (as that drops NAs). Fixed in
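
For illustration only, here is a minimal sketch of how sorting first could address both points raised in the quoted messages: sort() drops NAs by default, and the step function can then be built from the distinct values alone. The name ecdf_sketch is hypothetical, and this is not necessarily the change that was committed to stats::ecdf.

```r
# Hypothetical helper for illustration; not the stats::ecdf source.
ecdf_sketch <- function(x) {
  x <- sort(x)                       # sort() drops NAs by default (na.last = NA)
  n <- length(x)
  if (n < 1) stop("'x' must have 1 or more non-missing values")
  vals <- unique(x)                  # distinct values, already in increasing order
  counts <- tabulate(match(x, vals), nbins = length(vals))
  # Right-continuous step function with one knot per distinct value
  approxfun(vals, cumsum(counts) / n,
            method = "constant", yleft = 0, yright = 1, f = 0,
            ties = "ordered")
}

# The example with a missing value from the quoted message
x <- c(1, 2, 2, 4, 7, NA, 10, 12, 15, 20)
Fn <- ecdf_sketch(x)
Fn(c(0, 2, 7, 20))
```

With this arrangement the stored step function has one knot per distinct value, so a vector of millions of numbers with a few hundred distinct values yields only a few hundred knots.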
On Sun, 17 Oct 2004, stefano iacus wrote:
> I would add that some action has to be taken in presence of missing
> values, i.e.
> > x <- c(1,2,2,4,7, NA, 10,12, 15,20)
> > ecdf(x)
> Error in xy.coords(x, y) : x and y lengths differ
> On Oct 17, 2004, at 8:50 AM, martin at gsc.riken.jp wrote:
> > Full_Name: Martin Frith
> > Version: R-2.0.0
> > OS: linux-gnu
> > Submission from: (NULL) (18.104.22.168)
> > I have large vectors containing 100,000 to 20,000,000 numbers. However,
> > they only contain a few hundred *distinct* numbers (e.g. positive
> > integers < 200).
> > When I do ecdf(v), it either runs out of memory, or it succeeds, but when
> > I plot the ecdf with postscript, the output is unnecessarily bloated
> > because the same lines get redrawn many times. The complexity of ecdf
> > should depend on how many distinct numbers there are, not how many total
> > numbers.
> > This is my first bug report, so forgive me if I've done something stupid!
> > ______________________________________________
> > R-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595