[Rd] ecdf with lots of ties is inefficient (PR#7292)

p.dalgaard at biostat.ku.dk p.dalgaard at biostat.ku.dk
Sun Oct 17 11:27:24 CEST 2004


Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

>     vals <- sort(unique(x))
>     y <- tabulate(match(x, vals))
>     rval <- approxfun(vals, cumsum(y)/n, method = "constant", yleft = 0,
>                       yright = 1, f = 0, ties = "ordered")
> 
> should work better for you and may be a little slower if there are no ties, 
> but will use more memory.
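
For anyone who wants to try the quoted lines outside of ecdf() itself, here is a minimal self-contained sketch. The wrapper name ecdf_ties is made up for illustration, and `n` is assumed to be the number of non-missing observations, since the quoted fragment does not define it.

```r
# Hypothetical wrapper (not part of R) around the quoted lines.
ecdf_ties <- function(x) {
  x <- x[!is.na(x)]                 # assumption: missing values are dropped
  n <- length(x)                    # assumption: 'n' is the sample size
  vals <- sort(unique(x))           # distinct values in increasing order
  y <- tabulate(match(x, vals))     # number of observations at each value
  approxfun(vals, cumsum(y) / n, method = "constant",
            yleft = 0, yright = 1, f = 0, ties = "ordered")
}

set.seed(1)
x <- sample(1:20, 1e5, replace = TRUE)   # made-up data with heavy ties
Fn <- ecdf_ties(x)

# Compare with ecdf() at a few points, including outside the data range
q <- c(0, 1, 5.5, 20, 25)
all.equal(Fn(q), ecdf(x)(q))
```

The interpolation table has only one knot per distinct value, so its size depends on the number of distinct values rather than on length(x).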

...and if all you need is the plot, continue Brian's code with

  Fv <- c(0,cumsum(y))/sum(y)
  xx <- c(vals[1],vals)
  plot(xx, Fv, type="s")

which might well be close enough for your purposes. Or, of course,

  Fs <- stepfun(vals,c(0,cumsum(y)/sum(y)))
  plot(Fs,verticals=FALSE)
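
Put together on a small example, the two plotting variants above might look like the following sketch. The data are made up, and the comparison with ecdf() is only a sanity check, not part of the suggested code.

```r
x <- c(2, 2, 2, 5, 5, 7, 7, 7, 7, 9)   # made-up data with many ties
vals <- sort(unique(x))                # distinct values
y <- tabulate(match(x, vals))          # counts per distinct value

# Coordinates for plot(type = "s"): start at height 0 at the smallest value
Fv <- c(0, cumsum(y)) / sum(y)
xx <- c(vals[1], vals)

# The same step function as a stepfun object (right-continuous by default)
Fs <- stepfun(vals, c(0, cumsum(y) / sum(y)))

# Sanity check against ecdf() at the jump points
all.equal(Fs(vals), ecdf(x)(vals))

op <- par(mfrow = c(1, 2))
plot(xx, Fv, type = "s", main = "type = \"s\"")
plot(Fs, verticals = FALSE, main = "stepfun")
par(op)
```

The first variant draws the vertical jumps, the second omits them; both use only the distinct values and their counts.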

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907


