[Rd] ecdf with lots of ties is inefficient (PR#7292)
p.dalgaard at biostat.ku.dk
Sun Oct 17 11:27:24 CEST 2004
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
> vals <- sort(unique(x))
> y <- tabulate(match(x, vals))
> rval <- approxfun(vals, cumsum(y)/n, method = "constant", yleft = 0,
> yright = 1, f = 0, ties = "ordered")
>
> should work better for you and may be a little slower if there are no ties,
> but will use more memory.
...and if all you need is the plot, continue Brian's code with
Fv <- c(0,cumsum(y))/sum(y)
xx <- c(vals[1],vals)
plot(xx, Fv, type="s")
which might well be close enough for your purposes. Or, of course,
Fs <- stepfun(vals,c(0,cumsum(y)/sum(y)))
plot(Fs,verticals=FALSE)
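
As a rough sanity check, the pieces above can be put together in one self-contained sketch. The data here are made up for illustration (a large sample with only 20 distinct values) and are not from the original report; the idea is only to confirm that the step function built from the tabulated counts agrees with the stock ecdf() at the distinct values.

```r
# Hypothetical data with heavy ties (an assumption, not the reporter's data)
set.seed(1)
x <- sample(1:20, 1e5, replace = TRUE)
n <- length(x)

# Tabulate the distinct values once, as in the quoted code
vals <- sort(unique(x))
y <- tabulate(match(x, vals))

# Step function built from the tabulated counts
Fs <- stepfun(vals, c(0, cumsum(y) / sum(y)))

# The two versions should agree at every distinct value
stopifnot(isTRUE(all.equal(Fs(vals), ecdf(x)(vals))))
```

Only length(vals) points enter the step function, so the plot cost should depend on the number of distinct values rather than on n.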
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907