[R] help: program efficiency
William Dunlap
wdunlap at tibco.com
Thu Nov 25 18:31:09 CET 2010
If the input vector t is known to be ordered
(or if you only care about runs of duplicated
values, not all duplicated values), the following
is pretty quick:
   nodup3 <- function (t) {
       t + (sequence(rle(t)$lengths) - 1)/100
   }
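To see why this works, it may help to look at the intermediate values on a small case. The sketch below uses the vector from the original question; the sort() call and the names runs, pos and res are only there for illustration.

```r
# Small illustration only: the question's vector, sorted first.
a <- sort(c(2, 1, 1, 3, 3, 3, 4))   # 1 1 2 3 3 3 4
runs <- rle(a)$lengths              # length of each run of equal values: 2 1 3 1
pos <- sequence(runs)               # position within each run: 1 2 1 1 2 3 1
res <- a + (pos - 1)/100            # first element of each run is left unchanged
print(res)
```

The first element of every run gets 0 added, the second gets 0.01, and so on.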
If you don't know whether the input will be ordered,
then ave() will do it a bit faster than your
code:
   nodup2 <- function (t) {
       ave(t, t, FUN = function(x) x + (seq_along(x) - 1)/100)
   }
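As a rough illustration of what the ave() version does when the input is not sorted, here is a small sketch. The input vector is made up for the example, and nodup2 is repeated so that the snippet runs on its own.

```r
# Same definition as above, repeated so the snippet is self-contained.
nodup2 <- function(t) {
    ave(t, t, FUN = function(x) x + (seq_along(x) - 1)/100)
}

a <- c(2, 1, 3, 1, 3, 3, 4)   # made-up unordered input
res2 <- nodup2(a)             # groups by value, numbers repeats within each group
print(res2)
```

Here the repeats of each distinct value are numbered separately, wherever they occur in the vector. As far as I can tell, the loop in the quoted message would instead keep counting across consecutive duplicated elements, so the two need not agree unless the input is sorted.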
E.g., for a sorted sequence of 300,000 numbers drawn with
replacement from 1:100,000 I get:
> a2 <- sort(sample(1:1e5, size=3e5, replace=TRUE))
> system.time(v <- nodup(a2))
   user  system elapsed
   2.78    0.05    3.97
> system.time(v2 <- nodup2(a2))
   user  system elapsed
   1.83    0.02    2.66
> system.time(v3 <- nodup3(a2))
   user  system elapsed
   0.18    0.00    0.14
> identical(v,v2) && identical(v,v3)
[1] TRUE
If speed is truly an issue, the built-in sequence may
be replaced by a faster one that does the same thing:
   nodup3a <- function (t) {
       faster.sequence <- function(nvec) {
           seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])),
               nvec)
       }
       t + (faster.sequence(rle(t)$lengths) - 1)/100
   }
That took 0.05 seconds on the a2 dataset and produced
identical results.
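
For anyone who wants to check that the replacement matches sequence(), here is a small sketch on a hand-made vector of run lengths. The names nvec, offsets and fs are only for illustration; faster.sequence is copied out of the function above so the snippet runs on its own.

```r
# Copied from nodup3a so the snippet is self-contained.
faster.sequence <- function(nvec) {
    seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
}

nvec <- c(2L, 1L, 3L, 1L)                       # hand-made run lengths
offsets <- cumsum(c(0L, nvec[-length(nvec)]))   # elements before each run: 0 2 3 6
fs <- faster.sequence(nvec)                     # 1:7 minus 0 0 2 3 3 3 6
print(fs)
stopifnot(all(fs == sequence(nvec)))            # agrees with the built-in here
```

The idea is to number all elements 1, 2, ..., sum(nvec) in one go and then subtract, for each run, the count of elements that came before it, so no per-run loop is needed.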
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of randomcz
> Sent: Thursday, November 25, 2010 6:49 AM
> To: r-help at r-project.org
> Subject: [R] help: program efficiency
>
>
> hey guys,
>
> I am working on a function to make duplicated values unique.
> For example,
> the original vector would be like : a = c(2,1,1,3,3,3,4)
> I'd like to transform it into:
> a.nodup = 2, 1.01, 1.02, 3.01, 3.02, 3.03, 4
> Basically, find the duplicates and assign a unique value by
> adding a small amount, keeping the values in order.
> I came up with the following code, but it runs slowly if t is
> large. Is there a better way to do it?
> nodup = function(t)
> {
>   t.index=0
>   t.dup=duplicated(t)
>   for (i in 2:length(t))
>   {
>     if (t.dup[i]==T)
>       t.index=t.index+0.01
>     else t.index=0
>     t[i]=t[i]+t.index
>   }
>   return(t)
> }
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/help-program-efficiency-tp3059079p3059079.html
> Sent from the R help mailing list archive at Nabble.com.