[R] a problem of approach
Adrian Duşa
dusa.adrian at gmail.com
Wed Jun 27 20:23:53 CEST 2012
On Wed, Jun 27, 2012 at 8:11 PM, jim holtman <jholtman at gmail.com> wrote:
> If you look, half of the time is spent in the 'findSubsets" function
> and the other half in determining where the differences are in the
> sets. Is there a faster way of doing what findSubsets does since it
> is the biggest time consumer. The setdiff might be speeded up by
> using 'match'.
>
That's right, Jim!
I have a C implementation of the findSubsets() ready, and put it to the test.
testfoo2 <- function(x, y) {
    mbase <- c(rev(cumprod(rev(y))), 1)[-1]
    index <- 0
    while((index <- index + 1) < length(x)) {
        x <- setdiff(x, .Call("fS", x, y, mbase, max(x)))
    }
    return(x)
}
> system.time(result2 <- testfoo2(numbers, nofl))
   user  system elapsed
  4.691   1.487   6.091
A decrease of about 40% (from the initial 10.148) ... that's very nice indeed.
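As a small worked example of the mbase line in testfoo2 above: the vector y below is a made-up set of level counts (not my real nofl), and reading the result as mixed-radix place values is only an interpretation of what the expression computes, not a description of the C code.

```r
# Hypothetical number of levels for three factors (illustration only)
y <- c(3, 3, 2)

# Same expression as in testfoo2: reversed cumulative products,
# a trailing 1 appended, and the first (largest) element dropped
mbase <- c(rev(cumprod(rev(y))), 1)[-1]
print(mbase)  # 6 2 1

# With these place values, a combination of 0-based levels (a, b, c)
# corresponds to the single number a*6 + b*2 + c
combo <- c(2, 1, 0)
print(sum(combo * mbase))  # 14
```

Each position's multiplier is the product of the level counts to its right, which is why the last one is always 1.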
match(), however, didn't dramatically decrease the time:
testfoo3 <- function(x, y) {
    mbase <- c(rev(cumprod(rev(y))), 1)[-1]
    index <- 0
    while((index <- index + 1) < length(x)) {
        x <- x[is.na(match(x, .Call("fS", x, y, mbase, max(x))))]
    }
    return(x)
}
> system.time(result3 <- testfoo3(numbers, nofl))
   user  system elapsed
  4.304   1.359   5.621
However, your suggestions reduced the total time to almost half,
which is fantastic.
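For reference, a minimal sketch of why the setdiff() and match() variants in testfoo2 and testfoo3 are interchangeable here. The vectors x and drop are invented for illustration, and the equivalence assumes x contains no duplicates, because setdiff() also applies unique() to its result.

```r
# Made-up data: x is the working vector, drop holds the values to remove
x <- c(5, 1, 8, 3, 9)
drop <- c(8, 1, 42)

via_setdiff <- setdiff(x, drop)            # as in testfoo2
via_match   <- x[is.na(match(x, drop))]    # as in testfoo3
via_in      <- x[!(x %in% drop)]           # %in% is a thin wrapper around match()

print(via_setdiff)  # 5 3 9

# All three agree as long as x has no repeated values
stopifnot(identical(via_setdiff, via_match),
          identical(via_match, via_in))
```

Since setdiff() itself relies on match() internally, only a modest gain from calling match() directly is plausible, which fits the timings above.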
The last question concerns the while() loop. Everything I know about R
tells me that loops tend to be slow, so I wonder whether the while()
loop can be avoided somehow in this example.
Anyway, thanks a lot!
Adrian
--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391