[R] a problem of approach
Adrian Duşa
dusa.adrian at gmail.com
Wed Jun 27 16:36:08 CEST 2012
Dear R-help list,
Part of a program I wrote seems to take a significant amount of time,
therefore I am looking for an alternative approach.
In order to explain what it does:
- the input is a sorted vector of integer numbers
- some higher numbers may be derived (using a mathematical formula)
from lower numbers, therefore they should be eliminated
- at the end, the vector should contain only uniquely defined numbers
Pet hypothetical example, input vector:
- 2 3 4 5 6 7 8 9 10
- number 2 generates 4, 7, 10
- 2 3 5 6 8 9 (surviving vector)
- number 3 generates 5 and 9
- 2 3 6 8 (surviving vector)
- number 6 generates 8
- final surviving vector 2 3 6
Function foo(x, ...) generates the numbers, my current approach being:
####
index <- 0
while ((index <- index + 1) < length(numbers)) {
    numbers <- setdiff(numbers, foo(numbers[index]))
}
####
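To make the loop above reproducible without the real formula, here is a
minimal sketch of the pet example. The names toy_foo() and eliminate() are
hypothetical, introduced only for illustration, and the lookup table merely
mimics the example (2 generates 4, 7, 10; 3 generates 5, 9; 6 generates 8);
it is not the actual generating function.

```r
# Hypothetical stand-in for foo(): a lookup table that reproduces the
# pet example only. It is NOT the real mathematical formula.
toy_foo <- function(n) {
    generated <- list("2" = c(4, 7, 10), "3" = c(5, 9), "6" = 8)
    out <- generated[[as.character(n)]]
    # numbers that generate nothing return an empty vector
    if (is.null(out)) numeric(0) else out
}

# Same loop as above, wrapped in a function (the wrapper name is an assumption).
eliminate <- function(numbers, foo) {
    index <- 0
    # the last element is never visited: it could only generate higher
    # numbers, and none remain after it in a sorted vector
    while ((index <- index + 1) < length(numbers)) {
        numbers <- setdiff(numbers, foo(numbers[index]))
    }
    numbers
}

result <- eliminate(2:10, toy_foo)
print(result)
# [1] 2 3 6
```

Note that the vector shrinks while the loop runs, so length(numbers) is
re-evaluated at every step and eliminated numbers are never visited.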
This seems to take quite some time (but I don't know any other way of
doing it), hence my question(s):
- would there be another (quicker) implementation in R?
- alternatively, should I go for a C implementation?
(actually, I did create a C implementation, but it doesn't bring any
more speed... it is actually a bit slower).
A real-life pet example, using the function findSubsets() from the QCA
package (our foo function above):
####
library(QCA)
testfoo <- function(x, y) {
    index <- 0
    while ((index <- index + 1) < length(x)) {
        x <- setdiff(x, findSubsets(y, x[index], max(x)))
    }
    return(x)
}
nofl <- rep(3, 14)
set.seed(12345)
numbers <- sort(sample(seq(prod(nofl)), 1000000))
system.time(result <- testfoo(numbers, nofl))
####
   user  system elapsed
  8.168   2.049  10.148
Any hint will be highly appreciated, thanks in advance,
Adrian
--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391