[R] a problem of approach

Adrian Duşa dusa.adrian at gmail.com
Wed Jun 27 16:36:08 CEST 2012


Dear R-help list,

Part of a program I wrote seems to take a significant amount of time,
therefore I am looking for an alternative approach.
In order to explain what it does:

- the input is a sorted vector of integer numbers
- some higher numbers may be derived (using a mathematical formula)
from lower numbers, therefore they should be eliminated
- at the end, the vector should contain only uniquely defined numbers

Pet hypothetical example, input vector:
- 2 3 4 5 6 7 8 9 10
- number 2 generates 4, 7, 10
- 2 3 5 6 8 9 (surviving vector)
- number 3 generates 5 and 9
- 2 3 6 8 (surviving vector)
- number 6 generates 8
- final surviving vector 2 3 6

Function foo(x, ...) generates the numbers, my current approach being:
####
index <- 0
while ((index <- index + 1) < length(numbers)) {
    numbers <- setdiff(numbers, foo(numbers[index]))
}
####
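
To make the hypothetical example above reproducible, here is a minimal sketch of the same loop. The lookup table inside foo() is an assumption, made up only to match the toy numbers (2 generates 4, 7, 10, and so on); it is not the real formula.

```r
# Hypothetical foo(): a hard-coded lookup that reproduces the toy example only
foo <- function(n) {
    switch(as.character(n),
           "2" = c(4L, 7L, 10L),
           "3" = c(5L, 9L),
           "6" = 8L,
           integer(0))   # every other number generates nothing
}

numbers <- 2:10
index <- 0
# The vector shrinks inside the loop, so its length is re-checked on every pass
while ((index <- index + 1) < length(numbers)) {
    numbers <- setdiff(numbers, foo(numbers[index]))
}
numbers   # 2 3 6
```

Numbers that were already eliminated are never visited, because they are no longer in the vector when the index reaches their former position.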

This seems to take quite some time (but I don't know any other way of
doing it), hence my question(s):
- would there be another (quicker) implementation in R?
- alternatively, should I go for a C implementation?

(actually, I did create a C implementation, but it doesn't bring any
more speed... it is actually a bit slower).
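
One possible pure-R direction, sketched under assumptions: if the numbers are positive integers with a manageable maximum, membership can be tracked in a logical vector indexed by value, so each elimination becomes a direct assignment instead of a setdiff() over the whole vector. The names sieve() and the toy foo() are hypothetical, the input is assumed to be non-empty and sorted, and whether this is actually quicker depends on how expensive foo() itself is.

```r
# Hypothetical foo(): hard-coded lookup matching the toy example (an assumption)
foo <- function(n) {
    switch(as.character(n),
           "2" = c(4L, 7L, 10L),
           "3" = c(5L, 9L),
           "6" = 8L,
           integer(0))
}

# Sketch: track survivors in a logical vector addressed by value.
# Assumes a non-empty, sorted vector of positive integers.
sieve <- function(numbers, foo) {
    top <- max(numbers)
    present <- logical(top)        # present[k] is TRUE while k survives
    present[numbers] <- TRUE
    for (n in numbers) {
        if (!present[n]) next      # already eliminated, generates nothing
        derived <- foo(n)
        derived <- derived[derived <= top]   # ignore values beyond the input range
        present[derived] <- FALSE
    }
    which(present)                 # surviving numbers, in increasing order
}

sieve(2:10, foo)   # 2 3 6
```

The trade-off is memory: the logical vector has one element per integer up to the maximum, not one per input number.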

A real-life pet example, using the function findSubsets() from the QCA
package (our foo function above):

####
library(QCA)
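testfoo <- function(x, y) {
    index <- 0
    # x shrinks inside the loop, so length(x) is re-evaluated on every pass
    while((index <- index + 1) < length(x)) {
        # drop from x every number that findSubsets() derives from x[index]
        x <- setdiff(x, findSubsets(y, x[index], max(x)))
    }
    return(x)
}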

nofl <- rep(3, 14)
set.seed(12345)
numbers <- sort(sample(seq(prod(nofl)), 1000000))

system.time(result <- testfoo(numbers, nofl))
####
   user  system elapsed
  8.168   2.049  10.148

Any hint will be highly appreciated, thanks in advance,
Adrian

-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
       +40 21 3120210 / int.101
Fax: +40 21 3158391


