[R] Solving an optimization problem: selecting an "optimal" subset

Hans W Borchers hwborchers at googlemail.com
Mon Feb 1 17:04:37 CET 2010


Dimitri Shvorob <dimitri.shvorob <at> gmail.com> writes:
>
> Given vector of numbers x, I wish to select an n-subset with sum closest
> fixed value s. Can anyone advise me how to approach this, in R? 
>
> I have considered Rcplex package, which handles integer/binary
> linear/quadratic optimization problems, but have difficulty setting up
> the quadratic form for [sum(x) - s]^2. 
>
> (Dynamic programming over [0, sum(x)]? A genetic algorithm? Can anyone
> contribute a binary GA optimization sample?)
>

Here is a solution completely done with R and the MILP solver Symphony
provided in the 'Rsymphony' package (but 'Rglpk' will work the same). It
finds subsets of size <= k in a set s of size n.

The "absolute deviation" from the target value z is simulated by two
additional continuous (a_p and a_m) and and one binary variable d0.
To force positivity, the common Big-M trick is employed.

Hans Werner

----
    library(Rsymphony)

    n <- 100; k <- 20               # Find k-subset in set of size n
    z <- 2.0                        # whose elements sum up to z 

    set.seed(8232); s <- runif(n)   # reference example

    M <- sum(s) + z                 # for the Big-M trick

    # Prepare inputs for MILP solver
    obj <- c(rep(0, n), 0, 1, 1, 0)
    typ <- c(rep("B", n), "B", "C", "C", "B")
    mat <- matrix(c(        s, -z, -1, 1, 0,    # a = a_p + a_m
                    rep(0, n), 1, 0, 0, 0,      # constant term
                    rep(0, n), 0, 1, 0, -M,     # a_p <= M * d0
                    rep(0, n), 0, 0, 1, M,      # a_m <= M * (d0-1)
                    rep(1, n), 0, 0, 0, 0),     # subset size <= k
        nrow=5, byrow=T)
    dir <- c("==", "==", "<=", "<=", "<=")
    rhs <- c(0, 1, 0, M, k)
    max <- FALSE

    # Apply Sypmphony as our MILP solver
    sol <- Rsymphony_solve_LP(obj, mat, dir, rhs, types=typ, max = max)


    x <- sol$solution[1:n]          # Solution vector describing k-subset
    ( i <- (1:n)[x==1] )            #=> [1] 18 25 44 81 95
    print(sum(s[i]), digits=10)     #=> [1] 2.000000205
----



More information about the R-help mailing list