[R] A combinatorial optimization problem: finding the best permutation of a complex vector

Sun Nov 15 05:47:07 CET 2009

Hi,

It was pointed out to me by Hans Borchers that the timing that I reported (0.3 sec) in the previous email for solving the LSAP problem, for N=1000, was too optimistic, because "X" and "Y" were equivalent up to a permutation.  

In order to test this out, I ran a few more experiments with different random variate distributions for X and Y.  In all the experiments, I took N = 500.

The execution times were faster when X and Y had the same or similar distributions.  This is generally around 8 - 9 seconds.  The more different the distributions are, the greater the time.   For example, when I took the real and imaginary parts of X to be from a t-distribution with 3 degrees of freedom, and Y to be from uniform distribution in (0, 1), the execution times were around 80-90 seconds. 

n <- 500

x <- rt(n, df=3) + 1i * rt(n, df=3)

y <- runif(n) + 1i * runif(n)

Cmat <- outer(x, y, FUN=function(x,y) Mod(x - y))

system.time(ans <- solve_LSAP(Cmat, maximum=FALSE))

When I increased N = 1000, the time was about 1400 seconds!  

Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu

----- Original Message -----
From: Ravi Varadhan <rvaradhan at jhmi.edu>
Date: Saturday, November 14, 2009 10:53 am
Subject: Re: [R] A combinatorial optimization problem: finding the best permutation of a complex vector
To: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Cc: r-help at r-project.org

> Hi,
> 
> I have solved the problem that I had posed before.  Here is a 
> statement of the problem:
> 
> "I have a complex-valued vector X in C^n.  Given another 
> complex-valued vector Y in C^n, I want to find a permutation of Y, 
> say, Y*,  that minimizes ||X - Y*||, the distance between X and Y*. "
> 
> I was talking to Professor Moody T. Chu, who is a well-known numerical 
> analyst from NC State Univ, and he pointed out that this problem is an 
> instance of the classical "linear sum assignment problem (LSAP)" in 
> discrete mathematics.  Once this was revealed to me, it didn't take me 
> long to find out the existence of various algorithms (e.g. Hungarian 
> algorithm) and codes (C, Fortran, Matlab) for solving this problem.  I 
> also looked in the CRAN task view on optimization and found that the 
> LSAP solver is present in the "clue" package.  Thanks to Kurt Hornik 
> for this package.  
> 
> So, here is an illustration of the "amazing" power of mathematics:
> 
> n <- 1000
> 
> x <- rt(n, df=3) + 1i * rt(n, df=3)  # this is the target vector to be 
> matched
> 
> y <- x[sample(n)]  # this is the vector to be permuted
> 
> # Note:  I have chosen a random permutation of the target so that I 
> know the answer is "x" itself
> # and the minimum distance is zero
> 
> Cmat <- outer(x, y, FUN=function(x,y) Mod(x - y))
> 
> require(clue)
> 
> ans <- solve_LSAP(Cmat, maximum=FALSE)  # We are minimizing the linear 
> sum
> 
> dist <- function(x, y) sqrt(sum(Mod(x - y)^2))
> 
> dist(x, y[c(ans)])
> 
> 
> This is remarkable.  It takes only about 0.3 seconds to solve this 
> difficult combinatorial problem!
> 
> 
> Best,
> Ravi.
> 
> ____________________________________________________________________
> 
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
> 
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
> 
> 
> ----- Original Message -----
> From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
> Date: Thursday, November 12, 2009 2:20 pm
> Subject: Re: [R] A combinatorial optimization problem: finding the 
> best permutation of a complex vector
> To: Ravi Varadhan <rvaradhan at jhmi.edu>
> Cc: r-help at r-project.org
> 
> 
> > On Thu, 12 Nov 2009, Ravi Varadhan wrote:
> > 
> > >
> > > Hi,
> > >
> > > I have a complex-valued vector X in C^n.  Given another 
> > complex-valued 
> > > vector Y in C^n, I want to find a permutation of Y, say, Y*, that 
> 
> > > minimizes ||X - Y*||, the distance between X and Y*.
> > >
> > > Note that this problem can be trivially solved for "Real" vectors, 
> 
> > since 
> > > real numbers possess the ordering property. Complex numbers, 
> > however, do 
> > > not possess this property.  Hence the challenge.
> > >
> > > The obvious solution is to enumerate all the permutations of Y and 
> 
> > pick 
> > > out the one that yields the smallest distance.  This, however, is 
> 
> > only 
> > > feasible for small n.  I would like to be able to solve this for n 
> 
> > as 
> > > large as 100 - 1000, in which cases the permutation approach is 
> > > infeasible.
> > >
> > > I am looking for algorithms, possibly iterative, that can provide 
> a 
> > 
> > > "good" approximate solutions that are not necessarily optimal for 
> 
> > > high-dimensional vectors. I can do random sampling, but this can 
> be 
> > very 
> > > inefficient in high-dimensional problems.  I am looking for 
> > efficient 
> > > algorithms because this step has to be performed in each iteration 
> 
> > of an 
> > > "outer" algorithm.
> > >
> > > Are there any clever adaptive algorithms out there?
> > >
> > 
> > I do not know.
> > 
> > But would you settle for a not-so-clever adaptive heuristic?
> > 
> > If so, see below.
> > 
> > 
> > 
> > > Here is an example illustrating the problem:
> > >
> > > require(e1071)
> > >
> > > n <- 8
> > > x <- runif(n) + 1i * runif(n)
> > > y <- runif(n) + 1i * runif(n)
> > >
> > > dist <- function(x, y) sqrt(sum(Mod(x - y)^2))
> > >
> > > perms <- permutations(n)
> > > dim(perms)  # [1] 40320     8
> > > tmp <- apply(perms, 1, function(ord) dist(x, y[ord]))
> > > z <- y[perms[which.min(tmp), ]]  # exact solution
> > > dist(x, z)
> > >
> > > # an aproximate random-sampling approach
> > > nsamp <- 10000
> > > perms <- t(replicate(nsamp, sample(1:n, size=n, replace=FALSE)))
> > > tmp <- apply(perms, 1, function(ord) dist(x, y[ord]))
> > > z.app <- y[perms[which.min(tmp), ]]  # approximate solution
> > > dist(x, z.app)
> > >
> > 
> > The heuristic is to use a stochastic greedy updates. Here is a very 
> 
> > simple 
> > one:
> > 
> > swap.samp <-
> >   function(index) {
> >          sub.ind <- sample(seq(along=index),2)
> >          index[sub.ind]<- rev(sub.ind)
> >          index
> >   }
> > 
> > 
> > z.app <- y
> > z.cand <- y
> > 
> > for (i in 1:100) z.cand <-
> >  	if( dist(x,z.app) < dist(x,z.cand) ) {
> > 
> >          	z.app[swap.samp(1:8)]
> > 
> >  	} else {
> >  	z.app <- z.cand
> >  	z.cand[swap.samp(1:8)]
> >  	}
> > 
> > On your toy example, this usually finds the min(dist(x,z.app))  in < 
> 
> > 100 
> > trials.
> > 
> > Note that when
> > 
> >  	z.diff <- z.app != z.cand
> > 
> >  	dist(x[ z.diff ],z.app[ z.diff ])^2 - dist(x[ z.diff ],z.cand[ 
> > z.diff ])^2
> > 
> > equals
> > 
> >  	dist(x,z.app)^2 - dist(x,z.cand)^2
> > 
> > so you could vectorize the above to randomly pair up all the points 
> 
> > (for n 
> > even) then update the n%/%2 pairs all at once.
> > 
> > 
> > For large problems, you might try to preferentially sample pairs of 
> 
> > points 
> > with similar values of Mod ( x - z.app ) to begin with. The 
> heuristic 
> > 
> > being that in two pairs of points that are both distant, swapping 
> them 
> > 
> > might realize a bigger benefit.
> > 
> > 
> > --
> > 
> > Another version is to sample k points, randomly permute, then do a 
> > greedy 
> > update:
> > 
> > sub.samp <-
> >   function(index,n) {
> >  	sub.ind <- sample(seq(along=index),n)
> >  	index[sub.ind]<- sample(sub.ind)
> >  	index
> > }
> > 
> > 
> > # k is 4 here
> > 
> > for (i in 1:100) z.cand <-
> >  	if( dist(x,z.app) < dist(x,z.cand) ){
> >          	z.app[sub.samp(1:8,4)]
> >  	} else {
> >          	z.app <- z.cand
> >          	z.cand[sub.samp(1:8,4)]
> >  	}
> > 
> > 
> > On your toy problem, this takes in the low 100's to find the min.
> > 
> > I think you can do the similar vectorization trick here.
> > --
> > 
> > I suppose another variant would repeatedly sample k points, then 
> > enumerate 
> > distances for all permutations of them, then choose the best one to 
> 
> > update z.app.
> > 
> > Again, in larger problems, one might weight those k points towards 
> > higher 
> > values of Mod( x - z.app ) at least to begin with.
> > 
> > HTH,
> > 
> > Chuck
> > 
> > 
> > > Thanks for any help.
> > >
> > > Best,
> > > Ravi.
> > >
> > >
> > > ____________________________________________________________________
> > >
> > > Ravi Varadhan, Ph.D.
> > > Assistant Professor,
> > > Division of Geriatric Medicine and Gerontology
> > > School of Medicine
> > > Johns Hopkins University
> > >
> > > Ph. (410) 502-2619
> > > email: rvaradhan at jhmi.edu
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > 
> > > PLEASE do read the posting guide 
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > 
> > Charles C. Berry                            (858) 534-2098
> >                                              Dept of 
> Family/Preventive 
> > Medicine
> > E                     UC San Diego
> >   La Jolla, San Diego 92093-0901
> > 
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> 
> PLEASE do read the posting guide 
> and provide commented, minimal, self-contained, reproducible code.