[R] Permutations

Wed Jul 14 20:49:30 CEST 2004

On Wed, 2004-07-14 at 08:06, Rolf Turner wrote:
> In respect of generating random ``restricted'' permutations, it
> occurred to me as I was driving home last night .... If one is going
> to invoke some kind of ``try again if it doesn't work procedure''
> then one might as well keep it simple:  Essentially use the rejection
> method.  Just generate a random permutation, and then check whether
> it meets the restriction criterion.   If yes, return that
> permutation, if not, throw it away and try again.
> 
> This will definitely (???) genererate a genuinely random restricted
> permutation.  I figured that since a very large fraction of permutations
> are acutally restricted permutions one wouldn't reject much of the
> time, and so the rejection method should be pretty efficient.
> 
> I wrote the following code to do the work:
> 
> restr.perm2 <- function () {
> #
> okay <- function (x) {
> 	m1 <- cbind(1:12,rep(1:4,rep(3,4)))
> 	m2 <- m1[x,]
> 	all((m2[,1] == m1[,1]) | (m2[,2] != m1[,2]))
> }
> #
> repeat{
> 	x <- sample(12,12)
> 	if(okay(x)) return(x)
> }
> }
> 
> I'm bothered again, but:  I did a little test to see how many tries
> it would take on average.  On 1000 attempts I got a mean of 8.44
> tries, with a standard deviation of 7.7610 (standard error of the
> mean = 0.2454).  The value of 7.76 is roughly consistent with
> sqrt(1-p.hat)/p.hat = 7.92 that one gets from the geometric
> distribution.
> 
> This would indicate that the fraction of ``restricted'' permutations
> is something like p.hat = 1/8.44 = 0.1184834.  Which is a lot less
> than the rough estimate of (4.7 x 10^6)/12! approx. = 0.9853 from
> Robert Baskin's back-of-the-envelope calculations.
> 
> What's going on/wrong?

Rolf,

I have not quite figured out what the issues are, but I took your
approach above and combined it with Gabor's f1a() function that was the
result of the recent thread on matching/counting rows between matrices
and came up with the following. The basic concept is that we know (or
think that we know) what the non-allowable set of intra-block
permutations are:

  # Create non-allowable 'intra-block' permutations
  # Need a generalizable way to do this, but
  # good enough for now
  a <- permutations(3, 3, 1:3)
  b <- permutations(3, 3, 4:6)
  c <- permutations(3, 3, 7:9)
  d <- permutations(3, 3, 10:12)
  intra <- rbind(a[-1, ], b[-1, ], c[-1, ], d[-1, ])

'intra' looks like:

> intra
      [,1] [,2] [,3]
 [1,]    1    3    2
 [2,]    2    1    3
 [3,]    2    3    1
 [4,]    3    1    2
 [5,]    3    2    1
 [6,]    4    6    5
 [7,]    5    4    6
 [8,]    5    6    4
 [9,]    6    4    5
[10,]    6    5    4
[11,]    7    9    8
[12,]    8    7    9
[13,]    8    9    7
[14,]    9    7    8
[15,]    9    8    7
[16,]   10   12   11
[17,]   11   10   12
[18,]   11   12   10
[19,]   12   10   11
[20,]   12   11   10

With that in place, we can then take the randomly generated 'x', coerce
it into a 3 column matrix by row and use Gabor's function to check for
any matches of the rows of 3 in 'x' with the non-allowable permutations
in 'intra'. 

Thus, the function will assign 'x' to the appropriate row in 'results'
(which is pre-allocated based upon the number of runs) if 'x' passes or
sets the row to NA's if it does not. 

I then use complete.cases() to return only the valid rows. Presumably,
some of the randomly generated 'x' vectors could be duplicates, so I
also use unique():

restr.perm3 <- function(runs)
{
  results <- matrix(numeric(runs * 12), ncol = 12)

  # use Gabor's function to check for row matches
  # between 'x' and 'intra' to filter out in okay()
  f1a <- function(a,b,sep=":")
  {
    f <- function(...) paste(..., sep=sep)
    a2 <- do.call("f", as.data.frame(a))
    b2 <- do.call("f", as.data.frame(b))
    c(table(c(b2,unique(a2)))[a2] - 1)
  }

  okay <- function (x)
  {
    x.match <- matrix(x, ncol = 3, byrow = TRUE)
    all(f1a(x.match, intra) == 0)
  }

  for (i in 1:runs)
  {
    x <- sample(12,12)
    if (okay(x))
      results[i, ] <- x
    else
      results[i, ] <- rep(NA, 12)
  }

  unique(results[complete.cases(results), ])
}

So, with all that in place, we can then do something like the following:

r <- replicate(10, restr.perm3(1000))

> str(r)
List of 10
 $ : num [1:934, 1:12] 10 6 8 7 7 11 1 1 8 4 ...
 $ : num [1:944, 1:12] 7 4 4 11 1 11 8 3 3 9 ...
 $ : num [1:953, 1:12] 8 4 11 1 11 6 7 11 9 10 ...
 $ : num [1:951, 1:12] 1 2 1 4 4 10 8 10 3 12 ...
 $ : num [1:949, 1:12] 1 7 11 9 2 2 11 7 11 4 ...
 $ : num [1:937, 1:12] 3 3 11 10 4 8 12 10 3 2 ...
 $ : num [1:952, 1:12] 7 3 1 6 4 4 4 11 2 8 ...
 $ : num [1:946, 1:12] 1 9 3 10 11 6 1 7 8 11 ...
 $ : num [1:945, 1:12] 8 9 1 2 8 3 4 1 7 11 ...
 $ : num [1:933, 1:12] 9 1 12 3 4 1 10 1 1 8 ...

So the above would suggest that indeed, the restricted permutations are
a fairly high proportion of the total.

I went ahead and did the following 50 replicates of 1000 each:

> system.time(r <- replicate(50, restr.perm3(1000)))
[1] 91.20  0.06 92.98  0.00  0.00

> mean(unlist(lapply(r, nrow)))
[1] 941.52

> sd(unlist(lapply(r, nrow)))
[1] 6.494079

There are likely to be some efficiencies in the function that can be
brought to bear, but it is a start.

In either case, the restricted permutations appear to be around 94%, if
all of the assumptions are correct.

HTH,

Marc Schwartz