[R] Permutations
Marc Schwartz
MSchwartz at MedAnalytics.com
Wed Jul 14 20:49:30 CEST 2004
On Wed, 2004-07-14 at 08:06, Rolf Turner wrote:
> In respect of generating random ``restricted'' permutations, it
> occurred to me as I was driving home last night .... If one is going
> to invoke some kind of ``try again if it doesn't work procedure''
> then one might as well keep it simple: Essentially use the rejection
> method. Just generate a random permutation, and then check whether
> it meets the restriction criterion. If yes, return that
> permutation, if not, throw it away and try again.
>
> This will definitely (???) genererate a genuinely random restricted
> permutation. I figured that since a very large fraction of permutations
> are acutally restricted permutions one wouldn't reject much of the
> time, and so the rejection method should be pretty efficient.
>
> I wrote the following code to do the work:
>
> restr.perm2 <- function () {
> #
> okay <- function (x) {
> m1 <- cbind(1:12,rep(1:4,rep(3,4)))
> m2 <- m1[x,]
> all((m2[,1] == m1[,1]) | (m2[,2] != m1[,2]))
> }
> #
> repeat{
> x <- sample(12,12)
> if(okay(x)) return(x)
> }
> }
>
> I'm bothered again, but: I did a little test to see how many tries
> it would take on average. On 1000 attempts I got a mean of 8.44
> tries, with a standard deviation of 7.7610 (standard error of the
> mean = 0.2454). The value of 7.76 is roughly consistent with
> sqrt(1-p.hat)/p.hat = 7.92 that one gets from the geometric
> distribution.
>
> This would indicate that the fraction of ``restricted'' permutations
> is something like p.hat = 1/8.44 = 0.1184834. Which is a lot less
> than the rough estimate of (4.7 x 10^6)/12! approx. = 0.9853 from
> Robert Baskin's back-of-the-envelope calculations.
>
> What's going on/wrong?
Rolf,
I have not quite figured out what the issues are, but I took your
approach above and combined it with Gabor's f1a() function that was the
result of the recent thread on matching/counting rows between matrices
and came up with the following. The basic concept is that we know (or
think that we know) what the non-allowable set of intra-block
permutations are:
# Create non-allowable 'intra-block' permutations
# Need a generalizable way to do this, but
# good enough for now
a <- permutations(3, 3, 1:3)
b <- permutations(3, 3, 4:6)
c <- permutations(3, 3, 7:9)
d <- permutations(3, 3, 10:12)
intra <- rbind(a[-1, ], b[-1, ], c[-1, ], d[-1, ])
'intra' looks like:
> intra
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 1 3
[3,] 2 3 1
[4,] 3 1 2
[5,] 3 2 1
[6,] 4 6 5
[7,] 5 4 6
[8,] 5 6 4
[9,] 6 4 5
[10,] 6 5 4
[11,] 7 9 8
[12,] 8 7 9
[13,] 8 9 7
[14,] 9 7 8
[15,] 9 8 7
[16,] 10 12 11
[17,] 11 10 12
[18,] 11 12 10
[19,] 12 10 11
[20,] 12 11 10
With that in place, we can then take the randomly generated 'x', coerce
it into a 3 column matrix by row and use Gabor's function to check for
any matches of the rows of 3 in 'x' with the non-allowable permutations
in 'intra'.
Thus, the function will assign 'x' to the appropriate row in 'results'
(which is pre-allocated based upon the number of runs) if 'x' passes or
sets the row to NA's if it does not.
I then use complete.cases() to return only the valid rows. Presumably,
some of the randomly generated 'x' vectors could be duplicates, so I
also use unique():
restr.perm3 <- function(runs)
{
results <- matrix(numeric(runs * 12), ncol = 12)
# use Gabor's function to check for row matches
# between 'x' and 'intra' to filter out in okay()
f1a <- function(a,b,sep=":")
{
f <- function(...) paste(..., sep=sep)
a2 <- do.call("f", as.data.frame(a))
b2 <- do.call("f", as.data.frame(b))
c(table(c(b2,unique(a2)))[a2] - 1)
}
okay <- function (x)
{
x.match <- matrix(x, ncol = 3, byrow = TRUE)
all(f1a(x.match, intra) == 0)
}
for (i in 1:runs)
{
x <- sample(12,12)
if (okay(x))
results[i, ] <- x
else
results[i, ] <- rep(NA, 12)
}
unique(results[complete.cases(results), ])
}
So, with all that in place, we can then do something like the following:
r <- replicate(10, restr.perm3(1000))
> str(r)
List of 10
$ : num [1:934, 1:12] 10 6 8 7 7 11 1 1 8 4 ...
$ : num [1:944, 1:12] 7 4 4 11 1 11 8 3 3 9 ...
$ : num [1:953, 1:12] 8 4 11 1 11 6 7 11 9 10 ...
$ : num [1:951, 1:12] 1 2 1 4 4 10 8 10 3 12 ...
$ : num [1:949, 1:12] 1 7 11 9 2 2 11 7 11 4 ...
$ : num [1:937, 1:12] 3 3 11 10 4 8 12 10 3 2 ...
$ : num [1:952, 1:12] 7 3 1 6 4 4 4 11 2 8 ...
$ : num [1:946, 1:12] 1 9 3 10 11 6 1 7 8 11 ...
$ : num [1:945, 1:12] 8 9 1 2 8 3 4 1 7 11 ...
$ : num [1:933, 1:12] 9 1 12 3 4 1 10 1 1 8 ...
So the above would suggest that indeed, the restricted permutations are
a fairly high proportion of the total.
I went ahead and did the following 50 replicates of 1000 each:
> system.time(r <- replicate(50, restr.perm3(1000)))
[1] 91.20 0.06 92.98 0.00 0.00
> mean(unlist(lapply(r, nrow)))
[1] 941.52
> sd(unlist(lapply(r, nrow)))
[1] 6.494079
There are likely to be some efficiencies in the function that can be
brought to bear, but it is a start.
In either case, the restricted permutations appear to be around 94%, if
all of the assumptions are correct.
HTH,
Marc Schwartz
More information about the R-help
mailing list