[R] Constrained vector permutation
Charles C. Berry
cberry at tajo.ucsd.edu
Sat Jan 30 04:23:07 CET 2010
On Fri, 29 Jan 2010, Andrew Rominger wrote:
> Being reasonably sure that all valid permutations are equally probable is
> important to me. I've played around with search algorithms in permuting
> contingency tables and find that possible solutions decrease rapidly once
> one starts assigning values, particularly if small values are assigned
> first, so it would seem all solutions are not equally probable (not only
> that but one frequently encounters "dead ends" where there are values left
> to assign and no allowable place to put them). As such I think I'd opt to
> use sample()... several times if needed.
> To clarify, yes, I only need one valid permutation, the idea is I'll
> generate 1000s of ordered vectors, and then for each one generate one valid
> permutation.
> Thanks very much for the help and insights--
If you have some sense of importance sampling and/or MCMC you might look at
Zaman and Simberloff (2002, Environmental and Ecological Statistics 9),
which concerns sampling a binary matrix with fixed margins - not quite
your problem, but akin to it in being a combinatorial nightmare without
an obvious direct solution of workable size for real problems.
They define a neighborhood for each allowable matrix s.t. swapping a pair
of 1's at ij and kl with a pair of 0's at il and kj (which doesn't
violate the margin constraints) leads to a member of the neighborhood.
IIRC, the size of the neighborhood and the sizes of the neighborhoods of
the members of its neighborhood determine the probabilities of staying put
or moving to get the next element of the MCMC chain and which member of
the neighborhood to choose.
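
A minimal R sketch of the swap move itself, for illustration. It shows only
that exchanging the 1's and 0's of a 2x2 "checkerboard" submatrix leaves the
row and column sums alone; it does not reproduce the neighborhood-size
weighting described in the paper. The function name try_swap is made up here.

```r
# Attempt one random checkerboard swap on a 0/1 matrix m (needs >= 2 rows
# and >= 2 columns). If the chosen 2x2 submatrix is
#   1 0      0 1
#   0 1  or  1 0
# it is flipped; otherwise m is returned unchanged.
try_swap <- function(m) {
  rows <- sample(nrow(m), 2)   # two distinct row indices (i, k)
  cols <- sample(ncol(m), 2)   # two distinct column indices (j, l)
  sub <- m[rows, cols]
  is_checkerboard <- sub[1, 1] == sub[2, 2] &&
    sub[1, 2] == sub[2, 1] &&
    sub[1, 1] != sub[1, 2]
  if (is_checkerboard) {
    m[rows, cols] <- 1 - sub   # swap the pair of 1's with the pair of 0's
  }
  m
}

# Example: repeated swap attempts never change the margins.
m <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 1, 1), nrow = 3, byrow = TRUE)
m2 <- m
for (i in 1:200) m2 <- try_swap(m2)
rowSums(m2) == rowSums(m)
colSums(m2) == colSums(m)
```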
I suppose something like that (i.e. defining neighborhoods of allowable
permutations, measuring their size, and using this to guide sampling or
develop importance weights) might apply in your case. Maybe something like
this: start with an ordering of your n-vector that conforms to the
constraints, look at all the choose(n,2) pairs of elements and check which
of them could be exchanged to yield another conforming ordering; the
allowable swaps define the neighborhood, and their number is its size.
So, the idea is to develop an MCMC sampler. Run it for each ordered vector
until it is past the burn-in, then take one draw from it.
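
A rough R sketch of what such a sampler might look like, under loud
assumptions. The constraint in is_valid() is a stand-in (each element at most
`ratio` times the one before it) and should be replaced by the real rule; the
function names and the burn_in default are invented for illustration. The
acceptance step is ordinary Metropolis-Hastings with a uniform proposal over
the current neighborhood, not the scheme from the paper. Whether every valid
ordering can be reached by pairwise swaps is not guaranteed and would need
checking for the real constraint.

```r
# Hypothetical constraint, for illustration only: each element is at most
# `ratio` times its predecessor. Substitute the real rule here.
is_valid <- function(x, ratio = 2) all(x[-1] <= ratio * x[-length(x)])

# Index pairs (i, j) whose exchange gives a different, still-valid ordering.
# The number of rows returned is the neighborhood size.
swap_neighborhood <- function(x, valid = is_valid) {
  pairs <- t(combn(length(x), 2))        # all choose(n, 2) candidate pairs
  keep <- apply(pairs, 1, function(p) {
    y <- x
    y[p] <- y[rev(p)]                    # exchange the two elements
    x[p[1]] != x[p[2]] && valid(y)       # skip ties: they change nothing
  })
  pairs[keep, , drop = FALSE]
}

# Metropolis-Hastings over valid orderings, starting from a valid x0.
# Proposal: a uniformly chosen neighbor y of x.
# Acceptance probability: min(1, |N(x)| / |N(y)|).
sample_valid_perm <- function(x0, burn_in = 1000, valid = is_valid) {
  stopifnot(length(x0) >= 2, valid(x0))
  x <- x0
  nb_x <- swap_neighborhood(x, valid)
  for (step in seq_len(burn_in)) {
    if (nrow(nb_x) == 0) break           # no allowable swap from here
    if (runif(1) < 0.5) next             # lazy step, avoids periodicity
    p <- nb_x[sample.int(nrow(nb_x), 1), ]
    y <- x
    y[p] <- y[rev(p)]
    nb_y <- swap_neighborhood(y, valid)  # has >= 1 row: swapping back to x
    if (runif(1) < nrow(nb_x) / nrow(nb_y)) {
      x <- y
      nb_x <- nb_y
    }
  }
  x
}

# Example: a decreasing ordering satisfies the stand-in constraint for
# positive values, so it can serve as the starting point.
x0 <- sort(c(1, 2, 3, 4, 6), decreasing = TRUE)
sample_valid_perm(x0, burn_in = 500)
```

Each call recomputes a neighborhood of up to choose(n,2) swaps per step, so
for long vectors and 1000s of chains this would be slow as written.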
> On Thu, Jan 28, 2010 at 3:04 PM, Thomas Lumley <tlumley at u.washington.edu> wrote:
>> On Thu, 28 Jan 2010, Jason Smith wrote:
>>>> It wouldn't be guaranteed to produce any usable permutation, but it seems
>>>> like it would be much faster and so could be repeated until an acceptable
>>>> vector is found. What do you think?
>>> I think I am not understanding what your ultimate goal is so I'm not
>>> sure I can give you appropriate advice. Are you looking for a single
>>> valid permutation or all of them?
>>> Since that constraint sets a ceiling on each subsequent value, it
>>> seems like you could solve this problem more easily and quickly by
>>> using a search strategy instead of random sampling or generating all
>>> permutations then testing. The constraint will help prune the search
>>> space so you only generate valid permutations. Once you are examining
>>> a particular element you can determine which of the additional
>>> elements would be valid, so only consider those.
>> It's easy to generate valid permutations this way. It does not appear
>> straightforward to ensure that all valid permutations are sampled with equal
>> probability, which I thought was part of the specification of the problem.
>> Thomas Lumley Assoc. Professor, Biostatistics
>> tlumley at u.washington.edu University of Washington, Seattle
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901