[R] Puzzle

Ben Holt BHolt at bio.ku.dk
Thu Aug 26 10:12:49 CEST 2010


I have data similar to this:

Location Surveyor Result
A        1         83
A        2         76
A        3         45
B        1         71
B        4         67
C        2         23
C        5         12
D        3         34
E        4         75
F        4         46
G        5         90
etc (5 million records in total)

I need to divide the data into many subsets and then, for each subset, randomly select 5 different locations and 5 different surveyors (one at each of the 5 randomly selected locations).

The function I have written basically picks five locations and then one surveyor at each location, checks that there are five different surveyors and, if there aren't, tries again.  The problem is that for some subsets this doesn't work.
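
The retry approach described above can be sketched roughly as follows. This is an illustration, not the original function: the name pick_five, its arguments, and the assumption that a subset is a data frame d with columns Location and Surveyor are all hypothetical.

```r
# Hypothetical sketch of the retry approach (not the original function).
# d: data frame for one subset, with columns Location and Surveyor.
pick_five <- function(d, n = 5, max_tries = 100) {
  locs <- unique(as.character(d$Location))
  if (length(locs) < n) return(NULL)        # not enough locations at all
  for (i in seq_len(max_tries)) {
    chosen <- sample(locs, n)               # n distinct locations
    # one randomly chosen surveyor at each chosen location
    surv <- vapply(chosen, function(l) {
      s <- unique(as.character(d$Surveyor[d$Location == l]))
      s[sample.int(length(s), 1)]
    }, character(1))
    # accept only if all n surveyors are different
    if (anyDuplicated(surv) == 0) {
      return(data.frame(Location = chosen, Surveyor = unname(surv)))
    }
  }
  NULL                                      # gave up after max_tries
}

# The example records from the top of this message
d <- data.frame(
  Location = c("A", "A", "A", "B", "B", "C", "C", "D", "E", "F", "G"),
  Surveyor = c(1, 2, 3, 1, 4, 2, 5, 3, 4, 4, 5)
)
pick_five(d)
```

A NULL result here means either that there were too few locations or that no valid draw turned up within max_tries, which is exactly the ambiguity that causes the trouble.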

Some subsets don't have enough locations, enough surveyors, or both, but this can be checked for easily.  The problem subsets do have enough locations and surveyors but still cannot produce 5 locations each with a different surveyor.  The matrix below demonstrates such a subset (1 means the surveyor has a record at that location):
 
                  locations
                  A B C D E
                1 1 0 0 0 0
Surveyors       2 1 0 0 0 0
                3 1 0 0 0 0
                4 1 0 0 0 0
                5 1 1 1 1 1
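
For anyone who wants to reproduce this, a matrix like the one above can be built from the long-format records with table(). This is a small sketch; the object names sub and m are just placeholders.

```r
# The problem subset above, written as long-format records
sub <- data.frame(
  Location = c("A", "A", "A", "A", "A", "B", "C", "D", "E"),
  Surveyor = c(1, 2, 3, 4, 5, 5, 5, 5, 5)
)

# Cross-tabulate surveyors (rows) against locations (columns),
# then reduce the counts to 0/1 presence indicators
m <- unclass(table(sub$Surveyor, sub$Location))
m[m > 0] <- 1L
m

# The simple checks pass: 5 surveyors and 5 locations are present
nrow(m)
ncol(m)
```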

I cannot think of a way to check for such a situation, so I have simply programmed the function to give up after 100 attempts if it cannot find a solution.  This is not very satisfactory, however, as the analysis takes a very long time to run, and it would also be very useful for me to know how many suitable solutions there are.
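
One possible check, offered as a sketch rather than a tested solution: the situation can be viewed as a bipartite matching problem between locations and surveyors, and a valid selection exists exactly when the largest possible matching has at least 5 pairs. The function name max_matching and the 0/1 matrix layout (surveyors in rows, locations in columns) are assumptions for illustration. The code uses a plain augmenting-path search in base R and does not count how many solutions exist.

```r
# Size of the largest set of (location, surveyor) pairs in which no
# location and no surveyor is used twice. m is a 0/1 matrix with
# surveyors in rows and locations in columns.
max_matching <- function(m) {
  loc_of_surv <- integer(nrow(m))   # location matched to each surveyor, 0 = free
  visited <- logical(nrow(m))

  # Try to give location l a surveyor, re-assigning earlier locations if needed
  augment <- function(l) {
    for (s in which(m[, l] > 0)) {
      if (visited[s]) next
      visited[s] <<- TRUE
      if (loc_of_surv[s] == 0L || augment(loc_of_surv[s])) {
        loc_of_surv[s] <<- l
        return(TRUE)
      }
    }
    FALSE
  }

  size <- 0L
  for (l in seq_len(ncol(m))) {
    visited[] <- FALSE              # fresh search for each location
    if (augment(l)) size <- size + 1L
  }
  size
}

# The problem subset: surveyors 1-4 only at A, surveyor 5 everywhere
bad <- matrix(0L, nrow = 5, ncol = 5,
              dimnames = list(as.character(1:5), LETTERS[1:5]))
bad[, "A"] <- 1L
bad["5", ] <- 1L
max_matching(bad)        # 2, so 5 distinct pairs are impossible
max_matching(bad) >= 5   # FALSE
```

Running this once per subset before sampling would separate the impossible subsets from those that are merely unlucky within the retry limit.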

I reckon some of you clever folk out there must be able to think of a better solution.

Any advice appreciated,

Ben


