[R] help with apply, please

Sat Nov 19 19:51:03 CET 2005

On 19-Nov-05 Adrian Dusa wrote:
> On Saturday 19 November 2005 17:24, Gabor Grothendieck wrote:
>> [...snip...]
>> Although the above is not wrong I should have removed the
>> rbind which is no longer needed and simplifying it further,
>> as it seems that lp will do the rep for you itself for
>> certain arguments, gives:
>>
>> lp("min", rep(1,3), t(mtrx), ">=", 1)$solution  # 1 0 1
> 
> Thank you Gabor, this solution is superbe (you never stop
> amazing me :). Now... it only finds _one_ of the multiple
> minimum solutions. In the toy example, there are two minimum
> solutions, hence I reckon the output should have been a list with:
> [[1]]
> [1] 1 0 1
> 
> [[2]]
> [1] 0 1 1
> 
> Also, thanks to Duncan and yes, I do very much care finding
> the smallest possible solutions (if I correctly understand
> your question).
> 
> It seems that lp function is very promising, but can I use it
> to find _all_ minimum solutions?

Thinking about it, I'm not sure that finding the complete set
of solutions (in general, not just to Adrian's toy example)
can be done without enumeration, complete up to the stage
where it is known that all minimal solutions have been found
(and, having said that, I shall probably provoke an expert
into refuting me; im which case, all the better!).

For example, consider a 9-column matrix with 84 rows, each
with 3 1s and 6 0s (nCm(9,3)=84).

Every minimal solution is the union of 3 rows, e.g.

  1 1 1 0 0 0 0 0 0
  0 0 0 1 1 1 0 0 0
  0 0 0 0 0 0 1 1 1

and there is a 1:1 correspondence between the choices of
3 out of 9 and the 84 rows.

So there are 84 = nCm(9,3) choices of 3 1s for the first row,
leaving 6 0s. There are then nCm(6,3) = 20 choices of 3 1s
for the second row. Having done that, the choice of the 1s
for the third row s determined. But this must be divided by
3! = 6 to give "unordered" rows, so there are 84*20/6 = 280
different minimal solutions.

Without yet having taken this down to the level of R code,
I would envisage an algorithm for an N*k matrix M on the
following lines.

1. Check that it's possible: sum(colSums(matr)==0) = 0

2. For m = 1:N work through all choices of m rows out of
   the N; if, for the current value of m, any one of these
   covers, complete the choices for this value of m,
   noting which choices cover, and then exit.

This will give you all the choices of m out of N rows
which cover the columns, for the smallest value of m
for which this is possible, and you will not have
searched in larger values of m.

The function 'combn' in package combinat may be useful
here: combn(N,m) returns an array with nCm columns and
m rows, where the values in a column are a choice of m
out of (1:N).

There is an opposite version of this algorithm: If there
are not many 1s in matr, then you might guess that a lot
of rows may be needed. So it may be more efficient to
do it the other way round: Starting with all rows (m=N),
leave them out 1 at a time, then 2 at a time, and so on,
until you get a value of m such that no choice of m out
of N covers the columns. Then (m+1) is the minimal order
of a covering set of rows, and you've just found all these.

But this may not be sound either, since if one row has
many 1s then possibly some few others can make up the
covering, so it would be better to do it the original
way round. I can't get a clear view of what sort of
criterion to apply to determine this issue programmatically.

There is bound to be a good algorithm out there somewhere
for finding a "minimal coveriung set" but I don't know it!

Comments?

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Nov-05                                       Time: 18:51:00
------------------------------ XFMail ------------------------------