Ryan Austin austin at botany.utoronto.ca
Fri Oct 13 23:29:31 CEST 2006

Thanks for the thought in any case, Mark.  You're right about the brute force.
I'll expand a bit with an example though for the sake of clarity.

Given a correlation matrix of 4 covariates A, B, C, D with pairwise r values of:
AB=0.2;  AC=0.6; AD=0.3 ; BC=0.9 ; BD=0.8 ; CD=0.7

Find the optimal subset (size > n, n being the number of covariates)
where the mean of r for the subset is a maximum.
Of course all NxN distances need to be considered between any chosen
subset covariates.

Thus for n>1, the solution would be simply BC = 0.9
And for n>2, the solution would be BCD, as (BC + CD + BD)/3 = 0.8 is the
maximum mean r value that could be obtained from
any of the subsets with n>2.
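
To make the example concrete, here is a minimal brute-force sketch in Python (not R, purely for illustration). The names `r`, `covariates`, `mean_r` and `best_subset` are made up for this sketch, and "size > n" is read as "more than n covariates in the subset". It enumerates every eligible subset, so it is only practical for small matrices.

```python
from itertools import combinations

# Pairwise values from the example above, keyed by alphabetically ordered pair.
r = {
    ("A", "B"): 0.2, ("A", "C"): 0.6, ("A", "D"): 0.3,
    ("B", "C"): 0.9, ("B", "D"): 0.8, ("C", "D"): 0.7,
}
covariates = ["A", "B", "C", "D"]


def mean_r(subset):
    """Mean of r over all pairs of covariates within the subset."""
    pairs = list(combinations(subset, 2))
    return sum(r[p] for p in pairs) / len(pairs)


def best_subset(n):
    """Exhaustively search all subsets with more than n members (n >= 1)."""
    candidates = (
        s
        for k in range(n + 1, len(covariates) + 1)
        for s in combinations(covariates, k)
    )
    return max(candidates, key=mean_r)


for n in (1, 2):
    s = best_subset(n)
    print(n, "".join(s), round(mean_r(s), 3))
```

For this matrix the search picks BC for n>1 and BCD for n>2, matching the hand calculation.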

I'd expected that this would be a common problem but 2 days of googling
has given me little.  I'm expecting a greedy graph traversal
or the like will be my answer, but I'd hoped to whip up a solution in R.
Any help would be greatly appreciated.
Ryan
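
One possible shape for the greedy approach mentioned above, sketched in Python for illustration only. This is a heuristic and an assumption about what "greedy" could mean here: it seeds with the strongest pair, then repeatedly adds whichever covariate keeps the subset mean highest. It is not guaranteed to find the optimum in general, and the helper names (`greedy_subset`, `pair`, `mean_r`) are hypothetical.

```python
from itertools import combinations


def greedy_subset(r, names, size):
    """Grow a subset to exactly `size` members, one covariate at a time.

    Heuristic only: each step keeps the locally best choice, so the
    result may differ from the true optimum on other matrices.
    """
    def pair(a, b):
        # Look the value up regardless of the order the pair was stored in.
        return r[(a, b)] if (a, b) in r else r[(b, a)]

    def mean_r(subset):
        pairs = list(combinations(subset, 2))
        return sum(pair(a, b) for a, b in pairs) / len(pairs)

    # Seed with the single strongest pair.
    chosen = list(max(combinations(names, 2), key=lambda p: pair(*p)))
    while len(chosen) < size:
        rest = [x for x in names if x not in chosen]
        # Add the covariate that gives the highest mean for the grown subset.
        chosen.append(max(rest, key=lambda x: mean_r(chosen + [x])))
    return chosen, mean_r(chosen)


r = {
    ("A", "B"): 0.2, ("A", "C"): 0.6, ("A", "D"): 0.3,
    ("B", "C"): 0.9, ("B", "D"): 0.8, ("C", "D"): 0.7,
}
subset, score = greedy_subset(r, ["A", "B", "C", "D"], 3)
print(subset, round(score, 3))
```

On the four-covariate example this happens to agree with the exhaustive answer (BCD). Since the problem asks for subsets of size > n, the function would need to be called for each allowed size and the best result kept.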

Leeds, Mark (IED) wrote:

>hi ryan : I reread and you already have the correlation matrix so brute
>force should definitely work.
>So, if the correlation matrix was size 20 by 20 and your n was 9.
>
>Then, you have to have subsets of size 10 or greater, so the number of
>possibilities would be ( 20 choose 10 ) + ( 20 choose 11 ) + ( 20
>choose 12 ) + ( 20 choose 13 ) + ... + ( 20 choose 20 )
>
>Oh boy, it is too large a problem to do by brute force. There are too
>many possibilities even for this size of problem.
>Hopefully someone else will have a better idea. Forget my brute force
>idea. It's useless and I apologize. I made a mistake.
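
For a sense of scale, the sum described above can be evaluated directly. A small Python sketch, assuming every term uses 20 covariates and subset sizes run from 10 to 20 (`math.comb` requires Python 3.8 or later):

```python
from math import comb

p = 20        # number of covariates in the matrix
min_size = 10  # smallest allowed subset size (size > n with n = 9)

# Number of candidate subsets: sum of (p choose k) for k = min_size..p.
total = sum(comb(p, k) for k in range(min_size, p + 1))
print(total)
```

This gives 616666 candidate subsets for the 20-by-20 case. The count grows roughly as 2^p, so whether exhaustive search is workable depends heavily on the size of the matrix.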
>Hello R group,
>
>Given a correlation matrix, I would like to obtain the best subset of
>pairs in the matrix of some size > n such that the mean of r for that
>subset is a maximum compared to any other possible subset of size > n.
>I've been looking at the deal and subselect packages but they don't seem
>to do what I need.  Does anyone have any suggestions?
>Ryan
