[R] a correlation matrix subset where the subset avg is a maximum

Ryan Austin austin at botany.utoronto.ca
Fri Oct 13 23:29:31 CEST 2006

Thanks for the thought in any case, Mark.  You're right about the brute force.
I'll expand a bit with an example though for the sake of clarity.

Given a correlation matrix of 4 covariates ABCD with pairwise correlations (r) of:
AB=0.2;  AC=0.6; AD=0.3 ; BC=0.9 ; BD=0.8 ; CD=0.7

Find the optimal subset of covariates (of size > n, n being a chosen
lower bound on the number of covariates in the subset) where the mean
of r for the subset is a maximum.
Of course, every pairwise value between the chosen covariates needs to
be included in that mean.

Thus for n = 1 (subsets of more than one covariate), the solution would
simply be BC = 0.9.
And for n = 2 (subsets of more than two covariates), the solution would
be BCD, as (BC + CD + BD)/3 = 0.8 is the maximum mean r value that could
be obtained from any such subset.
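
For a matrix this small, the example can be checked by exhaustive enumeration. What follows is a minimal sketch in Python rather than R (the thread contains no code of its own, and the same enumeration could be written with R's `combn`). The names `R`, `LABELS`, `mean_r` and `best_subset` are illustrative and not taken from any package.

```python
from itertools import combinations

# Pairwise correlations from the example (symmetric, so each pair is stored once,
# keyed in alphabetical order).
R = {
    ("A", "B"): 0.2, ("A", "C"): 0.6, ("A", "D"): 0.3,
    ("B", "C"): 0.9, ("B", "D"): 0.8, ("C", "D"): 0.7,
}
LABELS = ["A", "B", "C", "D"]


def mean_r(subset, r=R):
    """Mean of r over every pair of covariates in `subset`."""
    pairs = list(combinations(sorted(subset), 2))
    return sum(r[p] for p in pairs) / len(pairs)


def best_subset(labels, n, r=R):
    """Exhaustively search all subsets with more than n members (at least 2)."""
    best = None
    for k in range(max(n + 1, 2), len(labels) + 1):
        for subset in combinations(labels, k):
            score = mean_r(subset, r)
            if best is None or score > best[1]:
                best = (subset, score)
    return best


for n in (1, 2):
    subset, score = best_subset(LABELS, n)
    print(n, "".join(subset), round(score, 3))
```

With the six values above, this picks BC for n = 1 and BCD for n = 2, in agreement with the hand calculation. The search visits every qualifying subset, so it is only practical for small matrices.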

I'd expected that this would be a common problem, but 2 days of googling
has given me little.  I expect that a greedy graph traversal
or the like will be my answer, but I'd hoped to whip up a solution in R.
Any help would be greatly appreciated.
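
One way to read "greedy" here, offered only as a hedged sketch and not as a known solution to this problem: seed the subset with the most correlated pair, then repeatedly add whichever remaining covariate keeps the subset mean highest, remembering the best subset seen with more than n members. The function names and the seeding rule are assumptions made for illustration, the code is Python rather than R, and a greedy search like this is not guaranteed to find the true optimum.

```python
from itertools import combinations

# Example data from this post, keyed in alphabetical order.
R = {
    ("A", "B"): 0.2, ("A", "C"): 0.6, ("A", "D"): 0.3,
    ("B", "C"): 0.9, ("B", "D"): 0.8, ("C", "D"): 0.7,
}
LABELS = ["A", "B", "C", "D"]


def mean_r(subset, r):
    """Mean of r over every pair of covariates in `subset`."""
    pairs = list(combinations(sorted(subset), 2))
    return sum(r[p] for p in pairs) / len(pairs)


def greedy_subset(labels, n, r):
    """Heuristic search; returns (subset, mean) or None if no subset has > n members."""
    # Seed with the single most correlated pair.
    chosen = set(max(combinations(sorted(labels), 2), key=lambda p: r[p]))
    best = None
    while True:
        if len(chosen) > n:
            score = mean_r(chosen, r)
            if best is None or score > best[1]:
                best = (tuple(sorted(chosen)), score)
        remaining = [x for x in labels if x not in chosen]
        if not remaining:
            return best
        # Add the covariate that leaves the subset mean as high as possible.
        chosen.add(max(remaining, key=lambda x: mean_r(chosen | {x}, r)))


print(greedy_subset(LABELS, 1, R))
print(greedy_subset(LABELS, 2, R))
```

On the four-covariate example this reproduces BC and BCD, but that agreement on one small case says nothing about larger matrices. The appeal is cost: the number of mean evaluations grows polynomially with the number of covariates instead of exponentially.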

Leeds, Mark (IED) wrote:

>hi ryan : I reread and you already have the correlation matrix so brute
>force should definitely work.
>So, if the correlation matrix was size 20 by 20 and your n was 9,
>then you have to have subsets of size 10 or greater, so the number of
>possibilities would be ( 20 choose 10 ) + ( 20 choose 11 ) + ( 20
>choose 12 ) + ( 20 choose 13 ) + ... + ( 20 choose 20 )
>Oh boy, it is too large a problem to do by brute force. There are too
>many possibilities even for this size of problem.
>Hopefully someone else will have a better idea. Forget my brute force
>idea. It's useless and I apologize. I made a mistake.
>-----Original Message-----
>From: r-help-bounces at stat.math.ethz.ch
>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ryan Austin
>Sent: Friday, October 13, 2006 2:43 PM
>To: r-help at stat.math.ethz.ch
>Subject: [R] a correlation matrix subset where the subset avg is a
>maximum
>Hello R group,
>Given a correlation matrix, I would like to obtain the best subset of
>pairs in the matrix of some size > n such that the mean of r for that
>subset is a maximum compared to any other possible subset of size > n.  
>I've been looking at the deal and subselect packages but they don't seem
>to do what I need.  Does anyone have any suggestions?
>Thanks in advance,
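
The sum of binomial coefficients in the quoted reply can be evaluated directly. The sketch below is in Python (3.8 or later, for `math.comb`), and `count_subsets` is a hypothetical helper name, not a function from any package mentioned in this thread.

```python
from math import comb


def count_subsets(p, n):
    """Number of subsets of p covariates that have more than n members."""
    return sum(comb(p, k) for k in range(n + 1, p + 1))


# 20 covariates, subsets of size 10 or greater (n = 9).
print(count_subsets(20, 9))
```

For p = 20 and n = 9 the count is 616,666, which may well be within reach of exhaustive search on current hardware. The count roughly doubles with each added covariate, though (on the order of 10^12 for p = 40), so brute force stops being an option quickly as the matrix grows.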
