[R] choosing best 'match' for given factor

Murali.Menon at avivainvestors.com
Thu Mar 31 16:46:05 CEST 2011


Folks,

I have a 'matching' matrix between variables A, X, L, O:

> a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58, 
0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
    c("A", "X", "L", "O"), c("A", "X", "L", "O")))

> a
      A     X     L     O
A  1.00  0.41  0.58  0.75
X  0.41  1.00  0.60  0.86
L  0.58  0.60  1.00  0.83
O  0.75  0.86  0.83  1.00

And I have a search vector of variables

> v <- c("X", "O")

I want to write a function bestMatch(searchvector, matchMat) such that for each variable in searchvector, I get the variable that it has the highest match to - but searching only among variables to the left of it in the 'matching' matrix, and not matching with any variable in searchvector itself.

So in the above example, although "X" has the highest match (0.86) with "O", I can't choose "O" as it's to the right of X (and also because "O" is in the searchvector v already); I'll have to choose "A".

For "O", I will choose "L" (0.83): its highest match, "X" (0.86), is already in the search vector, so "L" is the best remaining candidate to its left.

My function bestMatch(v, a) will then return c("A", "L")
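
To make the selection rule concrete, here is a minimal sketch that walks through the two cases by hand. It repeats the matrix a and the vector v from above so that it runs on its own; the helper name candidates is made up for illustration only.

```r
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
    c("A", "X", "L", "O"), c("A", "X", "L", "O")))
v <- c("X", "O")

# Illustrative helper (hypothetical name): variables that come before `cc`
# in the matrix ordering and are not themselves in the search vector
candidates <- function(cc, searchvector, matchMat) {
    rn <- rownames(matchMat)
    rn[seq_along(rn) < match(cc, rn) & !(rn %in% searchvector)]
}

candX <- candidates("X", v, a)   # only "A" lies to the left of "X"
candO <- candidates("O", v, a)   # "A" and "L"; "X" is excluded

# Best match = candidate with the largest entry in the variable's column
bestX <- candX[which.max(a[candX, "X"])]   # "A"
bestO <- candO[which.max(a[candO, "O"])]   # "L" (0.83 beats 0.75)
```

Together, bestX and bestO give the c("A", "L") expected from bestMatch(v, a).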

My matrix a is quite large, and I have a long list of search vectors v, so I need an efficient method.

I wrote this:

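bestMatch <- function(searchvector, matchMat) {
        rn <- rownames(matchMat)
        # Candidate rows must not be in the search vector ...
        notSearched <- !(rn %in% searchvector)
        sapply(searchvector, function(cc) {
                # ... and must come before cc in the matrix ordering
                # (seq_along() is base R; index() would need the zoo package)
                y <- matchMat[notSearched & (seq_along(rn) < match(cc, rn)), cc, drop = FALSE]
                rownames(y)[which.max(y)]
        })
}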
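
One possible direction for a large matrix is to mask out the inadmissible entries in one go and then take the column-wise maximum, instead of subsetting once per variable. The following is only a rough sketch under my own assumptions: the name bestMatchVec is hypothetical, returning NA when no admissible candidate exists is a choice I made here, and I have not measured whether it is actually faster.

```r
# Hypothetical vectorised variant; bestMatchVec is an illustrative name
bestMatchVec <- function(searchvector, matchMat) {
    rn  <- rownames(matchMat)
    pos <- match(searchvector, rn)            # position of each search variable
    sub <- matchMat[, searchvector, drop = FALSE]
    # Rule out rows that are themselves in the search vector
    sub[rn %in% searchvector, ] <- -Inf
    # Rule out rows at or after each variable's own position
    sub[row(sub) >= pos[col(sub)]] <- -Inf
    # For each search variable, pick the row with the largest remaining value
    best <- rn[max.col(t(sub), ties.method = "first")]
    # Assumption: no admissible candidate yields NA rather than a spurious name
    best[apply(sub, 2, max) == -Inf] <- NA_character_
    setNames(best, searchvector)
}

# Example data repeated so the snippet runs on its own
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
    c("A", "X", "L", "O"), c("A", "X", "L", "O")))
v <- c("X", "O")

res <- bestMatchVec(v, a)   # named character vector: X -> "A", O -> "L"
```

Ties are resolved in favour of the first (leftmost) candidate, which matches what which.max does.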

Any advice?

Thanks,

Murali


