[R] choosing best 'match' for given factor

Bert Gunter gunter.berton at gene.com
Thu Mar 31 18:21:04 CEST 2011


Folks:

I think the following may be somewhat faster, as it avoids sorting:

bmat <- function(mx,vec)
{
  nm <- colnames(mx)
  ivec <- match(vec,nm)
  sapply(ivec,function(k){
   if(k==1)NA  else {
    lookat <- setdiff(seq_len(k-1),ivec) ## only those to the left and not in search vector
    nm[lookat[which.max(mx[lookat,k] )]]
   }
  }
 )
}
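
As a rough check, here is a minimal sketch that runs the same idea on the example matrix from the original question. The name bmat_safe and the two NA guards (for a name that is not in colnames(mx), and for a variable with no eligible candidate to its left) are assumptions added for illustration; they are not part of the function posted above.

```r
# Example 'matching' matrix from the original question (symmetric).
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
                 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1),
               .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# Hypothetical hardened variant of bmat(): same lookup, but always
# returns a character vector, with NA where no match is possible.
bmat_safe <- function(mx, vec) {
  nm <- colnames(mx)
  ivec <- match(vec, nm)              # positions of the search variables
  vapply(ivec, function(k) {
    if (is.na(k)) return(NA_character_)            # name not in the matrix
    lookat <- setdiff(seq_len(k - 1), ivec)        # to the left, not in vec
    if (length(lookat) == 0L) return(NA_character_)  # nothing eligible
    nm[lookat[which.max(mx[lookat, k])]]
  }, character(1))
}

bmat_safe(a, c("X", "O"))   # "A" "L"
bmat_safe(a, c("A", "X"))   # NA NA
```

vapply() is used here instead of sapply() only so that the result type is fixed; with sapply(), a search variable with an empty candidate set yields character(0) and the result becomes a list.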

-- Bert

On Thu, Mar 31, 2011 at 8:30 AM, Nick Sabbe <nick.sabbe at ugent.be> wrote:
>
> Hi Murali.
> I haven't compared, but this is what I would do:
>
> bestMatch<-function(searchVector, matchMat)
> {
>        searchRow<-unique(sort(match(searchVector, colnames(matchMat)))) #if you're sure, you could drop unique
>        cat("Original row indices:")
>        print(searchRow)
>        matchMat<-matchMat[, -searchRow, drop=FALSE] #avoid duplicates altogether
>        cat("Corrected Matrix:\n")
>        print(matchMat)
>        correctedRows<-searchRow - seq_along(searchRow) + 1 #works because of the sort above
>        cat("Corrected row indices:")
>        print(correctedRows)
>        sapply(correctedRows, function(cr){
>                        lookWhere<-matchMat[cr, seq(cr-1)]
>                        cat("Will now look into:\n")
>                        print(lookWhere)
>                        cc<-which.max(lookWhere)
>                        cat("Max at position", cc, "\n")
>                        colnames(matchMat)[cc]
>                })
> }
> I don't think there's that much difference. Depending on the specific sizes, it
> may be more or less costly to first shrink the search matrix as I do. Similarly,
> it may be better still to also remove the rows that you're not interested in
> (this requires some more, but similar, index trickery).
>
> HTH,
>
>
> Nick Sabbe
> --
> ping: nick.sabbe at ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
>
> -- Do Not Disapprove
>
>
>
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Murali.Menon at avivainvestors.com
> Sent: donderdag 31 maart 2011 16:46
> To: r-help at r-project.org
> Subject: [R] choosing best 'match' for given factor
>
> Folks,
>
> I have a 'matching' matrix between variables A, X, L, O:
>
> > a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
> 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
>    c("A", "X", "L", "O"), c("A", "X", "L", "O")))
>
> > a
>      A     X     L     O
> A  1.00  0.41  0.58  0.75
> X  0.41  1.00  0.60  0.86
> L  0.58  0.60  1.00  0.83
> O  0.75  0.86  0.83  1.00
>
> And I have a search vector of variables
>
> > v <- c("X", "O")
>
> I want to write a function bestMatch(searchvector, matchMat) such that for
> each variable in searchvector, I get the variable that it has the highest
> match to - but searching only among variables to the left of it in the
> 'matching' matrix, and not matching with any variable in searchvector
> itself.
>
> So in the above example, although "X" has the highest match (0.86) with "O",
> I can't choose "O" as it's to the right of X (and also because "O" is in the
> searchvector v already); I'll have to choose "A".
>
> For "O", I will choose "L" (0.83): its best match to the left is "X" (0.86),
> but "X" is already in the search vector.
>
> My function bestMatch(v, a) will then return c("A", "L")
>
> My matrix a is quite large, and I have a long list of search vectors v, so I
> need an efficient method.
>
> I wrote this:
>
> bestMatch <- function(searchvector,  matchMat) {
>        sapply(searchvector, function(cc) {
>                             y <- matchMat[!(rownames(matchMat) %in% searchvector) &
>                                           (seq_len(nrow(matchMat)) < match(cc, rownames(matchMat))),
>                                           cc, drop = FALSE]
>                             rownames(y)[which.max(y)]
>        })
> }
>
> Any advice?
>
> Thanks,
>
> Murali
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


