[R] Developing functions

Liaw, Andy andy_liaw at merck.com
Thu Jul 1 03:46:16 CEST 2004


> From: daniel at sintesys.com.ar
> 
> Hi,
> I´m new in R. I´m working with similarity coefficients for clustering
> items. I created one function (coef), to calculate the 
> coefficients from
> two pairs of vectors and then, as an example, the function
> simple_matching,
> taking a data.frame(X) and using coef in a for cicle.
> It works, but I believe it is a bad way to do so (I believe 
> the for cicle
> is not necessary). Somebody can suggest anything better.
> Thanks
> Daniel Rozengardt
> 
> coef<-function(x1,x2){a<-sum(ifelse(x1==1&x2==1,1,0));
> b<-sum(ifelse(x1==1&x2==0,1,0));
> c<-sum(ifelse(x1==0&x2==1,1,0));
> d<-sum(ifelse(x1==0&x2==0,1,0));
> ret<-cbind(a,b,c,d);
> ret
> }
> 
> simple_matching<-function(X) {
> ret<-matrix(ncol=dim(X)[1],nrow=dim(X)[1]);
> diag(ret)<-1;
> for (i in 2:length(X[,1])) {
> 	for (j in i:length(X[,1])) {
> 	vec<-coef(X[i-1,],X[j,]);
> 	result<-(vec[1]+vec[3])/sum(vec);
> 	ret[i-1,j]<-result;
> 	ret[j,i-1]<-result}};
> ret}

A few comments first:

1. Unless you are putting multiple statements on the same line, there's no
need to use ";".

2. In `coef' (which is a bad choice for a function name: it masks the
built-in generic function of that name in R, used for extracting
coefficients from fitted model objects), a, b, c and d are scalars.  You
don't need to cbind() them; c() works just fine.

3. One of the best strategies for efficiency is to vectorize.  Try to
formulate the problem in matrix/vector operations as much as possible.

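4. The computation looks a bit odd to me.  Assuming the data are binary
(i.e., all 0s and 1s), you are computing (N11 + N01) / N, where N is the
length of the vectors, N11 is the number of positions where both vectors
are 1, and N01 is the number of positions where the first is 0 and the
second is 1.  N11 + N01 is just the number of 1s in the second vector, so
the result does not depend on the first vector at all.  The simple matching
coefficient is usually defined as (N11 + N00) / N, which would be
vec[1] + vec[4] rather than vec[1] + vec[3].  Are you sure that's what you
want to compute?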
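
To illustrate points 2 and 4, here is a minimal sketch of the pair-count
helper written with c() and without ifelse().  The name pair_counts and the
example vectors are made up for illustration; they are not from the original
post.  It also checks that (a + c) / N is simply the proportion of 1s in x2:

```r
## Hypothetical helper (the name pair_counts is not from the original post).
## Assumes x1 and x2 are 0/1 vectors of equal length.
pair_counts <- function(x1, x2) {
    c(a = sum(x1 == 1 & x2 == 1),   # both 1
      b = sum(x1 == 1 & x2 == 0),   # x1 is 1, x2 is 0
      c = sum(x1 == 0 & x2 == 1),   # x1 is 0, x2 is 1
      d = sum(x1 == 0 & x2 == 0))   # both 0
}

x1 <- c(1, 0, 1, 1, 0, 0)   # made-up example data
x2 <- c(1, 1, 0, 1, 0, 0)
n <- pair_counts(x1, x2)

## What the posted loop computes for this pair: (a + c) / N
odd <- unname((n["a"] + n["c"]) / sum(n))
## The usual simple matching coefficient: (a + d) / N
smc <- unname((n["a"] + n["d"]) / sum(n))

stopifnot(isTRUE(all.equal(odd, mean(x2))))         # depends on x2 only
stopifnot(isTRUE(all.equal(smc, mean(x1 == x2))))   # proportion of agreements
```

A logical comparison already sums as 0/1, so the ifelse(..., 1, 0) wrapper is
not needed.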

Here's what I'd do (assuming the input matrix contains all 0s and 1s):

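simple_matching <- function(X) {
    ## N11[i, j]: number of columns where rows i and j are both 1
    N11 <- crossprod(t(X))
    ## N01[i, j]: number of columns where row i is 0 and row j is 1
    N01 <- crossprod(t(1 - X), t(X))
    ans <- (N11 + N01) / ncol(X)
    ## the loop version mirrors the upper triangle, so do the same here
    ans[lower.tri(ans)] <- t(ans)[lower.tri(ans)]
    diag(ans) <- 1
    ans
}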
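
If what you actually want is the usual simple matching coefficient,
(N11 + N00) / N, the same matrix approach applies.  The following is only a
sketch under that assumption: the name smc_matrix and the example data are
hypothetical, and tcrossprod(X) is used as an equivalent of crossprod(t(X)).

```r
## Sketch only: assumes the goal is the usual simple matching coefficient,
## (N11 + N00) / N, and that X contains only 0s and 1s.
## The name smc_matrix is hypothetical.
smc_matrix <- function(X) {
    X <- as.matrix(X)           # also accept a data frame
    N11 <- tcrossprod(X)        # [i, j]: columns where rows i and j are both 1
    N00 <- tcrossprod(1 - X)    # [i, j]: columns where rows i and j are both 0
    (N11 + N00) / ncol(X)       # diagonal is 1 automatically
}

## Made-up example: 3 items (rows) scored on 4 binary attributes (columns)
X <- matrix(c(1, 0, 1, 1,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 3, byrow = TRUE)
S <- smc_matrix(X)

## Check one entry against the direct definition: proportion of agreements
stopifnot(isTRUE(all.equal(S[1, 2], mean(X[1, ] == X[2, ]))))
```

Because N11 and N00 are both symmetric, the result is symmetric without any
extra step, and each diagonal entry is (ones + zeros) / N = 1.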

HTH,
Andy



