[R] Developing functions
Liaw, Andy
andy_liaw at merck.com
Thu Jul 1 03:46:16 CEST 2004
> From: daniel at sintesys.com.ar
>
> Hi,
> I'm new to R. I'm working with similarity coefficients for clustering
> items. I created one function (coef) to calculate the coefficients from
> a pair of vectors and then, as an example, the function simple_matching,
> which takes a data.frame (X) and calls coef in a for loop.
> It works, but I believe it is a bad way to do it (I believe the for loop
> is not necessary). Can somebody suggest anything better?
> Thanks,
> Daniel Rozengardt
>
> coef<-function(x1,x2){
>   a<-sum(ifelse(x1==1&x2==1,1,0));
>   b<-sum(ifelse(x1==1&x2==0,1,0));
>   c<-sum(ifelse(x1==0&x2==1,1,0));
>   d<-sum(ifelse(x1==0&x2==0,1,0));
>   ret<-cbind(a,b,c,d);
>   ret
> }
>
> simple_matching<-function(X) {
>   ret<-matrix(ncol=dim(X)[1],nrow=dim(X)[1]);
>   diag(ret)<-1;
>   for (i in 2:length(X[,1])) {
>     for (j in i:length(X[,1])) {
>       vec<-coef(X[i-1,],X[j,]);
>       result<-(vec[1]+vec[3])/sum(vec);
>       ret[i-1,j]<-result;
>       ret[j,i-1]<-result
>     }
>   };
>   ret
> }
A few comments first:
1. Unless you are putting multiple statements on the same line, there's no
need to use ";".
2. `coef' is a bad choice for a function name: there's a built-in generic
function by that name in R, for extracting coefficients from fitted model
objects. Also, a, b, c and d are scalars, so you don't need to cbind() them;
c() works just fine.
3. One of the best strategies for efficiency is to vectorize. Try to
formulate the problem in matrix/vector operations as much as possible.
4. The computation looks a bit odd to me. Assuming the data are binary
(i.e., all 0s and 1s), you are computing (N11 + N01) / N, where N is the
length of the vectors, N11 is the number of 1-1 matches and N01 is the
number of 0-1 matches. Are you sure that's what you want to compute?
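Points 2 to 4 can be illustrated with a small sketch. The name `match_counts` and the example vectors are hypothetical, not from the original post, and the inputs are assumed to be numeric 0/1 vectors of equal length. The last line shows why the quantity in point 4 looks odd: N11 + N01 is just the number of 1s in the second vector.

```r
# Hypothetical name, chosen to avoid masking the built-in coef() generic.
match_counts <- function(x1, x2) {
    # Summing a logical vector counts its TRUEs, so ifelse() is not needed.
    c(N11 = sum(x1 == 1 & x2 == 1),
      N10 = sum(x1 == 1 & x2 == 0),
      N01 = sum(x1 == 0 & x2 == 1),
      N00 = sum(x1 == 0 & x2 == 0))
}

# Made-up example data (assumption: values are only 0 and 1).
x1 <- c(1, 0, 1, 1, 0)
x2 <- c(1, 1, 0, 1, 0)
counts <- match_counts(x1, x2)
print(counts)

# N11 + N01 counts every position where x2 is 1, whatever x1 contains.
print(counts[["N11"]] + counts[["N01"]] == sum(x2))
```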
Here's what I'd do (assuming the input matrix contains all 0s and 1s):
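simple_matching <- function(X) {
    X <- as.matrix(X)                  # accept a data frame as well as a matrix
    N11 <- crossprod(t(X))             # [i, j]: positions where rows i and j are both 1
    N01 <- crossprod(t(1 - X), t(X))   # [i, j]: positions where row i is 0 and row j is 1
    ans <- (N11 + N01) / ncol(X)
    # Mirror the upper triangle, as the original loop does, so the result is symmetric.
    ans[lower.tri(ans)] <- t(ans)[lower.tri(ans)]
    diag(ans) <- 1
    ans
}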
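For comparison, the coefficient usually called simple matching counts agreements of both kinds, (N11 + N00) / N. If that is what is wanted, the same matrix approach applies. A minimal sketch, assuming X is a numeric 0/1 matrix or data frame with one item per row; the name `smc` and the example data are hypothetical:

```r
# Hypothetical function name; assumes X holds only 0s and 1s, one item per row.
smc <- function(X) {
    X <- as.matrix(X)
    N11 <- tcrossprod(X)        # X %*% t(X): shared 1s for each pair of rows
    N00 <- tcrossprod(1 - X)    # shared 0s for each pair of rows
    (N11 + N00) / ncol(X)
}

# Made-up example data: three items described by four binary attributes.
X <- rbind(c(1, 0, 1, 1),
           c(0, 0, 1, 0),
           c(1, 1, 0, 0))
S <- smc(X)
print(S)

# Cross-check against a plain double loop: the fraction of positions
# on which rows i and j agree.
S_loop <- matrix(NA_real_, nrow(X), nrow(X))
for (i in seq_len(nrow(X))) {
    for (j in seq_len(nrow(X))) {
        S_loop[i, j] <- mean(X[i, ] == X[j, ])
    }
}
print(all.equal(S, S_loop))
```

Because agreement is symmetric, the result needs no mirroring, and the diagonal is 1 without setting it explicitly.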
HTH,
Andy