[R] Very Slow Gower Similarity Function
Tyler Smith
tyler.smith at mail.mcgill.ca
Mon Apr 18 18:10:34 CEST 2005
Hello,
I am a relatively new user of R. I have written a basic function to calculate
the Gower similarity function. I was motivated to do so partly as an exercise
in learning R, and partly because the existing option (vegdist in the vegan
package) does not accept missing values.
I think I have succeeded - my function gives me the correct values. However, now
that I'm starting to use it with real data, I realise it's very slow. It takes
more than 45 minutes on my Windows 98 machine (R 2.0.1 Patched (2005-03-29))
with a 185x32 matrix with ca. 100 missing values. If anyone can suggest ways to
speed up my function I would appreciate it. I suspect having a pair of nested
for loops is the problem, but I couldn't figure out how to get rid of them.
The function is:
### Gower Similarity Matrix ###
sGow <- function(mat) {
  OBJ <- nrow(mat)       # number of objects
  MATDESC <- ncol(mat)   # number of descriptors
  ## descriptor ranges
  MRANGE <- apply(mat, 2, max, na.rm = TRUE) - apply(mat, 2, min, na.rm = TRUE)
  DESCRIPT <- 1:MATDESC  # descriptor index vector
  smat <- matrix(1, nrow = OBJ, ncol = OBJ)  # 'empty' similarity matrix
  for (i in 1:OBJ) {
    for (j in i:OBJ) {
      ## calculate index vector of non-NA descriptors between objects i and j
      descvect <- intersect(setdiff(DESCRIPT, DESCRIPT[is.na(mat[i, DESCRIPT])]),
                            setdiff(DESCRIPT, DESCRIPT[is.na(mat[j, DESCRIPT])]))
      descnum <- length(descvect)  # number of valid descr for i~j comparison
      partialsim <- 1 - abs(mat[i, descvect] - mat[j, descvect]) / MRANGE[descvect]
      smat[i, j] <- smat[j, i] <- sum(partialsim) / descnum
    }
  }
  smat
}
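For what it is worth, here is a rough sketch of one direction that might avoid
the pair of nested loops: loop over the descriptors (32 here) instead of over
the object pairs, and build each descriptor's contribution for all pairs at
once with outer(). The name sGowVec and the toy matrix are made up for
illustration only. I have not timed this on the real data, so treat it as an
untested idea under the assumption that the matrix is numeric, not as a
finished solution.

```r
## Sketch only: Gower similarity, looping over descriptors rather than pairs.
## sGowVec is a hypothetical name, not a function from vegan or any package.
sGowVec <- function(mat) {
  mat <- as.matrix(mat)
  n <- nrow(mat)
  ## descriptor ranges, computed the same way as in sGow
  rng <- apply(mat, 2, max, na.rm = TRUE) - apply(mat, 2, min, na.rm = TRUE)
  num <- matrix(0, n, n)  # running sum of partial similarities
  den <- matrix(0, n, n)  # running count of descriptors valid for both objects
  for (k in seq_len(ncol(mat))) {
    x <- mat[, k]
    present <- !is.na(x)
    ok <- outer(present, present, "&")       # TRUE where both objects have a value
    s <- 1 - abs(outer(x, x, "-")) / rng[k]  # partial similarity for descriptor k
    num[ok] <- num[ok] + s[ok]
    den <- den + ok
  }
  num / den
}

## Toy data (made up): 3 objects, 2 descriptors, one missing value
toy <- matrix(c(1, 2, 3,
                10, NA, 30), nrow = 3)
S <- sGowVec(toy)
```

In the toy case objects 1 and 2 share only the first descriptor, so S[1, 2]
is 1 - 1/2 = 0.5; objects 1 and 3 sit at opposite ends of both ranges, so
S[1, 3] is 0. A pair with no descriptor in common gives 0/0 = NaN, the same
as in sGow.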
Thank you for your time,
Tyler
--
Tyler Smith
PhD Candidate
Plant Science Department
McGill University
tyler.smith at mail.mcgill.ca