[R] counting the occurrences of vectors

Gabor Grothendieck ggrothendieck at myway.com
Tue Jul 6 05:01:59 CEST 2004


Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:
 
> row.match.count <- function(m1, m2)
> {
>   if (ncol(m1) != (ncol(m2)))
>     stop("Matrices must have the same number of columns")
> 
>   if (typeof(m1) != (typeof(m2)))
>     stop("Matrices must have the same data type")
> 
>   m1.l <- as.character(apply(m1, 1, list))
>   m2.l <- as.character(apply(m2 ,1, list))
> 
>   # return counts for each m1.l in m2.l
>   match.table <- table(c(unique(m1.l), m2.l))[m1.l] - 1
> 
>   # clean up table names
>   if (typeof(m1) == "integer")
>   {
>     names(match.table) <- sub("^list\\(as.integer\\(", "", 
>                               names(match.table))
>     names(match.table) <- sub("\\)\\)$", "", names(match.table))
>   }
>   else if (typeof(m1) == "character")
>   {
>     names(match.table) <- sub("^list\\(", "", names(match.table))
>     names(match.table) <- sub("\\)$", "", names(match.table))
>   }
> 
>   match.table
> }

One could still use your as.character(apply(m1, 1, list)) idea without
the type-specific processing by applying the original paste idea to the
names of the answer vector rather than to m1 and m2. Adding a
sep = ":" argument, so that the user can override the separator in the
event that : appears in the data, and making some other cosmetic changes,
we have:

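row.match.count.2 <- function(m1, m2, sep = ":") {

  stopifnot(ncol(m1) == ncol(m2), typeof(m1) == typeof(m2))

  # one deparsed character key per row
  m1 <- as.character(apply(m1, 1, list))
  m2 <- as.character(apply(m2, 1, list))

  # count each m1 key in m2; unique(m1) seeds every key once,
  # so 1 is subtracted to remove the seeded occurrence
  ans <- c(table(c(unique(m1), m2))[m1] - 1)

  # rebuild each row from its key and paste its elements with sep
  f <- function(x) paste(eval(parse(text = x))[[1]], collapse = sep)
  names(ans) <- sapply(names(ans), f)

  ans
}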

This does not run as fast as row.match.count, but it is faster than
f1, and it avoids the potentially problematic type-specific regex
name mangling portion of row.match.count.
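
For anyone following the counting step, here is a minimal sketch of the
table(c(unique(m1), m2))[m1] - 1 trick on its own. The matrices are made-up
example data, and for readability the row keys are built directly with
paste rather than with as.character(apply(m, 1, list)) as in the function
above, so this illustrates the idea and is not a replacement for it.

```r
# Hypothetical example data: count how often each row of m1 occurs in m2.
m1 <- matrix(c(1L, 2L,
               3L, 4L,
               5L, 6L), ncol = 2, byrow = TRUE)
m2 <- matrix(c(1L, 2L,
               3L, 4L,
               1L, 2L,
               7L, 8L), ncol = 2, byrow = TRUE)

sep <- ":"

# One character key per row (assumes sep does not occur in the data).
k1 <- apply(m1, 1, paste, collapse = sep)   # "1:2" "3:4" "5:6"
k2 <- apply(m2, 1, paste, collapse = sep)   # "1:2" "3:4" "1:2" "7:8"

# Seeding the table with unique(k1) guarantees that every m1 key is
# present, so indexing by k1 never fails; subtracting 1 removes the seed.
counts <- c(table(c(unique(k1), k2))[k1] - 1)

counts   # 2, 1, 0 under the names "1:2", "3:4", "5:6"
```

The row 5, 6 never appears in m2, so without the unique(k1) seed it would
be missing from the table and the lookup by name would fail instead of
returning zero.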



