[R] counting the occurrences of vectors
Gabor Grothendieck
ggrothendieck at myway.com
Tue Jul 6 05:01:59 CEST 2004
Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:
> row.match.count <- function(m1, m2)
> {
>   if (ncol(m1) != (ncol(m2)))
>     stop("Matrices must have the same number of columns")
>
>   if (typeof(m1) != (typeof(m2)))
>     stop("Matrices must have the same data type")
>
>   m1.l <- as.character(apply(m1, 1, list))
>   m2.l <- as.character(apply(m2, 1, list))
>
>   # return counts for each m1.l in m2.l
>   match.table <- table(c(unique(m1.l), m2.l))[m1.l] - 1
>
>   # clean up table names
>   if (typeof(m1) == "integer")
>   {
>     names(match.table) <- sub("^list\\(as.integer\\(", "",
>                               names(match.table))
>     names(match.table) <- sub("\\)\\)$", "", names(match.table))
>   }
>   else if (typeof(m1) == "character")
>   {
>     names(match.table) <- sub("^list\\(", "", names(match.table))
>     names(match.table) <- sub("\\)$", "", names(match.table))
>   }
>
>   match.table
> }
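The core of both versions is turning each row into a single string key. A small sketch of that step, using a made-up matrix `m` for illustration: the exact deparsed text can differ between R versions and data types (which is what makes regex clean-up of the names fragile), but for counting only the fact that identical rows give identical keys matters.

```r
# Hypothetical example matrix: rows 1 and 3 are identical
m <- matrix(c(1, 2, 3,
              4, 5, 6,
              1, 2, 3), nrow = 3, byrow = TRUE)

# apply(..., list) wraps each row in a one-element list;
# as.character() then deparses each of those lists into a string key
keys <- as.character(apply(m, 1, list))
print(keys)

# Identical rows produce identical keys, so table() can count them
keys[1] == keys[3]
```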
One could still make use of your as.character(apply(m1, 1, list)) idea
without the type-specific processing by applying the original paste idea
to the names of the answer vector rather than to m1 and m2. Also, adding
sep = ":" to the argument list, so that the user can override it in the
event that ":" appears in the data, and making some other cosmetic changes,
we have:
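row.match.count.2 <- function(m1, m2, sep = ":") {
  stopifnot(ncol(m1) == ncol(m2), typeof(m1) == typeof(m2))
  # turn each row into a string key by deparsing list(row)
  m1 <- as.character(apply(m1, 1, list))
  m2 <- as.character(apply(m2, 1, list))
  # add unique(m1) so every m1 key is tabulated, then subtract that extra 1
  ans <- c(table(c(unique(m1), m2))[m1] - 1)
  # rebuild readable names: parse each key back to its row, join with sep
  f <- function(x) paste(eval(parse(text = x))[[1]], collapse = sep)
  names(ans) <- sapply(names(ans), f)
  ans
}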
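The name-recovery helper `f` is what replaces the regex clean-up. A minimal sketch of it in isolation, using made-up one-row matrices: `f` parses the deparsed key back into a one-element list, takes that element (the row) and joins it with `sep`, so the same code handles numeric and character rows.

```r
sep <- ":"
# Same helper as in row.match.count.2: parse the deparsed key back into a
# one-element list, extract the row vector, and join its elements with sep
f <- function(x) paste(eval(parse(text = x))[[1]], collapse = sep)

# Hypothetical one-row matrices of two different types
num.key <- as.character(apply(matrix(c(1, 2, 3), nrow = 1), 1, list))
chr.key <- as.character(apply(matrix(c("a", "b"), nrow = 1), 1, list))

f(num.key)  # "1:2:3"
f(chr.key)  # "a:b"
```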
This does not run as fast as row.match.count, but it's faster than
f1, and it avoids the potentially problematic type-specific regex
name mangling in row.match.count (which, for example, leaves the names
of a double matrix uncleaned).
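For comparison, a sketch of a purely paste-keyed count. This is a hypothetical variant written for illustration (the name row.match.count.paste and the example matrices are assumptions, not code from this thread): each row is joined into one string with sep, and fixing the factor levels to the m1 keys gives a count of 0 for rows that never occur in m2.

```r
# Hypothetical paste-keyed variant, for illustration only
row.match.count.paste <- function(m1, m2, sep = ":") {
  stopifnot(ncol(m1) == ncol(m2), typeof(m1) == typeof(m2))
  # join the elements of each row into a single key string
  k1 <- apply(m1, 1, paste, collapse = sep)
  k2 <- apply(m2, 1, paste, collapse = sep)
  # levels fixed to the m1 keys: m2 rows not in m1 become NA and are dropped,
  # m1 rows absent from m2 get a count of 0
  counts <- table(factor(k2, levels = unique(k1)))
  ans <- as.vector(counts[k1])
  names(ans) <- k1
  ans
}

m1 <- matrix(c(1, 2,
               3, 4), nrow = 2, byrow = TRUE)
m2 <- matrix(c(1, 2,
               1, 2,
               5, 6), nrow = 3, byrow = TRUE)
res <- row.match.count.paste(m1, m2)
print(res)
```

Unlike row.match.count.2, where sep only affects the displayed names, here sep is part of the matching key, so it must not occur in the data: with sep = ":", the character rows c("a:b", "c") and c("a", "b:c") would both become "a:b:c".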