[R] Aggregate weights for a unique set of rows
Rui Barradas
ruipbarradas at sapo.pt
Fri Jun 29 02:04:00 CEST 2012
Hello,
Would around two orders of magnitude interest you?
f1 <- function(Nodes, Weights){
    drop.index <- duplicated(Nodes)
    n.unique <- Nodes[!drop.index, ]
    w.unique <- numeric(length(n.unique[, 1]))
    lw <- length(Weights)
    # For each unique row, rescan all of Nodes for matching rows
    # (both coordinates equal) and sum the corresponding weights
    for (i in seq_along(w.unique)){
        index <- as.logical(2 == rowSums(
            Nodes == matrix(rep(n.unique[i, ], lw), byrow = TRUE, nrow = lw)))
        w.unique[i] <- sum(Weights[index])
    }
    list(n.unique = n.unique, w.unique = w.unique)
}
f2 <- function(Nodes, Weights){
    # Build one character key per row, sum the weights per key with
    # tapply() (results come back in sorted key order), then map the
    # sums back to the order in which the unique rows first appear
    rows <- paste(Nodes[, 1], Nodes[, 2], sep = ".")
    w.uniq <- tapply(Weights, rows, sum)
    attributes(w.uniq) <- NULL
    ord <- order(unique(rows))
    list(n.unique = unique(Nodes), w.unique = w.uniq[order(ord)])
}
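For completeness, here is a sketch of a third variant (not from the original thread) built on base R's rowsum(), which sums a vector by a grouping key in compiled code, so the whole aggregation is a single call once the key is built. The function name f3 is my own choice.

```r
f3 <- function(Nodes, Weights){
    # One character key per row; collisions are not a concern here
    # because the coordinates are integers
    keys <- paste(Nodes[, 1], Nodes[, 2], sep = ".")
    # Unique rows, in order of first appearance
    n.unique <- Nodes[!duplicated(keys), , drop = FALSE]
    # rowsum() returns the per-key sums in sorted key order;
    # match() puts them back into first-appearance order
    s <- rowsum(Weights, keys)
    w.unique <- s[match(unique(keys), rownames(s)), 1]
    names(w.unique) <- NULL
    list(n.unique = n.unique, w.unique = w.unique)
}

# The small example from the original post:
nodes <- matrix(c(1,2, 3,4, 5,6, 1,2, 1,4), ncol = 2, byrow = TRUE)
w <- rep(1, 5)
r <- f3(nodes, w)
r$w.unique   # 2 1 1 1, as in the desired result below
```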
# Test it
M <- 100 # see text below
n <- 2e5
set.seed(1234)
nd <- matrix(sample(M, n*2, TRUE), n, 2)
ww <- rep(1, n)
t1 <- system.time(r1 <- f1(nd, ww))
t2 <- system.time(r2 <- f2(nd, ww))
identical(r1, r2)
print(rbind(t1=t1, t2=t2, ratio=t1/t2), digits=3)
      user.self sys.self elapsed user.child sys.child
t1       310.41       67  379.07         NA        NA
t2         5.59        0    5.62         NA        NA
ratio     55.53      Inf   67.45         NA        NA
With bigger M the number of possible coordinate pairs increases, and with
it the number of unique rows. That makes the time taken by f1 grow
quickly, while f2 scales up rather slowly, so the ratio above keeps
improving.
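A rough back-of-the-envelope sketch of why (my own addition, not from the original post): f1 rescans all n rows once per unique row, so its work is about n * U comparisons, where U is the number of unique rows. Sampling both coordinates uniformly from M values gives M^2 possible rows, and the expected number of distinct rows after n draws is:

```r
# Expected number of distinct rows after n uniform draws
# from M^2 equally likely coordinate pairs
expected_unique <- function(M, n) M^2 * (1 - (1 - 1/M^2)^n)

# With the benchmark's M = 100 and n = 2e5, essentially all
# 10000 possible pairs appear, so f1 does ~ 2e9 row comparisons
round(expected_unique(100, 2e5))   # ~ 10000
```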
Hope this helps,
Rui Barradas
On 28-06-2012 14:06, Weiser, Constantin wrote:
> Hi all. I have a - maybe trivial - problem with aggregating a list of
> weights.
>
> Here is the problem:
> - To start, I have a set of nodes (X/Y-coordinates) and associated
>   weights, where the set of nodes is typically not unique
> - I want to get a set of unique nodes and the sum of associated weights
>
> I am grateful for any help
>
>
> See for example:
>
> # weights:
> w <- c(1, 1, 1, 1, 1)
>
> # not unique set of nodes (X/Y-coordinates):
> nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4), ncol=2, byrow=TRUE)
>
>
> desired Result:
>
> #nodes
> [,1] [,2]
> [1,] 1 2
> [2,] 3 4
> [3,] 5 6
> [4,] 1 4
>
>
>
> #weights
> 2 1 1 1
>
>
>
> This is my solution, but it is very slow (typical size of nodes:
> 200000 x 2):
>
> weights <- c(1, 1, 1, 1, 1)
> nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4), ncol=2, byrow=TRUE)
>
>
> ## to be replaced by a faster code
> drop.index <- duplicated(nodes)
> n.unique <- nodes[!drop.index, ]
> w.unique <- numeric(length(n.unique[,1]))
>
> lw <- length(weights)
> for (i in seq_along(w.unique)){
> index <- as.logical(2 == rowSums(nodes ==
>     matrix(rep(n.unique[i,], lw), byrow = TRUE, nrow = lw)))
> w.unique[i] <- sum(weights[index])
> }
> ##
>
> n.unique
> w.unique
>
>
>
>
>
> Weiser, Constantin
> Gutenberg University of Mainz, Germany
> Chair of Statistics & Econometrics
> Jakob-Welder-Weg 4, 55128 Mainz
> House of Law and Economics II, Room 00-116
> Tel: 0049 6131 39 22715
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>