[R] index values of one matrix to another of a different size
Joshua Wiley
jwiley.psych at gmail.com
Sat Mar 10 18:26:18 CET 2012
Hi Ben,
It seems likely that there are bigger bottlenecks in your overall
program/use. Have you tried Rprof() to find where things really slow
down? In any case, f2() below takes about 70% of the time your function
takes on your test data, and 55-65% of the time on a bigger example I
constructed. Rui's function benefits substantially from byte
compiling, but is still slower. As a side benefit, f2() seems to use
less memory than your current implementation.
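
In case it helps, here is a rough sketch of what a profiling run with
Rprof() could look like. Everything in it is a made-up stand-in rather
than your actual program: the matrices demo_vals and demo_indx, their
sizes, and the function lookup_step() are assumptions for illustration
only.

```r
## Sketch only: demo_vals, demo_indx and lookup_step() are hypothetical
## stand-ins for one step of the real program.
set.seed(1)
demo_vals <- matrix(sample(LETTERS, 1e6, TRUE), nrow = 1e3)       # 1000 x 1000
demo_indx <- matrix(sample.int(1e3, 1e5, TRUE), ncol = 1e3)       # 100 x 1000
lookup_step <- function() {
  demo_vals[cbind(c(demo_indx), c(col(demo_indx)))]
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                        # start the sampling profiler
t0 <- proc.time()[["elapsed"]]
## repeat the step for about one second so the sampler collects data
while (proc.time()[["elapsed"]] - t0 < 1) lookup_step()
Rprof(NULL)                             # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                      # functions ranked by their own time
```

Wrapping the whole program this way, instead of the stand-in loop, is
what would show whether the indexing step is really where the time goes.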
Cheers,
Josh
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
######sample data ##############
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
dimnames = list(c('row1','row2','row3'), c('col1','col2','col3')))
indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
storage.mode(indx) <- "integer"
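## Turn each (row index, column) pair into a column-major linear index:
## an entry in column j of i is shifted by (j - 1) * nrow(x).
f <- function(x, i, di = dim(i), dx = dim(x)) {
  ## one offset per column of i (di[2L] of them), so non-square x works too
  out <- x[c(i + matrix(0:(di[2L] - 1L) * dx[1L],
                        nrow = di[1L], ncol = di[2L], byrow = TRUE))]
  dim(out) <- di
  return(out)
}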
fun <- function(valdata, inxdata){
  nr <- nrow(inxdata)
  nc <- ncol(inxdata)
  mat <- matrix(NA, nrow=nr*nc, ncol=2)
  i1 <- 1
  i2 <- nr
  for(j in 1:nc){
    mat[i1:i2, 1] <- inxdata[, j]
    mat[i1:i2, 2] <- rep(j, nr)
    i1 <- i1 + nr
    i2 <- i2 + nr
  }
  matrix(valdata[mat], ncol=nc)
}
library(compiler)  ## for cmpfun(); stops with an error if the package is missing
f2 <- cmpfun(f)
fun2 <- cmpfun(fun)
system.time(for (i in 1:10000) f(vals, indx))
system.time(for (i in 1:10000) f2(vals, indx))
system.time(for (i in 1:10000) fun(vals, indx))
system.time(for (i in 1:10000) fun2(vals, indx))
system.time(for (i in 1:10000)
  matrix(vals[cbind(c(indx), rep(1:ncol(indx), each = nrow(indx)))],
         nrow = nrow(indx), ncol = ncol(indx)))
## now let's make a bigger test set
set.seed(1)
vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4)
indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3, TRUE))
dim(vals2)
dim(indx2)
## the best contenders from round 1
gold <- matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
               nrow = nrow(indx2), ncol = ncol(indx2))
test1 <- f2(vals2, indx2)
all.equal(gold, test1)
system.time(for (i in 1:20) f2(vals2, indx2))
system.time(for (i in 1:20)
  matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
         nrow = nrow(indx2), ncol = ncol(indx2)))
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
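
For reference, a small sketch of why the two indexing styles in this
thread return the same thing. It uses the sample data from above
(without the dimnames); the names pairs, lin, by_pairs and by_linear
are only illustrative. R stores a matrix column by column, so the
element at row r, column j sits at position r + (j - 1) * nrow(vals) of
the underlying vector.

```r
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3)
indx <- matrix(c(1L,1L,3L,3L,2L,2L,2L,3L,1L,2L,2L,1L), nrow = 4, ncol = 3)

## Two-column form: one (row, column) pair per cell of indx.
pairs <- cbind(c(indx), c(col(indx)))
by_pairs <- matrix(vals[pairs], nrow = nrow(indx))

## Linear form: shift each entry by (column - 1) * nrow(vals).
lin <- indx + (col(indx) - 1L) * nrow(vals)
## c() drops the dim attribute, so a 2-column indx would not be
## mistaken for a matrix of (row, column) pairs.
by_linear <- matrix(vals[c(lin)], nrow = nrow(indx))

stopifnot(identical(by_pairs, by_linear))
by_linear
```

The linear form skips building the two-column matrix, which is
presumably where the time and memory savings come from, though I have
not measured that separately.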
On Sat, Mar 10, 2012 at 7:48 AM, Ben quant <ccquant at gmail.com> wrote:
> Thanks for the info. Unfortunately it's a little slower in one
> apples-to-apples test using my big data. Mine: 0.28 seconds. Yours: 0.73
> seconds. Not a big deal, but significant when I have to do this 300 to
> 500 times.
>
> regards,
>
> ben
>
> On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas <rui1174 at sapo.pt> wrote:
>
>> Hello,
>>
>> I don't know if it's the fastest but it's more natural to have an index
>> matrix with two columns only,
>> one for each coordinate. And it's fast.
>>
>> fun <- function(valdata, inxdata){
>> nr <- nrow(inxdata)
>> nc <- ncol(inxdata)
>> mat <- matrix(NA, nrow=nr*nc, ncol=2)
>> i1 <- 1
>> i2 <- nr
>> for(j in 1:nc){
>> mat[i1:i2, 1] <- inxdata[, j]
>> mat[i1:i2, 2] <- rep(j, nr)
>> i1 <- i1 + nr
>> i2 <- i2 + nr
>> }
>> matrix(valdata[mat], ncol=nc)
>> }
>>
>> fun(vals, indx)
>>
>> Rui Barradas
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/