[R] index values of one matrix to another of a different size

Sat Mar 10 18:26:18 CET 2012

Hi Ben,

There are likely bigger bottlenecks elsewhere in your overall
program or usage; have you tried Rprof() to find where things really
slow down?  In any case, f2() below takes about 70% of the time of
your function on your test data, and 55-65% of the time on a bigger
example I constructed.  Rui's function benefits substantially from
byte compiling, but is still slower.  As a side benefit, f2() seems to
use less memory than your current implementation.
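
For reference, a minimal profiling sketch along those lines. The
function slow_step() is a made-up stand-in for the real workload (it is
not from this thread); Rprof() and summaryRprof() are the actual tools
being suggested. Very short runs may collect no samples, hence the guard.

```r
## Hypothetical workload, only here so there is something to profile.
slow_step <- function(n) {
  x <- numeric(0)
  for (k in seq_len(n)) x <- c(x, sqrt(k))  # deliberately wasteful growth
  sum(x)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)          # start the sampling profiler
res <- slow_step(4e4)
Rprof(NULL)               # stop profiling

## The first line of the profile file is a header; the rest are samples.
n_samples <- length(readLines(prof_file)) - 1L
if (n_samples > 0L) {
  prof <- summaryRprof(prof_file)
  print(head(prof$by.self))  # functions ranked by time in their own code
}
```

If the indexing step does not show up near the top of by.self, tuning it
further is unlikely to change the overall run time much.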

Cheers,

Josh

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
######sample data ##############
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
  dimnames = list(c('row1','row2','row3'), c('col1','col2','col3')))

indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
storage.mode(indx) <- "integer"

f <- function(x, i, di = dim(i), dx = dim(x)) {
  ## column j of i needs the offset (j - 1) * nrow(x): one offset per column
  off <- matrix(0:(di[2L] - 1L) * dx[1L], nrow = di[1L], ncol = di[2L],
    byrow = TRUE)
  out <- x[c(i + off)]
  dim(out) <- di
  return(out)
}

fun <- function(valdata, inxdata){
        nr <- nrow(inxdata)
        nc <- ncol(inxdata)
        mat <- matrix(NA, nrow=nr*nc, ncol=2)
        i1 <- 1
        i2 <- nr
        for(j in 1:nc){
                mat[i1:i2, 1] <- inxdata[, j]
                mat[i1:i2, 2] <- rep(j, nr)
                i1 <- i1 + nr
                i2 <- i2 + nr
        }
        matrix(valdata[mat], ncol=nc)
}

library(compiler)
f2 <- cmpfun(f)
fun2 <- cmpfun(fun)

system.time(for (i in 1:10000) f(vals, indx))
system.time(for (i in 1:10000) f2(vals, indx))
system.time(for (i in 1:10000) fun(vals, indx))
system.time(for (i in 1:10000) fun2(vals, indx))
system.time(for (i in 1:10000)
  matrix(vals[cbind(c(indx), rep(1:ncol(indx), each = nrow(indx)))],
    nrow = nrow(indx), ncol = ncol(indx)))

## now let's make a bigger test set
set.seed(1)
vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4)
indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3, TRUE))

dim(vals2)
dim(indx2)

## the best contenders from round 1
gold <- matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
  nrow = nrow(indx2), ncol = ncol(indx2))
test1 <- f2(vals2, indx2)
all.equal(gold, test1)

system.time(for (i in 1:20) f2(vals2, indx2))
system.time(for (i in 1:20)
  matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
    nrow = nrow(indx2), ncol = ncol(indx2)))

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
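
In case the offset trick is not obvious, here is a small sketch of the
idea on the sample data (dimnames dropped for brevity). R stores a
matrix column by column, so element (r, j) sits at linear position
r + (j - 1) * nrow. The names offs, lin, by_linear and by_pairs are
just illustrative, not part of anyone's function in this thread.

```r
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3)
indx <- matrix(c(1L, 1L, 3L, 3L,  2L, 2L, 2L, 3L,  1L, 2L, 2L, 1L),
  nrow = 4, ncol = 3)

## offset for every cell: (column number - 1) * number of rows in vals
offs <- (col(indx) - 1L) * nrow(vals)
lin  <- indx + offs              # column-major positions inside vals

## c() matters: a two-column index matrix would otherwise be read as
## (row, col) pairs instead of plain positions
by_linear <- vals[c(lin)]
dim(by_linear) <- dim(indx)

## the same lookup with explicit (row, col) pairs
by_pairs <- matrix(vals[cbind(c(indx), c(col(indx)))], nrow = nrow(indx))

identical(by_linear, by_pairs)
```

The linear version skips building the two-column pair matrix, which is
probably where the time and memory savings come from.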

On Sat, Mar 10, 2012 at 7:48 AM, Ben quant <ccquant at gmail.com> wrote:
> Thanks for the info. Unfortunately it's a little bit slower after one
> apples-to-apples test using my big data. Mine: 0.28 seconds. Yours: 0.73
> seconds. Not a big deal, but significant when I have to do this 300 to 500
> times.
>
> regards,
>
> ben
>
> On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas <rui1174 at sapo.pt> wrote:
>
>> Hello,
>>
>> I don't know if it's the fastest but it's more natural to have an index
>> matrix with two columns only,
>> one for each coordinate. And it's fast.
>>
>> fun <- function(valdata, inxdata){
>>        nr <- nrow(inxdata)
>>        nc <- ncol(inxdata)
>>        mat <- matrix(NA, nrow=nr*nc, ncol=2)
>>        i1 <- 1
>>        i2 <- nr
>>        for(j in 1:nc){
>>                mat[i1:i2, 1] <- inxdata[, j]
>>                mat[i1:i2, 2] <- rep(j, nr)
>>                i1 <- i1 + nr
>>                i2 <- i2 + nr
>>        }
>>        matrix(valdata[mat], ncol=nc)
>> }
>>
>> fun(vals, indx)
>>
>> Rui Barradas
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Re-index-values-of-one-matrix-to-another-of-a-different-size-tp4458666p4460575.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/