[R] index values of one matrix to another of a different size

Joshua Wiley jwiley.psych at gmail.com
Sat Mar 10 21:21:12 CET 2012


On Sat, Mar 10, 2012 at 12:11 PM, Ben quant <ccquant at gmail.com> wrote:
> Very interesting. You are doing some stuff here that I have never seen.

and that I would not typically do or recommend (e.g., fussing with
storage mode or manually setting the dimensions of an object), but
that can be faster because it trades the flexibility of higher-level
functions for lower-level, more direct control.
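Here is a minimal sketch of those two low-level moves on the thread's 3 x 3 sample data, with the dimnames dropped. The names picked, hi and lo are hypothetical and only for illustration. Setting dim() directly on a vector produces the same object as rebuilding it with matrix().

```r
## Thread's sample data: 3 x 3 values and a 4 x 3 matrix of row numbers
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3)
indx <- matrix(c(1, 1, 3, 3, 2, 2, 2, 3, 1, 2, 2, 1), nrow = 4, ncol = 3)

## Low-level: switch the index matrix to integer storage in place
storage.mode(indx) <- "integer"

## Two-column (row, column) matrix indexing returns a plain vector
picked <- vals[cbind(c(indx), col(indx))]

## Higher-level way to restore the shape: matrix() builds a new object
hi <- matrix(picked, nrow = nrow(indx), ncol = ncol(indx))

## Lower-level way: set the dim attribute on the vector directly
lo <- picked
dim(lo) <- dim(indx)

identical(hi, lo)  # TRUE
```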

> Thank you. I will test it on my real data on Monday and let you know what I
> find. That cmpfun function looks very useful!

It can reduce the overhead of repeated function calls.  I find the
biggest speedups when it is used with some sort of loop.  Then again,
many loops can be avoided entirely, which often yields even larger
performance gains.
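To illustrate the point about loops, here is a sketch built on a hypothetical sum-of-squares function, not on anything from this thread. Timings depend on the machine, so none are shown. Since R 3.4.0 the JIT compiler byte-compiles closures automatically by default. As a result, in current R the gap between the plain and cmpfun() versions is usually small, while the vectorized version is typically still the fastest.

```r
library(compiler)  # ships with base R

## Deliberately loop-heavy version: running sum of squares
sumsq_loop <- function(x) {
  total <- 0
  for (v in x) total <- total + v^2
  total
}

## Explicitly byte-compiled copy of the same function
sumsq_cmp <- cmpfun(sumsq_loop)

## Vectorized version: no R-level loop at all
sumsq_vec <- function(x) sum(x^2)

x <- as.numeric(1:1e5)
system.time(for (k in 1:20) sumsq_loop(x))
system.time(for (k in 1:20) sumsq_cmp(x))
system.time(for (k in 1:20) sumsq_vec(x))
```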

>
> Thanks,

You're welcome.  You might also look at the data.table package by
Matthew Dowle.  It does some *very* fast indexing and subsetting, and
if those operations are a serious slowdown for you, you would likely
benefit substantially from using it.  One final comment: since you are
creating the matrix of indices, if you can create it so that it
already holds vector (linear) positions rather than row/column form,
you could eliminate the need for my f2() function altogether, because
you could use it to index your data directly and then just add the
dimensions back afterward.
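Here is a minimal sketch of that idea. It assumes, as elsewhere in this thread, that column j of the index matrix holds row numbers for column j of the data. Ben's code for building the indices is not shown, so the sample indx is converted to positions here only to show the arithmetic. The names pos, out and ref are hypothetical. Note the c() when indexing: a numeric index matrix with exactly two columns would otherwise be read as (row, column) pairs.

```r
## Thread's sample data: 3 x 3 values, 4 x 3 matrix of row numbers
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3)
indx <- matrix(c(1L, 1L, 3L, 3L, 2L, 2L, 2L, 3L, 1L, 2L, 2L, 1L),
               nrow = 4, ncol = 3)

## Column-major position of element [r, j]: r + (j - 1) * nrow(vals)
pos <- indx + (col(indx) - 1L) * nrow(vals)

## Index directly with the positions, then add the dimensions back
out <- vals[c(pos)]
dim(out) <- dim(indx)

## Same result as the two-column (row, column) indexing used in the thread
ref <- matrix(vals[cbind(c(indx), col(indx))],
              nrow = nrow(indx), ncol = ncol(indx))
identical(out, ref)  # TRUE
```

If the positions are generated in place of indx from the start, only the last three steps (index, set dim) remain.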

Cheers,

Josh

> Ben
>
>
> On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley <jwiley.psych at gmail.com>
> wrote:
>>
>> Hi Ben,
>>
>> It seems likely that there are bigger bottlenecks in your overall
>> program/use.  Have you tried Rprof() to find where things really get
>> slowed down?  In any case, f2() below takes about 70% of the time
>> your function takes on your test data, and 55-65% of the time on a
>> bigger example I constructed.  Rui's function benefits substantially
>> from byte compiling, but is still slower.  As a side benefit, f2()
>> seems to use less memory than your current implementation.
>>
>> Cheers,
>>
>> Josh
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> ######sample data ##############
>> vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
>>  dimnames = list(c('row1','row2','row3'), c('col1','col2','col3')))
>>
>> indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
>> storage.mode(indx) <- "integer"
>>
>>
>> f <- function(x, i, di = dim(i), dx = dim(x)) {
>>  ## offset for column j of the index matrix is (j - 1) * nrow(x)
>>  out <- x[c(i + matrix(0:(di[2L] - 1L) * dx[1L], nrow = di[1L],
>>                        ncol = di[2L], byrow = TRUE))]
>>  dim(out) <- di
>>  return(out)
>> }
>>
>>
>> fun <- function(valdata, inxdata){
>>        nr <- nrow(inxdata)
>>        nc <- ncol(inxdata)
>>        mat <- matrix(NA, nrow=nr*nc, ncol=2)
>>        i1 <- 1
>>        i2 <- nr
>>        for(j in 1:nc){
>>                mat[i1:i2, 1] <- inxdata[, j]
>>                mat[i1:i2, 2] <- rep(j, nr)
>>                i1 <- i1 + nr
>>                i2 <- i2 + nr
>>        }
>>        matrix(valdata[mat], ncol=nc)
>> }
>>
>> require(compiler)
>> f2 <- cmpfun(f)
>> fun2 <- cmpfun(fun)
>>
>> system.time(for (i in 1:10000) f(vals, indx))
>> system.time(for (i in 1:10000) f2(vals, indx))
>> system.time(for (i in 1:10000) fun(vals, indx))
>> system.time(for (i in 1:10000) fun2(vals, indx))
>> system.time(for (i in 1:10000)
>>   matrix(vals[cbind(c(indx), rep(1:ncol(indx), each = nrow(indx)))],
>>          nrow = nrow(indx), ncol = ncol(indx)))
>>
>> ## now let's make a bigger test set
>> set.seed(1)
>> vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4)
>> indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3, TRUE))
>>
>> dim(vals2)
>> dim(indx2)
>>
>> ## the best contenders from round 1
>> gold <- matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
>>                nrow = nrow(indx2), ncol = ncol(indx2))
>> test1 <- f2(vals2, indx2)
>> all.equal(gold, test1)
>>
>> system.time(for (i in 1:20) f2(vals2, indx2))
>> system.time(for (i in 1:20)
>>   matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
>>          nrow = nrow(indx2), ncol = ncol(indx2)))
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>> On Sat, Mar 10, 2012 at 7:48 AM, Ben quant <ccquant at gmail.com> wrote:
>> > Thanks for the info. Unfortunately it's a little bit slower after one
>> > apples-to-apples test using my big data. Mine: 0.28 seconds. Yours:
>> > 0.73 seconds. Not a big deal, but significant when I have to do this
>> > 300 to 500 times.
>> >
>> > regards,
>> >
>> > ben
>> >
>> > On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas <rui1174 at sapo.pt> wrote:
>> >
>> >> Hello,
>> >>
>> >> I don't know if it's the fastest, but it's more natural to have an
>> >> index matrix with only two columns, one for each coordinate. And it's
>> >> fast.
>> >>
>> >> fun <- function(valdata, inxdata){
>> >>        nr <- nrow(inxdata)
>> >>        nc <- ncol(inxdata)
>> >>        mat <- matrix(NA, nrow=nr*nc, ncol=2)
>> >>        i1 <- 1
>> >>        i2 <- nr
>> >>        for(j in 1:nc){
>> >>                mat[i1:i2, 1] <- inxdata[, j]
>> >>                mat[i1:i2, 2] <- rep(j, nr)
>> >>                i1 <- i1 + nr
>> >>                i2 <- i2 + nr
>> >>        }
>> >>        matrix(valdata[mat], ncol=nc)
>> >> }
>> >>
>> >> fun(vals, indx)
>> >>
>> >> Rui Barradas
>> >>
>> >>
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>
>





