[R] Help on improving an algorithm

li li hannah.hlx at gmail.com
Thu Jul 28 20:34:58 CEST 2016


Thanks very much Don! You alternative method does make it much faster.
However I think I made a mistake in my previous coding. Sorry.

Instead of checking whether each row sum is larger than 8, what I wanted is
to
check whether each column sum is lager than 8. See the code below.

So in this case, I should still try to use expand.grid? How to use
expand.grid if each element is a row instead of a single number or
character?

Thanks for you help.


> dim(tmp_a);dim(tmp_b); dim(tmp_c); dim(tmp_d)[1] 252   6
[1] 126   6
[1] 126   6
[1] 126   6


p <- 0
for (i in 1:dim(tmp_a)[1]){
    for (j in 1:dim(tmp_b)[1]){
        for (k in 1:dim(tmp_c)[1]){
            for (l in 1:dim(tmp_c)[1]){
            conti <- rbind(tmp_a[i,], tmp_b[j,], tmp_c[k,],tmp_d[l,])

* if (sum(apply(conti,2,sum)>8)==0)*                p <- p+1
                print(p)}}}}


2016-07-28 12:16 GMT-04:00 MacQueen, Don <macqueen1 at llnl.gov>:

> This is a good example to illustrate that R is a vectorized language, and
> programming concepts that would work in fortran, C, or any of many other
> languages are not always effective in R.
>
> The vectorized method below has produced the same value for 'p' in every
> example I¹ve tried, and is much faster.
>
> Note that the desired calculation looks at every combination of one row
> from each of four matrices, i.e., 126^4 combinations (a large number).
> This suggests the use of expand.grid().
>
> I don't see any need to actually construct the 'conti' matrix. Only its
> row sums are needed, and those can be calculated directly from the rows of
> the input matrices.
>
> The vectorized method makes using 126 rows tractable. It still takes a
> while on my machine, but it's tolerable. The original method takes longer
> than I'm willing to wait. I used a smaller number of rows for testing.
>
> ## number of rows in each input matrix
> nr <- 126
> nr <- 11
> ## number of columns in each input matrix
> nc <- 6
>
> ## simulate input data, in order to create
> ## a reproducible example
> xr <- 0:2
> tmpa <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)
> tmpb <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)
> tmpc <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)
> tmpd <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)
>
> cat('method 2 (vectorized)\n')
> t2 <- system.time({
>   rsa <- rowSums(tmpa)
>   rsb <- rowSums(tmpb)
>   rsc <- rowSums(tmpc)
>   rsd <- rowSums(tmpd)
>
>   bigr <- expand.grid(rsa, rsb, rsc, rsd)
>
>   p2 <- sum( rowSums( bigr > 8) == 0)
>   print(p2)
> })
>
> cat('original method (looping)\n')
>
> torig <- system.time({
>   p <- 0
>   for (i in 1:nrow(tmpa)) {
>     for (j in 1:nrow(tmpb)) {
>       for (k in 1:nrow(tmpc)) {
>         for (l in 1:nrow(tmpd)) {
>           conti <- rbind(tmpa[i,], tmpb[j,], tmpc[k,],tmpd[l,])
>           if (sum(apply(conti,1,sum)>8)==0) p <- p+1
>           ##        print(p)
>         }
>       }
>     }
>   }
>
> print(p)
> })
>
> cat('\n')
> print(t2)
> print(torig)
>
>
> A couple of minor points:
>   dim(tmp_a)[1] can be replaced by nrow(tmp_a)
>   in the original code, apply() can be replaced by rowSums()
> It might be faster if rowSums() was replaced by .rowSums(); I haven't
> tried that.
>
> -Don
>
>
> --
> Don MacQueen
>
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
>
>
>
>
>
> On 7/27/16, 1:41 PM, "R-help on behalf of li li"
> <r-help-bounces at r-project.org on behalf of hannah.hlx at gmail.com> wrote:
>
> >Hi all,
> > I have four matrix tmp_a, tmp_b, tmp_c and tmp_d whose dimensions are
> >shown as below.
> >I want to take one row from each of these matrices and then put the four
> >selected rows into a matrix.
> >I want to count the number of such matrices for which the vector of row
> >sum
> >is less than or equal to (8,8,8,8,8) (componentwise).
> >Below is the code I use now. Is there a way to make this more efficient
> >and
> >faster?
> > Thanks for the help in advance.
> >
> >> dim(tmp_a);dim(tmp_b); dim(tmp_c); dim(tmp_d)[1] 252   6
> >[1] 126   6
> >[1] 126   6
> >[1] 126   6
> >
> >
> >p <- 0
> >for (i in 1:dim(tmp_a)[1]){
> >    for (j in 1:dim(tmp_b)[1]){
> >        for (k in 1:dim(tmp_c)[1]){
> >            for (l in 1:dim(tmp_c)[1]){
> >            conti <- rbind(tmp_a[i,], tmp_b[j,], tmp_c[k,],tmp_d[l,])
> >            if (sum(apply(conti,1,sum)>8)==0)
> >                p <- p+1
> >                print(p)}}}}
> >
> >       [[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list