[Rd] Speeding up transpose

Sat Aug 28 02:01:34 CEST 2010

Hi Radford,

On 08/25/2010 07:50 PM, Radford Neal wrote:
> I've looked at how to speed up the transpose function in R
> (ie, t(X)).
>
> The existing code does the work with loops like the following:
>
> 	for (i = 0; i<  len; i++)
> 	    REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow];
>
> It seems a bit optimistic to expect a compiler to produce good code
> from this.  I've re-written these loops as follows:
>
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              REAL(r)[i] = REAL(a)[j];
>          }

You can even avoid the j>=len test within the loop by using 2 nested
loops:

            dest = REAL(r);
            src = REAL(a);
            // walk the dest col by col (i.e. linearly)
            // walk the src row by row (i.e. not linearly)
            for (i = 0; i < nrow; i++, src++)
                for (j = 0; j < len; j += nrow, dest++)
                    *dest = src[j];

1 test, 3 additions (src[j] being equiv to *(src + j)), 1 assignment
in the inner loop. I wonder if it's possible to get rid of one of the
additions.

This gives me another 10% gain in speed (on my 64-bit Ubuntu laptop),
not that much, but still...

BTW isn't that surprising that using div() is actually slower than using
of / and %, at least on Linux? Since div() is kind of the atomic version
of it, I would have expected it to be slightly faster :-/

Cheers,
H.

>
> The resulting improvement is sometimes dramatic.  Here's a test
> program:
>
>          M<- matrix(seq(0,1,12000),200,60)
>
>          print(system.time({for (i in 1:10000) S<- t(M)}))
>          print(system.time({for (i in 1:10000) R<- t(S)}))
>
>          v<- seq(0,2,12000)
>
>          print(system.time({for (i in 1:100000) u<- t(v)}))
>          print(system.time({for (i in 1:100000) w<- t(u)}))
>
> Here are the times on an Intel Linux system:
>
>      R version 2.11.1:            Modified version:
>
>           user  system elapsed          user  system elapsed
>          1.190   0.040   1.232         0.610   0.010   0.619
>           user  system elapsed          user  system elapsed
>          1.200   0.020   1.226         0.610   0.000   0.616
>           user  system elapsed          user  system elapsed
>          0.800   0.010   0.813         0.750   0.000   0.752
>           user  system elapsed          user  system elapsed
>          0.910   0.010   0.921         0.860   0.000   0.864
>
> Here are the times on a SPARC Solaris system:
>
>      R version 2.11.1:            Modified version:
>
>           user  system elapsed          user  system elapsed
>         18.643   0.041  18.685         2.994   0.039   3.033
>           user  system elapsed          user  system elapsed
>         18.574   0.041  18.616         3.123   0.039   3.163
>           user  system elapsed          user  system elapsed
>          3.803   0.271   4.075         3.868   0.296   4.163
>           user  system elapsed          user  system elapsed
>          4.184   0.273   4.457         4.238   0.302   4.540
>
> So with the modification, transpose for a 60x200 or 200x60 matrix is
> about a factor of two faster on the Intel system, and a factor of six
> faster on the SPARC system.  There is little or no gain from the
> modification when transposing a row or column vector, however.  (I
> think it must be that on these machines multiplies and divides do not
> take constant time, but are faster when, for instance, dividing by 1.)
>
> I've appended below the new version of the modified part of the
> do_transpose function in src/main/array.c.
>
>      Radford Neal
>
> ----------------------------------------------------------------------
>
>      PROTECT(r = allocVector(TYPEOF(a), len));
>      switch (TYPEOF(a)) {
>      case LGLSXP:
>      case INTSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              INTEGER(r)[i] = INTEGER(a)[j];
>          }
>      case REALSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              REAL(r)[i] = REAL(a)[j];
>          }
>          break;
>      case CPLXSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              COMPLEX(r)[i] = COMPLEX(a)[j];
>          }
>          break;
>      case STRSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              SET_STRING_ELT(r, i, STRING_ELT(a,j));
>          }
>          break;
>      case VECSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              SET_VECTOR_ELT(r, i, VECTOR_ELT(a,j));
>          }
>          break;
>      case RAWSXP:
>          for (i = 0, j = 0; i<len; i += 1, j += nrow) {
>              if (j>=len) j -= (len-1);
>              RAW(r)[i] = RAW(a)[j];
>          }
>          break;
>      default:
>          UNPROTECT(1);
>          goto not_matrix;
>      }
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319