[Rd] suggestion how to use memcpy in duplicate.c

Wed Apr 21 19:45:37 CEST 2010

Matt,

On Apr 21, 2010, at 11:54 AM, Matthew Dowle wrote:

>> From copyVector in duplicate.c :
> 
> void copyVector(SEXP s, SEXP t)
> {
>    int i, ns, nt;
>    nt = LENGTH(t);
>    ns = LENGTH(s);
>    switch (TYPEOF(s)) {
> ...
>    case INTSXP:
>    for (i = 0; i < ns; i++)
>        INTEGER(s)[i] = INTEGER(t)[i % nt];
>    break;
> ...
> 
> could that be replaced with :
> 
>    case INTSXP:
>    for (i=0; i<ns/nt; i++)
>        memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t), 
> nt*sizeof(int));
>    break;
> 

Won't that miss the last incomplete chunk? (and please don't use DATAPTR on INTSXP even though the effect is currently the same)

In general it seems that the it depends on nt whether this is efficient or not since calls to short memcpy are expensive (very small nt that is).

 I ran some empirical tests to compare memcpy vs for() (x86_64, OS X) and the results were encouraging - depending on the size of the copied block the difference could be quite big:
tiny block (ca. n = 32 or less) - for() is faster
small block (n ~ 1k) - memcpy is ca. 8x faster
as the size increases the gap closes (presumably due to RAM bandwidth limitations) so for n = 512M it is ~30%.

Of course this is contingent on the implementation of memcpy, compiler, architecture etc. And will only matter if copying is what you do most of the time ...

Cheers,
Simon

> and similar for the other types in copyVector.  This won't help regular 
> vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR 
> macro, see next suggestion below, but it should help copyMatrix which calls 
> copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and 
> dounzip.c (once).
> 
> For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it 
> :
> 
>    <FIXME>: surely memcpy would be faster here?
> 
> which seems to refer to the for loop  :
> 
>    else { \
>    int __i__; \
>    type *__fp__ = fun(from), *__tp__ = fun(to); \
>    for (__i__ = 0; __i__ < __n__; __i__++) \
>      __tp__[__i__] = __fp__[__i__]; \
>  } \
> 
> Could that loop be replaced by the following ?
> 
>   else { \
>   memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); \
>   }\
> 
> In the data.table package, dogroups.c uses this technique, so the principle 
> is tested and works well so far.
> 
> Are there any road blocks preventing this change, or is anyone already 
> working on it ?  If not then I'll try and test it (on Ubuntu 32bit) and 
> submit patch with timings, as before.  Comments/pointers much appreciated.
> 
> Matthew
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
>