[Rd] suggestion how to use memcpy in duplicate.c
Matthew Dowle
mdowle at mdowle.plus.com
Wed Apr 21 17:54:01 CEST 2010
From copyVector in duplicate.c:
void copyVector(SEXP s, SEXP t)
{
    int i, ns, nt;
    nt = LENGTH(t);
    ns = LENGTH(s);
    switch (TYPEOF(s)) {
    ...
    case INTSXP:
        for (i = 0; i < ns; i++)
            INTEGER(s)[i] = INTEGER(t)[i % nt];
        break;
    ...
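Could that be replaced with:
    case INTSXP:
        /* whole copies of t first, then the partial tail if nt does not divide ns */
        for (i = 0; i < ns/nt; i++)
            memcpy((char *)DATAPTR(s) + i*nt*sizeof(int), (char *)DATAPTR(t),
                   nt*sizeof(int));
        if (ns % nt)
            memcpy((char *)DATAPTR(s) + (ns/nt)*nt*sizeof(int), (char *)DATAPTR(t),
                   (ns % nt)*sizeof(int));
        break;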
and similarly for the other types in copyVector. This won't help regular
vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR
macro (see the next suggestion below), but it should help copyMatrix, which
calls copyVector, as well as scan.c, which calls copyVector on three lines,
dcf.c (once) and dounzip.c (once).
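As a standalone illustration of the recycling copy, here is a minimal sketch in plain C, outside R's SEXP machinery. The helper name recycle_copy_int and its signature are made up for this example. It is meant to reproduce what the i % nt loop does, including the case where nt does not divide ns evenly, and it is not the code in duplicate.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of R: fill dst[0..ns-1] by recycling
   src[0..nt-1], giving the same result as dst[i] = src[i % nt]. */
static void recycle_copy_int(int *dst, size_t ns, const int *src, size_t nt)
{
    size_t full, tail, i;

    assert(nt > 0);              /* i % nt is undefined for nt == 0 */
    full = ns / nt;              /* number of complete copies of src */
    tail = ns % nt;              /* leftover elements after those copies */

    for (i = 0; i < full; i++)
        memcpy(dst + i * nt, src, nt * sizeof(int));
    if (tail > 0)
        memcpy(dst + full * nt, src, tail * sizeof(int));
}
```

For example, recycling {1, 2, 3} into 8 slots should give 1 2 3 1 2 3 1 2: two whole copies followed by a tail of two elements.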
For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it:
    <FIXME>: surely memcpy would be faster here?
which seems to refer to the for loop:
    else { \
        int __i__; \
        type *__fp__ = fun(from), *__tp__ = fun(to); \
        for (__i__ = 0; __i__ < __n__; __i__++) \
            __tp__[__i__] = __fp__[__i__]; \
    } \
Could that loop be replaced by the following?
    else { \
        memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); \
    } \
In the data.table package, dogroups.c uses this technique, so the principle
is tested and works well so far.
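To make the equivalence concrete outside of R, here is a rough sketch. COPY_ATOMIC is a hypothetical stand-in for the else-branch of the macro, taking plain pointers rather than SEXPs, and copy_loop_double mirrors the current element-wise loop. It assumes the two buffers do not overlap, which memcpy requires.

```c
#include <string.h>

/* Hypothetical stand-in for the else-branch of DUPLICATE_ATOMIC_VECTOR:
   copy n elements of the given type from `from` to `to`.
   Both must point to non-overlapping buffers of at least n elements. */
#define COPY_ATOMIC(type, to, from, n) \
    memcpy((to), (from), (size_t)(n) * sizeof(type))

/* The element-by-element loop, as in the current macro, for comparison. */
static void copy_loop_double(double *to, const double *from, int n)
{
    int i;
    for (i = 0; i < n; i++)
        to[i] = from[i];
}

/* Run both copies and report whether the results are byte-identical
   (returns 1 if they match, 0 otherwise). */
static int copies_match(const double *from, int n, double *a, double *b)
{
    COPY_ATOMIC(double, a, from, n);
    copy_loop_double(b, from, n);
    return memcmp(a, b, (size_t)n * sizeof(double)) == 0;
}
```

A check like copies_match could sit alongside the timings, so that any speed-up is reported only for copies that produce the same bytes as the loop.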
Are there any roadblocks preventing this change, or is anyone already
working on it? If not, then I'll try and test it (on Ubuntu 32bit) and
submit a patch with timings, as before. Comments/pointers much appreciated.
Matthew