[Rd] suggestion how to use memcpy in duplicate.c
Simon Urbanek
simon.urbanek at r-project.org
Thu Apr 22 13:42:54 CEST 2010
On Apr 22, 2010, at 7:12 AM, Matthew Dowle wrote:
>
> Is this a thumbs up for memcpy for DUPLICATE_ATOMIC_VECTOR at least ?
>
> If there is further specific testing needed then let me know - happy to
> help - but you seem to have beaten me to it.
>
I was not volunteering to do anything - I was just looking at whether it makes sense to bother at all and pointing out the bugs in your code ;). I have a sufficiently long list of TODOs already :P
Cheers,
Simon
>
> "Simon Urbanek" <simon.urbanek at r-project.org> wrote in message
> news:65D21B93-A737-4A94-BDF4-AD7E90518AC0 at r-project.org...
>>
>> On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote:
>>
>>> On 4/21/10 10:45 AM, Simon Urbanek wrote:
>>>> Won't that miss the last incomplete chunk? (and please don't use
>>>> DATAPTR on INTSXP even though the effect is currently the same)
>>>>
>>>> In general it seems that whether this is efficient or not depends
>>>> on nt, since calls to memcpy are expensive for short copies (very
>>>> small nt, that is).
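>>>>
>>>> For illustration, a minimal sketch of a chunked copy that does not
>>>> drop the trailing partial chunk (the chunk size and function name
>>>> here are made up, not taken from duplicate.c; for an INTSXP the
>>>> pointers would come from INTEGER() rather than DATAPTR()):
>>>>
>>>>   #include <string.h>
>>>>
>>>>   #define CHUNK 1024
>>>>
>>>>   static void copy_ints(int *dst, const int *src, size_t nt)
>>>>   {
>>>>       size_t i = 0;
>>>>       /* copy all full chunks */
>>>>       for (; i + CHUNK <= nt; i += CHUNK)
>>>>           memcpy(dst + i, src + i, CHUNK * sizeof(int));
>>>>       /* the trailing incomplete chunk is the part that is
>>>>          easy to miss */
>>>>       if (i < nt)
>>>>           memcpy(dst + i, src + i, (nt - i) * sizeof(int));
>>>>   }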
>>>>
>>>> I ran some empirical tests to compare memcpy vs for() (x86_64, OS X)
>>>> and the results were encouraging - depending on the size of the
>>>> copied block the difference could be quite big:
>>>> - tiny blocks (ca. n = 32 or less): for() is faster
>>>> - small blocks (n ~ 1k): memcpy is ca. 8x faster
>>>> - as the size increases the gap closes (presumably due to RAM
>>>>   bandwidth limitations), so for n = 512M memcpy is only ~30% faster
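>>>>
>>>> A rough sketch of this kind of micro-benchmark (the sizes, rep
>>>> counts and use of clock() are assumptions, not the actual test
>>>> harness; note that with optimization enabled the compiler may
>>>> vectorize or even elide the plain loop, so treat results with care):
>>>>
>>>>   #include <stdio.h>
>>>>   #include <stdlib.h>
>>>>   #include <string.h>
>>>>   #include <time.h>
>>>>
>>>>   int main(void)
>>>>   {
>>>>       size_t sizes[] = { 8, 32, 1024, 1u << 20 };
>>>>       for (int s = 0; s < 4; s++) {
>>>>           size_t n = sizes[s];
>>>>           int *src = malloc(n * sizeof(int));
>>>>           int *dst = malloc(n * sizeof(int));
>>>>           long reps = (long)((1L << 26) / n) + 1; /* ~constant work */
>>>>           memset(src, 1, n * sizeof(int));
>>>>
>>>>           /* element-wise for() copy */
>>>>           clock_t t0 = clock();
>>>>           for (long r = 0; r < reps; r++)
>>>>               for (size_t i = 0; i < n; i++)
>>>>                   dst[i] = src[i];
>>>>           double t_for = (double)(clock() - t0) / CLOCKS_PER_SEC;
>>>>
>>>>           /* memcpy of the whole block */
>>>>           t0 = clock();
>>>>           for (long r = 0; r < reps; r++)
>>>>               memcpy(dst, src, n * sizeof(int));
>>>>           double t_cpy = (double)(clock() - t0) / CLOCKS_PER_SEC;
>>>>
>>>>           printf("n = %8lu: for() %.3fs  memcpy %.3fs\n",
>>>>                  (unsigned long)n, t_for, t_cpy);
>>>>           free(src);
>>>>           free(dst);
>>>>       }
>>>>       return 0;
>>>>   }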
>>>>
>>>
>>>> Of course this is contingent on the implementation of memcpy,
>>>> compiler, architecture etc., and it will only matter if copying is
>>>> what you do most of the time ...
>>>
>>> Copying of vectors is something that I would expect to happen fairly
>>> often in many applications of R.
>>>
>>> Is for() faster on small blocks by enough that one would want to branch
>>> based on size?
>>>
>>
>> Good question. Given that the branching itself adds overhead, possibly
>> not. In the best case for() can be ~40% faster (for single-digit n), but
>> it would take billions of copies for that to make a difference (since the
>> operation itself is so fast). The break-even point on my test machine is
>> n = 32, and when I added the branching the copy took a 20% hit, so I
>> guess it's simply not worth it. The only case that may be worth branching
>> on is n = 1, since that is likely a fairly common use. (The branching
>> penalty in the actual copy routines is lower than in my memcpy-vs-for()
>> comparison, since there the branch can be done before the outer for loop,
>> so this may vary case by case.)
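>>
>> For concreteness, a hedged sketch of what such size-based dispatch
>> could look like (the threshold of 32 is the break-even point from the
>> tests above; the function is illustrative, not from R's sources - and
>> per the above, in practice only the n == 1 branch may pay for itself):
>>
>>   #include <string.h>
>>
>>   static void copy_dispatch(int *dst, const int *src, size_t n)
>>   {
>>       if (n == 1) {
>>           /* scalar copy: likely the most common case */
>>           *dst = *src;
>>       } else if (n < 32) {
>>           /* tiny blocks: the plain loop wins */
>>           for (size_t i = 0; i < n; i++)
>>               dst[i] = src[i];
>>       } else {
>>           /* larger blocks: memcpy wins */
>>           memcpy(dst, src, n * sizeof(int));
>>       }
>>   }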
>>
>> Cheers,
>> Simon
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>