[Rd] .C(..., DUP=FALSE) memory costs depending on input size?

Jeff Ryan jeff.a.ryan at gmail.com
Thu Nov 6 23:28:56 CET 2008


Marcel,

If you are writing the C code from scratch, take a look at .Call or
.External. Neither copies the input objects, and neither requires
explicit conversion to the underlying storage type
(numeric/integer/etc.) within the function call.

An even greater benefit is that you will also have access to the
actual R objects within C.
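
One possible explanation for the slope you see even with DUP=FALSE
(this is a reading of the .C documentation, not something measured
here): with the default NAOK=FALSE, .C treats NA/NaN/Inf in the
arguments as an error, which implies a pass over every element before
your empty function runs. DUP=TRUE would then add a copy on top of
that. The standalone C sketch below only models that bookkeeping. It
is not R's source, and call_via_wrapper and comm_test are made-up
names for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Empty callee, standing in for commTest(). */
static void comm_test(double *a) { (void)a; }

/*
 * Hypothetical model of an interface wrapper (an assumption, NOT R's code).
 * Returns how many elements the wrapper itself had to touch around the
 * call, or (size_t)-1 if a check or an allocation fails.
 */
size_t call_via_wrapper(double *x, size_t n, int dup, int naok)
{
    size_t touched = 0;
    double *arg = x;

    assert(x != NULL || n == 0);

    if (!naok) {
        /* Reject NA/NaN/Inf: one pass over the data, O(n). */
        for (size_t i = 0; i < n; i++)
            if (!isfinite(x[i]))
                return (size_t)-1;
        touched += n;
    }

    if (dup && n > 0) {
        /* Duplicate the argument before the call: another O(n) pass. */
        arg = malloc(n * sizeof *arg);
        if (arg == NULL)
            return (size_t)-1;
        memcpy(arg, x, n * sizeof *arg);
        touched += n;
    }

    comm_test(arg);   /* the call itself only passes a pointer: O(1) */

    if (arg != x)
        free(arg);
    return touched;
}
```

In this model only dup = 0 together with naok = 1 leaves the call
independent of n. If the same holds for .C, adding NAOK=TRUE to your
DUP=FALSE call should flatten the red line; .Call sidesteps both steps
because it hands the R object itself to your function.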

Jeff

On Thu, Nov 6, 2008 at 2:05 PM, MarcelK <m_kempenaar at planet.nl> wrote:
>
> Hello,
>
> I'm trying to create my own C code for use within R. While optimizing the
> code I've noticed that even while only using pointers to get my data to C
> the time needed still depends on data (vector) size.
>
> To test this, I've created an empty C function to which I've sent vectors
> of various lengths. The time needed for each call is
> measured and plotted. I would expect a flat line (a little above y=0) since
> the only things sent are pointers. What I do not expect is a linearly
> climbing line as the vector size increases. Initializing the vectors isn't
> being measured, only the '.C' call to an empty C function, see below.
>
> Is there anything I'm missing that can explain this input-size dependent
> latency? The only reason I can think of is that these vectors are being
> copied along the way.
>
> What follows is both the R and C code which I use only for testing and a
> plot of both measurements with DUP=TRUE and DUP=FALSE:
>
> (RED: DUP=FALSE, GREEN: DUP=TRUE)
> http://www.nabble.com/file/p20368695/CandR.png
>
>
> R code:
> ----------
> # sequence from 512 to 2^23 with 2^17 stepsize
> a <- seq(512, 2^23, 2^17)
> # storage for wall time
> h <- numeric(length(a)); j <- numeric(length(a))
> for (i in 1:length(a)) {
>        x <- as.double(1:a[i])
>        y <- as.double(x)
>        # system.time()[3] is (actual) wall time
>        h[i] <- system.time(.C("commTest", x, y, DUP=FALSE))[3]
>        j[i] <- system.time(.C("commTest", x, y, DUP=TRUE))[3]
>        x <- 0
>        y <- 0
> }
> # plot:
> plot(a, h, type="l", col="red", xlab="Vector Size -->",
>      ylab="Time in Seconds -->")
> lines(a, j, col="green")
>
>
> C code:
> -----------
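> #include <R.h>
>
> /* extern "C" is only valid in C++; guard it so the file also builds as C */
> #ifdef __cplusplus
> extern "C" {
> #endif
>        void commTest(double* a, double* b);
> #ifdef __cplusplus
> }
> #endif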
>
> /*
> * Empty function
> * Just testing communication costs between R --> C
> */
> void commTest(double* a, double* b) {
>  /* Do ab-so-lute-ly-nothing.. */
> }
>
> System Details:
> ---------------------
> Linux gpu 2.6.18-6-amd64 #1 SMP Thu May 8 06:49:39 UTC 2008 x86_64 GNU/Linux
> R version 2.7.1 (2008-06-23)



-- 
Jeffrey Ryan
jeffrey.ryan at insightalgo.com

ia: insight algorithmics
www.insightalgo.com


