[Rd] Inconsistent results from .C()

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu May 23 08:08:17 CEST 2013


lev2 is malloc-ed in the code you supplied.  The new option Bill refers 
to is about input variables in .C, i.e. lev and not lev2.  Other array 
overruns can be found by other tools: Bill mentioned valgrind, and 
AddressSanitizer finds a different set of overruns.  See 
http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-gctorture-and-memory-access 
.

On 23/05/2013 01:18, William Dunlap wrote:
> Had you tried using the new-to-3.0.0 options(CBoundCheck=TRUE)?
>
>      [from the NEWS file]
>      There is a new option, options(CBoundsCheck=), which controls how .C()
>      and .Fortran() pass arguments to compiled code. If true (which can be
>      enabled by setting the environment variable R_C_BOUNDS_CHECK to yes),
>      raw, integer, double and complex arguments are always copied, and checked
>      for writing off either end of the array on return from the compiled code
>      (when a second copy is made). This also checks individual elements of character
>      vectors passed to .C().
>
> valgrind helps find memory misuse also.
>
> When you write on memory that you didn't allocate, expect to get mystifying
> behavior.  Nonrepeatable results are a sign of memory misuse.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf
>> Of Robin Evans
>> Sent: Wednesday, May 22, 2013 4:16 PM
>> To: R Devel List
>> Subject: Re: [Rd] Inconsistent results from .C()
>>
>> Update: I did eventually discover an error in the C code which is probably
>> ultimately responsible for the bug (the variable eff_size is allowed to get
>> too large, and overrun the end of lev2[]).  However the behaviour of R (or
>> Rstudio) in response to this is still somewhat mystifying to me!
>>
>> Robin
>>
>> On 21 May 2013 14:10, Robin Evans <rje42 at cam.ac.uk> wrote:
>>
>>> Sure!  C code is below if it helps.  The gist is that the function
>>> oneMargin forms two matrices M and C, mostly by repeatedly taking
>>> Kronecker products.
>>>
>>> Robin
>>>
>>> void kronecker (int *A, int *B, int *dima, int *dimb, int *C) {
>>>    int k = 0;
>>>    int i1,i2,j1,j2;
>>>
>>>      for (i2 = 0; i2 < dima[1]; i2++) {
>>>      for (j2 = 0; j2 < dimb[1]; j2++) {
>>>      for (i1 = 0; i1 < dima[0]; i1++) {
>>>      for (j1 = 0; j1 < dimb[0]; j1++) {
>>>        C[k] = A[i1 + dima[0]*i2]*B[j1 + dimb[0]*j2];
>>>        k++;
>>>      }}}}
>>> }
>>>
>>> void iterate (int *x, int *lev) {
>>>    bool ok = FALSE;
>>>    int i=0;
>>>
>>>    do {
>>>      if(x[i] < lev[i]-1) {
>>>        x[i]++;
>>>        ok = TRUE;
>>>      }
>>>      else {
>>>        x[i] = 0;
>>>      }
>>>      i++;
>>>    }
>>>    while (!ok);
>>> }
>>>
>>> void oneMargin(int *Mar, int *Eff, int *neff, int *lev, int *nvar, int
>>> *M, int *C) {
>>>
>>>    int *M2, *mult, *Cj, *Cj2;
>>>    int i,j=1,k,l,m;
>>>
>>>    /* determine size of M to output */
>>>    for (i = 0; i < nvar[0]; i++) {
>>>      j *= (Mar[i] == 1 ? lev[i]*lev[i] : lev[i]);
>>>    }
>>>    M2 = malloc(sizeof(int)*j);
>>>    mult = malloc(sizeof(int)*j);
>>>
>>>    M[0] = 1;
>>>    int M_size[2] = {1,1}, mult_size[2], Cj_size[2], C_size[2] = {0,0};
>>>
>>>    for (i = nvar[0]-1; i >= 0; i--) {
>>>      if (Mar[i] == 1) {
>>>        for (j = 0; j < lev[i]*lev[i]; j++) {
>>>          mult[j] =  ((j % (lev[i]+1)) == 0) ? 1 : 0;
>>>        }
>>>        mult_size[0] = mult_size[1] = lev[i];
>>>      }
>>>      else {
>>>        for (j = 0; j < lev[i]; j++) {
>>>          mult[j] =  1;
>>>        }
>>>        mult_size[0] = 1;
>>>        mult_size[1] = lev[i];
>>>      }
>>>
>>>      kronecker(M, mult, M_size, mult_size, M2);
>>>      M_size[0] *= mult_size[0];
>>>      M_size[1] *= mult_size[1];
>>>
>>>      /* copy M2 over to M */
>>>      for (j=0; j < M_size[0]*M_size[1]; j++) {
>>>        M[j] = M2[j];
>>>      }
>>>    }
>>>
>>>    free(M2);
>>>
>>>    /* Create C matrix */
>>>    int index[nvar[0]];
>>>    int prod;
>>>    int eff_size = 0;
>>>    /*swapped = 0; */
>>>    mult_size[0] = 1;
>>>
>>>    C_size[0] = 1;
>>>    for (j = 0; j < nvar[0]; j++) {
>>>      C_size[0] *= (Mar[j] == 1 ? lev[j] : 1);
>>>    }
>>>
>>>    Cj = malloc(sizeof(int)*C_size[0]);
>>>    Cj2 = malloc(sizeof(int)*C_size[0]);
>>>    int *lev2;
>>>    lev2 = malloc(sizeof(int)*nvar[0]);
>>>
>>>    /* loop over effects */
>>>    for (i = 0; i < neff[0]; i++) {
>>>      prod = 1;
>>>      for (j = 0; j < nvar[0]; j++) {
>>>        if (Eff[j + i*nvar[0]]) {
>>>          prod *= lev[j]-1;
>>>          lev2[eff_size] = lev[j]-1;
>>>          index[eff_size] = 0;
>>>          eff_size++;
>>>         }
>>>      }
>>>
>>>      /* loop over states of effect (excluding corner points) */
>>>      for (j = 0; j < prod; j++) {
>>>        Cj[0] = 1;
>>>        Cj_size[0] = Cj_size[1] = 1;
>>>
>>>        k = eff_size;
>>>        /* loop over variables */
>>>        for (l = nvar[0]-1; l >= 0; l--) {
>>>          /* skip variables not in margin */
>>>          if (Mar[l] == 0) continue;
>>>          mult_size[1] = lev[l];
>>>
>>>          /* kronecker factor depends on whether or not variable is in
>>> effect */
>>>          if (Eff[i*nvar[0]+l] == 0) {
>>>            /* for variables not in effect, just repeat matrix */
>>>            for (m = 0; m < lev[l]; m++) {
>>>              mult[m] = 1;
>>>            }
>>>          }
>>>          else {
>>>            /* otherwise multiply based on state */
>>>            k--;
>>>            for (m = 0; m < lev[l]; m++) {
>>>              mult[m] = (m == index[k]) ? lev[l] - 1 : -1;
>>>            }
>>>          }
>>>          kronecker(Cj, mult, Cj_size, mult_size, Cj2);
>>>
>>>          Cj_size[1] *= mult_size[1];
>>>
>>>          /* copy Cj2 over to Cj */
>>>          for (k=0; k < Cj_size[0]*Cj_size[1]; k++) {
>>>            Cj[k] = Cj2[k];
>>>          }
>>>          if (Cj_size[0]*Cj_size[1] > C_size[0]) Rprintf("pointer screwup");
>>>        }
>>>
>>>        /* copy row over to C */
>>>        for (m = 0; m < C_size[0]; m++) C[C_size[0]*C_size[1]+m] = Cj[m];
>>>        C_size[1] += 1;
>>>        iterate(index, lev2);
>>>      }
>>>    }
>>>
>>>    free(lev2);
>>>    free(Cj);
>>>    free(Cj2);
>>>    free(mult);
>>> }
>>>
>>>
>>> On 21 May 2013 14:04, R. Michael Weylandt <michael.weylandt at gmail.com>
>>> wrote:
>>>>
>>>>   It might also help if you can point us to the C code to help debug.
>>>>
>>>> MW
>>>>
>>>> On Tue, May 21, 2013 at 10:53 AM, Robin Evans <rje42 at cam.ac.uk> wrote:
>>>>> I should add to this that I'm running on Scientific Linux 6.  I later
>>>>> noticed that the bug only seems to occur when I run the code from
>>> Rstudio,
>>>>> and not if I use the terminal directly, so this may be the key to the
>>>>> problem.
>>>>>
>>>>> Robin
>>>>>
>>>>> On 20 May 2013 16:12, Robin Evans <rje42 at cam.ac.uk> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've run into a problem which is both maddening and rather hard to
>>>>>> replicate, therefore I wondered if someone might know of a plausible
>>>>>> explanation for it.  I couldn't find anything in the archives, though
>>>>>> maybe I'm searching for the wrong thing.
>>>>>>
>>>>>> I'm calling some C code using .C, and get the vector I'm interested in
>>>>>> back as the 7th location in the returned list.  However I find that if
>>>>>> I try to inspect the contents of this entry in the list in some ways,
>>>>>> I get one answer, and if I look at it in others I get a different
>>>>>> answer.  It's quite possible that there's something wrong with the C
>>>>>> code, but this doesn't seem to explain why this phenomenon would occur
>>>>>> in R.
>>>>>>
>>>>>> The problem does not always occur - I have to run the code a few times
>>>>>> and then call the console when it does, but the commands below show
>>>>>> what can happen when it does.  I apologise for not being able to get a
>>>>>> cleaner example.  Full code and output is below, but here is a
>>>>>> stylised version:
>>>>>>
>>>>>> The following all give one answer (which is the wrong answer as far as
>>>>>> I'm concerned) :
>>>>>>   * printing the whole list :
>>>>>>            .C(...)     # and looking at the 7th entry
>>>>>>   * applying c() to the 7th element of the list
>>>>>>            c(.C(...)[[7]])
>>>>>>   * assigning the 7th element to a vector:
>>>>>>            x = .C(...)[[7]];
>>>>>>            x
>>>>>>
>>>>>> these give a different answer (which is the answer I would hope the C
>>>>>> code returns):
>>>>>>   * using dput on the 7th entry:
>>>>>>            dput(.C(...)[[7]])
>>>>>>   * applying c() and then dput()
>>>>>>            dput(c(.C(...)[[7]]))
>>>>>>   * just printing the 7th entry of the list
>>>>>>            .C(...)[[7]]
>>>>>>
>>>>>> The answers are consistent in the sense that I always get the same
>>>>>> answers from running the same command in the console.  I have tried
>>>>>> inspecting the returned objects to see if the objects are somehow of a
>>>>>> different class than I expect, or are just being printed oddly, but
>>>>>> have not found anything untoward.
>>>>>>
>>>>>> Any suggestions would be much appreciated!
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>>
>>>>>> # THESE COMMANDS GIVE ONE ANSWER
>>>>>> # [the correct answer always begins with 1, the incorrect with -1]
>>>>>>
>>>>>>> .C("oneMargin", c(1L,1L,1L,1L,1L), c(1L,1L,1L,1L,1L), 1L,
>>>>>> c(2L,2L,2L,2L,2L), 5L, ptr1, ptr2)[[7]]
>>>>>> [1]  1 -1 -1  1 -1  1  1 -1 -1  1  1 -1  1 -1 -1  1 -1  1  1 -1  1 -1
>>>>>> -1  1  1 -1 -1  1 -1  1  1 -1
>>>>>>
>>>>>>> dput(.C("oneMargin", c(1L,1L,1L,1L,1L), c(1L,1L,1L,1L,1L), 1L,
>>>>>> c(2L,2L,2L,2L,2L), 5L, ptr1, ptr2)[[7]])
>>>>>> c(1L, -1L, -1L, 1L, -1L, 1L, 1L, -1L, -1L, 1L, 1L, -1L, 1L, -1L,
>>>>>> -1L, 1L, -1L, 1L, 1L, -1L, 1L, -1L, -1L, 1L, 1L, -1L, -1L, 1L,
>>>>>> -1L, 1L, 1L, -1L)
>>>>>>
>>>>>>> x=dput(c(.C("oneMargin", c(1L,1L,1L,1L,1L), c(1L,1L,1L,1L,1L), 1L,
>>>>>> c(2L,2L,2L,2L,2L), 5L, ptr1, ptr2)[[7]]))
>>>>>> c(1L, -1L, -1L, 1L, -1L, 1L, 1L, -1L, -1L, 1L, 1L, -1L, 1L, -1L,
>>>>>>    -1L, 1L, -1L, 1L, 1L, -1L, 1L, -1L, -1L, 1L, 1L, -1L, -1L, 1L,
>>>>>>    -1L, 1L, 1L, -1L)
>>>>>>> x
>>>>>> [1]  1 -1 -1  1 -1  1  1 -1 -1  1  1 -1  1 -1 -1  1 -1  1  1 -1  1 -1
>>>>>> -1  1  1 -1 -1  1 -1  1  1 -1
>>>>>>
>>>>>> # THESE ALL GIVE A DIFFERENT ONE!
>>>>>>
>>>>>>> .C("oneMargin", c(1L,1L,1L,1L,1L), c(1L,1L,1L,1L,1L), 1L,
>>>>>> c(2L,2L,2L,2L,2L), 5L, ptr1, ptr2)
>>>>>>
>>>>>> # (OTHER ELEMENTS OF LIST REMOVED)
>>>>>> [[7]]
>>>>>> [1] -1 -1  1  1  1  1 -1 -1  1  1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1 -1
>>>>>> 1  1 -1 -1  1  1  1  1 -1 -1
>>>>>>
>>>>>>> c(.C("oneMargin", c(1L,1L,1L,1L,1L), c(1L,1L,1L,1L,1L), 1L,
>>>>>> c(2L,2L,2L,2L,2L), 5L, ptr1, ptr2)[[7]])
>>>>>> [1] -1 -1  1  1  1  1 -1 -1  1  1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1 -1
>>>>>> 1  1 -1 -1  1  1  1  1 -1 -1
>>>>>>> x = .C("oneMargin", c(1L,1L,1L,1L,1L), c(1L,1L,1L,1L,1L), 1L,
>>>>>> c(2L,2L,2L,2L,2L), 5L, ptr1, ptr2)[[7]]
>>>>>>> x
>>>>>> [1] -1 -1  1  1  1  1 -1 -1  1  1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1 -1
>>>>>> 1  1 -1 -1  1  1  1  1 -1 -1
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robin Evans
>>>>>> Statistical Laboratory
>>>>>> University of Cambridge
>>>>>> blog: itsastatlife.blogspot.com
>>>>>> web: www.statslab.cam.ac.uk/~rje42
>>>>>>
>>>>>> Causal Inference Workshop July 15th:
>>>>>> http://www.statslab.cam.ac.uk/~rje42/uai13/main.htm
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robin Evans
>>>>> Statistical Laboratory
>>>>> University of Cambridge
>>>>> blog: itsastatlife.blogspot.com
>>>>> web: www.statslab.cam.ac.uk/~rje42 <
>>> http://www.stat.washington.edu/~rje42>
>>>>>
>>>>> Causal Inference Workshop July 15th:
>>>>> http://www.statslab.cam.ac.uk/~rje42/uai13/main.htm
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list