[Rd] C function with unknown output length

Fri Jun 8 05:27:59 CEST 2007

On 07-06-06 at 15:20, Herve Pages wrote:

> Vincent Goulet wrote:
>> Hi all,
>>
>> Could anyone point me to one or more examples in the R sources of a C
>> function that is called without knowing in advance what will be the
>> length (say) of the output vector?
>>
>> To make myself clearer, we have a C function that computes
>> probabilities until their sum gets "close enough" to 1. Hence, the
>> number of probabilities is not known in advance.
>>
>
> Hi Vincent,
>
> Let's say you want to write a function
> get_matches(const char *pattern, const char *x) that will find all the
> occurrences of string 'pattern' in string 'x' and "return" their
> positions in the form of an array of integers. Of course you don't know
> in advance how many occurrences you're going to find.
>
> One possible strategy is to:
>
>   - Add an extra arg to 'get_matches' for storing the positions and make
>     'get_matches' return the number of matches (i.e. the length of
>     *pos_ptr):
>
>       int get_matches(int **pos_ptr, const char *pattern, const char *x)
>
>     Note that pos_ptr is a pointer to an int pointer.
>
>   - In get_matches(): use a local array of ints and start with an
>     arbitrary initial size for it:
>
>       int get_matches(...)
>       {
>         int *tmp_pos, tmp_size, old_size, npos = 0;
>
>         tmp_size = ...; /* some initial guess of the number of matches */
>         tmp_pos = (int *) S_alloc((long) tmp_size, sizeof(int));
>         ...
>
>     Then start searching for matches and every time you find one, store
>     its position in tmp_pos[npos] and increase npos.
>     When tmp_pos is full (npos == tmp_size), realloc with:
>
>         ...
>         old_size = tmp_size;
>         tmp_size = 2 * old_size; /* there are many different strategies
>                                     for this */
>         tmp_pos = (int *) S_realloc((char *) tmp_pos, (long) tmp_size,
>                                     (long) old_size, sizeof(int));
>         ...
>
>     Note that there is no need to check that the call to S_alloc() or
>     S_realloc() was successful because these functions will raise an
>     error and end the call to .Call if they fail. In that case the
>     memory allocated so far is reclaimed (as it is on any error or user
>     interrupt).
>
>     When you are done, just return with:
>
>         ...
>         *pos_ptr = tmp_pos;
>         return npos;
>       }
>
>   - Call get_matches with:
>
>       int *pos, npos;
>
>       npos = get_matches(&pos, pattern, x);
>
>     Note that memory allocation took place in 'get_matches' but now you
>     need to decide how and when the memory pointed to by 'pos' will be
>     freed. In the R environment, this can be addressed by using
>     exclusively transient storage allocation
>     (http://cran.r-project.org/doc/manuals/R-exts.html#Transient)
>     as we did in get_matches() so the allocated memory will be
>     automatically reclaimed at the end of the call to .C or .Call.
>     Of course, the integers stored in pos have to be moved to a "safe"
>     place before .Call returns. Typically this will be done with
>     something like:
>
>       SEXP Call_get_matches(...)
>       {
>         SEXP pos_sxp;
>         int *pos, npos;
>
>         ...
>         npos = get_matches(&pos, pattern, x);
>         PROTECT(pos_sxp = NEW_INTEGER(npos));
>         memcpy(INTEGER(pos_sxp), pos, npos * sizeof(int));
>         UNPROTECT(1);
>         return pos_sxp; /* end of call to .Call */
>       }
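
The strategy quoted above can be sketched as a standalone C function. This
is only a sketch under stated assumptions: plain malloc()/realloc() stand
in for S_alloc()/S_realloc() so that it compiles outside R, which means
allocation failures have to be checked by hand and the caller has to
free() the result. The initial size of 4, the strstr()-based search and
the 0-based, possibly overlapping match positions are illustrative
choices, not part of Herve's description.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find every (possibly overlapping) occurrence of `pattern` in `x`.
 * On success, stores a malloc'ed array of 0-based positions in *pos_ptr
 * and returns its length; returns -1 if memory runs out.
 * ASSUMPTION: malloc/realloc replace S_alloc/S_realloc, so the caller
 * must free(*pos_ptr) and errors are reported through the return value. */
int get_matches(int **pos_ptr, const char *pattern, const char *x)
{
    int tmp_size = 4;               /* arbitrary initial guess */
    int npos = 0;
    int *tmp_pos;
    const char *hit = x;

    assert(pos_ptr != NULL && pattern != NULL && x != NULL);
    *pos_ptr = NULL;

    tmp_pos = (int *) malloc((size_t) tmp_size * sizeof(int));
    if (tmp_pos == NULL)
        return -1;

    if (*pattern == '\0') {         /* empty pattern: report no matches */
        *pos_ptr = tmp_pos;
        return 0;
    }

    while ((hit = strstr(hit, pattern)) != NULL) {
        if (npos == tmp_size) {     /* buffer full: double its size */
            int *bigger = (int *) realloc(tmp_pos,
                                          2 * (size_t) tmp_size * sizeof(int));
            if (bigger == NULL) {
                free(tmp_pos);
                return -1;
            }
            tmp_pos = bigger;
            tmp_size *= 2;
        }
        tmp_pos[npos++] = (int) (hit - x);
        hit++;                      /* resume one char past this match start */
    }

    *pos_ptr = tmp_pos;
    return npos;
}
```

Called as npos = get_matches(&pos, "ab", "abcab"), it would leave the two
positions 0 and 3 in pos. Inside R, switching back to the transient
allocators would remove both the error checks and the need for free().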
>
> There are many variations around this. One of them is to "share" pos
> and npos between get_matches and its caller by making them global
> variables (in this case it is recommended to use 'static' in their
> declarations but this requires that get_matches and its caller are in
> the same .c file).
>
> Hope this helps.

It did, thanks Herve. And thanks also to Dirk and Bill for their useful
suggestions.

We (actually, my student, but in pure academic style I'll take part of
the credit ;-) had done something very similar to Herve's suggestion,
including the "double the size when it's full" strategy, but in one
function only instead of two. Now I have confirmation that it was a good
way to go. I'm satisfied.
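
For the record, the one-function variant might look roughly like the
sketch below. It is not our actual code: the geometric distribution, the
function name probs_until_one, the initial size of 4 and the use of
malloc()/realloc() instead of R's transient allocators are all
assumptions made so that the example is self-contained. It also assumes
that tol is comfortably larger than machine epsilon, otherwise the
running sum may never get close enough to 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Store geometric probabilities p * (1 - p)^k, k = 0, 1, ..., in a
 * growing buffer until their running sum is within `tol` of 1.
 * Returns a malloc'ed array (the caller frees it) and writes its length
 * to *n_out; returns NULL if an allocation fails.
 * ASSUMPTION: the distribution and all names are illustrative only. */
double *probs_until_one(double p, double tol, int *n_out)
{
    int size = 4;                   /* arbitrary initial guess */
    int n = 0;
    double term = p;                /* probability for k = 0 */
    double sum = 0.0;
    double *probs;

    assert(p > 0.0 && p <= 1.0 && tol > 0.0 && n_out != NULL);
    *n_out = 0;

    probs = (double *) malloc((size_t) size * sizeof(double));
    if (probs == NULL)
        return NULL;

    while (1.0 - sum > tol) {
        if (n == size) {            /* buffer full: double its size */
            double *bigger = (double *) realloc(probs,
                                 2 * (size_t) size * sizeof(double));
            if (bigger == NULL) {
                free(probs);
                return NULL;
            }
            probs = bigger;
            size *= 2;
        }
        probs[n++] = term;
        sum += term;
        term *= 1.0 - p;            /* next probability by recurrence */
    }

    *n_out = n;
    return probs;
}
```

In a .Call wrapper the n probabilities would then be copied into a
freshly allocated numeric vector, exactly as Herve does for the integer
positions.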

Best,    Vincent

>
> H.
>
>> I would like to have an idea what is the best way to handle this
>> situation in R.
>>
>> Thanks in advance!
>>
>> ---
>>    Vincent Goulet, Associate Professor
>>    École d'actuariat
>>    Université Laval, Québec
>>    Vincent.Goulet at act.ulaval.ca   http://vgoulet.act.ulaval.ca