[Rd] What is the best way to loop over an ALTREP vector?
Romain Francois
rom@|n @end|ng |rom r@tud|o@com
Tue Sep 24 10:39:13 CEST 2019
Thanks for these comments. I should alter the blog post or write some follow up.
This was a weekend blog post that only benefited from a short time of research research. I’m glad people find it useful, but I’m sure a detailed documentation of the features from the authors would be more useful.
Romain
> Le 24 sept. 2019 à 07:48, Gabriel Becker <gabembecker using gmail.com> a écrit :
>
> Hi Bob,
>
> Thanks for sending around the link to that. It looks mostly right and looks
> like a useful onramp. There are a few things to watch out for though (I've
> cc'ed Romain so he's aware of these comments). @romain I hope you taake the
> following comments as they are intended, as help rather than attacks.
>
> The largest issue I see is that the contract for Get_region is that it
> *populates the
> provided buffer with a copy of the data. *That buffer is expected to be
> safe to destructively modify, shuffle, etc though I don't know if we are
> actually doing that anywhere. As such, if I understand his C++ correctly,
> that Get_region method is not safe and shouldn't be used.
>
> The other point is that Dataptr_or_null is not actually *guaranteed *not to
> allocate. The default method returns NULL, but we have no way of preventing
> an allocation in a user-defined method, and probably (?) no easy way of
> detecting that it is occurring before it causes a bug. That said, Romain is
> correct that when you are writing Dataptr_or_null methods you should write
> them so that they don't allocate, generally. Basically your methods for
> Dataptr_or_null shouldn't allocate, but you also should not write code that
> relies on hard assumptions that no one's ever will.
>
> Also, a small nitpick, R's internal mean function doesn't hit Dataptr, it
> hits either INTEGER_ELT (which really should probably be a
> ITERATE_BY_REGION) or ITERATE_BY_REGION.
>
> Anyway, I hope that helps.
> ~G
>
>
>
>
>
>
> On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <bob using rud.is <mailto:bob using rud.is>> wrote:
>
>> Not sure if you're using just C++ or Rcpp for C++ access but
>> https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on
>> using C++ w/ALTREP.
>>
>>> On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwjf08 using gmail.com> wrote:
>>>
>>> Sorry for post a lot of things, for the first part of code, I copied my
>> C++
>>> iter macro by mistake(and you can see an explicit type casting). Here is
>>> the macro definition from R_exts/Itermacros.h
>>>
>>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \
>>>
>>> strt, nfull, expr) do { \
>>>
>>> * const** etype *px = DATAPTR_OR_NULL(sx); *
>> \
>>>
>>> if (px != NULL) { \
>>>
>>> R_xlen_t __ibr_n__ = strt + nfull; \
>>>
>>> R_xlen_t nb = __ibr_n__; \
>>>
>>> for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \
>>>
>>> expr \
>>>
>>> } \
>>>
>>> } \
>>>
>>> else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
>>> \
>>>
>>> strt, nfull, expr); \
>>>
>>> } while (0)
>>>
>>>
>>> Best,
>>>
>>> Jiefei
>>>
>>> On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
>>>
>>>> Hi Gabriel,
>>>>
>>>> I have tried the macro and found a small issue, it seems like the macro
>> is
>>>> written in C and does an implicit type conversion(const void * to const
>> int
>>>> *), see below. While it is allowed in C, C++ seems not happy with it.
>> Is it
>>>> possible to add an explicit type casting so that it can be compatible
>> with
>>>> both language?
>>>>
>>>>
>>>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \
>>>>
>>>> strt, nfull, expr) do { \
>>>>
>>>> *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx); *
>>>> \
>>>>
>>>> if (px != NULL) { \
>>>>
>>>> R_xlen_t __ibr_n__ = strt + nfull; \
>>>>
>>>> R_xlen_t nb = __ibr_n__; \
>>>>
>>>> for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \
>>>>
>>>> expr \
>>>>
>>>> } \
>>>>
>>>> } \
>>>>
>>>> else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype,
>>>> vtype, \
>>>>
>>>> strt, nfull, expr); \
>>>>
>>>> } while (0)
>>>>
>>>>
>>>> Also, I notice that the element type(etype) and vector type(vtype) has
>>>> to be specified in the macro. Since the SEXP is the first argument in
>> the
>>>> macro, it seems redundant to define etype and vtype for they have to
>> match
>>>> the type of the SEXP. I'm wondering if this is intentional? Will there
>> be a
>>>> type-free macro in R in the future? Here is a simple type-free macro I'm
>>>> using.
>>>>
>>>> #define type_free_iter(sx, ptr, ind, nbatch,expr)\
>>>>
>>>> switch(TYPEOF(sx)){\
>>>>
>>>> case INTSXP:\
>>>>
>>>> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
>>>>
>>>> break; \
>>>>
>>>> case REALSXP:\
>>>>
>>>> ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
>>>>
>>>> break; \
>>>>
>>>> case LGLSXP:\
>>>>
>>>> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
>>>>
>>>> break; \
>>>>
>>>> default:\
>>>>
>>>> Rf_error("Unknow data type\n"); \
>>>>
>>>> break; \
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>> // [[Rcpp::export]]
>>>>
>>>> double sillysum(SEXP x) {
>>>>
>>>> double s = 0.0;
>>>>
>>>> type_free_iter(x, ptr, ind, nbatch,
>>>>
>>>> {
>>>>
>>>> for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
>>>>
>>>> });
>>>>
>>>> return s;
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> Jiefei
>>>>
>>>> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
>>>>
>>>>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
>>>>> to see that there are lots of changes in ALTREP in R devel version. I
>>>>> really appreciate your help!
>>>>>
>>>>> Best,
>>>>> Jiefei
>>>>>
>>>>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker using gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Jiefei,
>>>>>>
>>>>>> I've been meaning to write up something about this so hopefully this
>>>>>> will be an impetus for me to actually do that, but until then,
>> responses
>>>>>> inline.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
>>>>>>
>>>>>>> Hi devel team,
>>>>>>>
>>>>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
>>>>>>> package
>>>>>>> previously used pointers to access the data of a SEXP, so it would
>> not
>>>>>>> work
>>>>>>> for some ALTREP objects which do not have a pointer. I plan to
>> rewrite
>>>>>>> the
>>>>>>> code and use functions like get_elt, get_region, and get_subset to
>>>>>>> access
>>>>>>> the values of a vector, so I have a few questions for ALTREP:
>>>>>>>
>>>>>>> 1. Since an ALTREP do not have to define all of the above
>>>>>>> functions(element, region, subset), is there any way to check which
>>>>>>> function has been defined for an ALTREP class? I did a search on
>>>>>>> RInternal.h and altrep.c but did not find a solution for it. If not,
>>>>>>> will
>>>>>>> it be added in the future?
>>>>>>>
>>>>>>
>>>>>> Element and region are guaranteed to always be defined and work (for
>>>>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently
>> don't
>>>>>> have region for STRSXP or VECSXP, I believe). If the altrep class
>> does not
>>>>>> provide them then default methods will be used, which may be
>> inefficient in
>>>>>> some cases but will work. Subset is currently a forward looking stub,
>> but
>>>>>> once implimented, that will also be guaranteed to work for all valid
>> ALTREP
>>>>>> classes.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2. Given the diversity of ALTREP classes, what is the best way to
>> loop
>>>>>>> over
>>>>>>> an ALTREP object? I hope there can be an all-in-one function which
>> can
>>>>>>> get
>>>>>>> the values from a vector as long as at least one of the above
>> functions
>>>>>>> has
>>>>>>> been defined, so package developers would not be bothered by tons of
>>>>>>> `if-else` statement if they want their package to work with ALTREP.
>>>>>>> Since
>>>>>>> it seems like there is no such function exist, what could be the best
>>>>>>> way
>>>>>>> to do the loop under the current R version?
>>>>>>>
>>>>>>
>>>>>> The best way to loop over all SEXPs, which supports both altrep and
>>>>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R
>> for a
>>>>>> number of released versions, at least since 3.5.0 I think) and the
>> much
>>>>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
>>>>>> R_exts/Itermacros.h
>>>>>>
>>>>>> The meaning of the arguments is as follows for
>> ITERATE_BY_REGION_PARTIAL
>>>>>> are as follows (ITERATE_BY_REGION is the same except no strt, and
>> nfull).
>>>>>>
>>>>>>
>>>>>> - sx - C level variable name of the SEXP to iterate over
>>>>>> - px - variable name to use for the pointer populated with data from
>>>>>> a region of sx
>>>>>> - idx - variable name to use for the "outer", batch counter in the
>>>>>> for loop. This will contain the 0-indexed start position of the
>> batch
>>>>>> you're currently processing
>>>>>> - nb - variable name to use for the current batch size. This will
>>>>>> always either be GET_REGION_BUFFSIZE (512), or the number of
>> elements
>>>>>> remaining in the vector, whichever is smaller
>>>>>> - etype - element (C) type, e.g., int, double, of the data
>>>>>> - vtype - vector (access API) type, e.g, INTEGER, REAL
>>>>>> - strt - the 0-indexed position in the vector to start iterating
>>>>>> - nfull - the total number oif elements to iterate over from the
>>>>>> vector
>>>>>> - expr - the code to process a single batch (Which will do things to
>>>>>> px, typically)
>>>>>>
>>>>>>
>>>>>> So code to perform badly implemented not good idea summing of REALSXP
>>>>>> data might look like
>>>>>>
>>>>>> double sillysum(SEXP x) {
>>>>>>
>>>>>> double s = 0.0;
>>>>>>
>>>>>> ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>>>>> {
>>>>>>
>>>>>> for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
>>>>>> })
>>>>>>
>>>>>> return s;
>>>>>> }
>>>>>>
>>>>>> For meatier examples of ITERATE_BY_REGION's use in practice you can
>> grep
>>>>>> the R sources. I know it is used in the implementations of the various
>>>>>> C-level summaries (summary.c), print and formatting functions, and
>> anyNA.
>>>>>>
>>>>>> Some things to remember
>>>>>>
>>>>>> - If you have an inner loop like the one above, your total position
>>>>>> in the original vector is ind + i
>>>>>> - ITERATE_BY_REGION always processes the whole vector, if you need
>>>>>> to only do part of it yo'll either need custom breaking for both
>> inner and
>>>>>> outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
>>>>>> - Don't use the variants ending in 0, all they do is skip over
>>>>>> things that are a good idea in the case of non-altreps (and some
>> very
>>>>>> specific altreps).
>>>>>>
>>>>>> Hope that helps.
>>>>>>
>>>>>> Best,
>>>>>> ~G
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Best,
>>>>>>> Jiefei
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-devel using r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>
>>>>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel using r-project.org <mailto:R-devel using r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel <https://stat.ethz.ch/mailman/listinfo/r-devel>
[[alternative HTML version deleted]]
More information about the R-devel
mailing list