[Rd] What is the best way to loop over an ALTREP vector?

Gabriel Becker gabembecker using gmail.com
Tue Sep 24 07:48:39 CEST 2019


Hi Bob,

Thanks for sending around the link to that. It looks mostly right and looks
like a useful onramp. There are a few things to watch out for though (I've
cc'ed Romain so he's aware of these comments). @romain I hope you take the
following comments as they are intended, as help rather than attacks.

The largest issue I see is that the contract for Get_region is that it
*populates the provided buffer with a copy of the data.* That buffer is
expected to be safe to destructively modify, shuffle, etc., though I don't
know if we are actually doing that anywhere. As such, if I understand his
C++ correctly, that Get_region method is not safe and shouldn't be used.
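
To make the copy contract concrete, here is a minimal sketch in plain C. It
does not use R's API: compact_seq, seq_get_region, seq_sum and BATCH are all
hypothetical names made up for illustration. The getter writes values into
the caller's buffer, so the caller can scribble over that buffer without
touching the underlying object.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 512  /* hypothetical; mirrors the 512-element batch size mentioned in this thread */

/* Hypothetical stand-in for a compact sequence: start, start+1, ... */
typedef struct {
    double start;
    size_t length;
} compact_seq;

/* Region getter that honours the copy contract: writes up to n values,
   beginning at position i, into buf and returns how many were written. */
static size_t seq_get_region(const compact_seq *s, size_t i, size_t n, double *buf)
{
    assert(s != NULL && buf != NULL);
    if (i >= s->length) return 0;
    size_t avail = s->length - i;
    size_t ncopy = n < avail ? n : avail;
    for (size_t k = 0; k < ncopy; k++)
        buf[k] = s->start + (double)(i + k);
    return ncopy;
}

/* Sum in batches. The caller owns buf and destroys its contents after use,
   which is only safe because buf holds a copy. */
static double seq_sum(const compact_seq *s)
{
    double buf[BATCH];
    double total = 0.0;
    for (size_t i = 0; i < s->length; i += BATCH) {
        size_t got = seq_get_region(s, i, BATCH, buf);
        for (size_t k = 0; k < got; k++) {
            total += buf[k];
            buf[k] = 0.0;  /* destructive use of the caller's copy */
        }
    }
    return total;
}
```

Calling seq_sum twice on the same object gives the same answer, because the
zeroing only ever hits the caller's buffer.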

The other point is that Dataptr_or_null is not actually *guaranteed* not to
allocate. The default method returns NULL, but we have no way of preventing
an allocation in a user-defined method, and probably (?) no easy way of
detecting that it is occurring before it causes a bug. That said, Romain is
correct that when you are writing Dataptr_or_null methods you should
generally write them so that they don't allocate. Basically, your own
Dataptr_or_null methods shouldn't allocate, but you also should not write
code that relies on the hard assumption that no one else's ever will.
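
On the calling side, the safe pattern is to treat a NULL result as a normal
outcome and fall back to element access. A rough sketch in plain C follows;
int_vec, vec_dataptr_or_null, vec_elt and vec_sum are hypothetical names for
illustration only, not R's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector: either backed by contiguous memory (data != NULL)
   or computed on the fly as first, first+1, ... (data == NULL). */
typedef struct {
    const int *data;
    int first;
    size_t length;
} int_vec;

/* Returns the contiguous data if it already exists, otherwise NULL.
   This sketch never allocates; a real method is only expected not to. */
static const int *vec_dataptr_or_null(const int_vec *v) { return v->data; }

/* Element access that works for both representations. */
static int vec_elt(const int_vec *v, size_t i)
{
    return v->data ? v->data[i] : v->first + (int)i;
}

/* Use the pointer when one is available, fall back otherwise. */
static long vec_sum(const int_vec *v)
{
    assert(v != NULL);
    long total = 0;
    const int *p = vec_dataptr_or_null(v);
    if (p != NULL) {
        for (size_t i = 0; i < v->length; i++) total += p[i];
    } else {
        for (size_t i = 0; i < v->length; i++) total += vec_elt(v, i);
    }
    return total;
}
```

Both branches give the same result, so nothing downstream depends on which
representation the vector happens to have.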

Also, a small nitpick: R's internal mean function doesn't hit Dataptr; it
hits either INTEGER_ELT (which really should probably be an
ITERATE_BY_REGION) or ITERATE_BY_REGION.

Anyway, I hope that helps.
~G






On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <bob using rud.is> wrote:

> Not sure if you're using just C++ or Rcpp for C++ access but
> https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on
> using C++ w/ALTREP.
>
> > On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwjf08 using gmail.com> wrote:
> >
> > Sorry for posting so many things. For the first part of the code, I
> > copied my C++ iter macro by mistake (you can see an explicit type cast
> > in it). Here is the macro definition from R_ext/Itermacros.h:
> >
> > #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
> >                                   strt, nfull, expr) do {            \
> >       const etype *px = DATAPTR_OR_NULL(sx);                         \
> >       if (px != NULL) {                                              \
> >           R_xlen_t __ibr_n__ = strt + nfull;                         \
> >           R_xlen_t nb = __ibr_n__;                                   \
> >           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {    \
> >              expr                                                    \
> >            }                                                         \
> >       }                                                              \
> >       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, \
> >                                       strt, nfull, expr);            \
> >    } while (0)
> >
> >
> > Best,
> >
> > Jiefei
> >
> > On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
> >
> >> Hi Gabriel,
> >>
> >> I have tried the macro and found a small issue: it seems the macro is
> >> written in C and does an implicit type conversion (const void * to
> >> const int *), see below. While that is allowed in C, C++ is not happy
> >> with it. Is it possible to add an explicit type cast so that it is
> >> compatible with both languages?
> >>
> >>
> >> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,     \
> >>                                   strt, nfull, expr) do {            \
> >>       const etype *px = (const etype *)DATAPTR_OR_NULL(sx);          \
> >>       if (px != NULL) {                                              \
> >>           R_xlen_t __ibr_n__ = strt + nfull;                         \
> >>           R_xlen_t nb = __ibr_n__;                                   \
> >>           for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {    \
> >>              expr                                                    \
> >>            }                                                         \
> >>       }                                                              \
> >>       else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, \
> >>                                       strt, nfull, expr);            \
> >>    } while (0)
> >>
> >>
> >> Also, I notice that the element type (etype) and vector type (vtype)
> >> have to be specified in the macro. Since the SEXP is the first argument
> >> of the macro, it seems redundant to pass etype and vtype, since they
> >> have to match the type of the SEXP. I'm wondering if this is
> >> intentional? Will there be a type-free macro in R in the future? Here
> >> is a simple type-free macro I'm using.
> >>
> >> #define type_free_iter(sx, ptr, ind, nbatch, expr) \
> >> switch (TYPEOF(sx)) { \
> >> case INTSXP: \
> >>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr); \
> >>       break; \
> >> case REALSXP: \
> >>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr); \
> >>       break; \
> >> case LGLSXP: \
> >>       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr); \
> >>       break; \
> >> default: \
> >>       Rf_error("Unknown data type\n"); \
> >>       break; \
> >> }
> >>
> >>
> >>
> >> // [[Rcpp::export]]
> >> double sillysum(SEXP x) {
> >>       double s = 0.0;
> >>       type_free_iter(x, ptr, ind, nbatch,
> >>              {
> >>                     /* nbatch is an R_xlen_t, so index with R_xlen_t */
> >>                     for (R_xlen_t i = 0; i < nbatch; i++) { s = s + ptr[i]; }
> >>              });
> >>       return s;
> >> }
> >>
> >>
> >>
> >>
> >> Best,
> >>
> >> Jiefei
> >>
> >> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
> >>
> >>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
> >>> to see that there are lots of changes in ALTREP in R devel version. I
> >>> really appreciate your help!
> >>>
> >>> Best,
> >>> Jiefei
> >>>
> >>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker using gmail.com>
> >>> wrote:
> >>>
> >>>> Jiefei,
> >>>>
> >>>> I've been meaning to write up something about this so hopefully this
> >>>> will be an impetus for me to actually do that, but until then,
> >>>> responses inline.
> >>>>
> >>>>
> >>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
> >>>>
> >>>>> Hi devel team,
> >>>>>
> >>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
> >>>>> package previously used pointers to access the data of a SEXP, so it
> >>>>> would not work for some ALTREP objects which do not have a pointer.
> >>>>> I plan to rewrite the code and use functions like get_elt,
> >>>>> get_region, and get_subset to access the values of a vector, so I
> >>>>> have a few questions about ALTREP:
> >>>>>
> >>>>> 1. Since an ALTREP class does not have to define all of the above
> >>>>> functions (element, region, subset), is there any way to check which
> >>>>> functions have been defined for an ALTREP class? I searched
> >>>>> Rinternals.h and altrep.c but did not find a solution. If not, will
> >>>>> it be added in the future?
> >>>>>
> >>>>
> >>>> Element and region are guaranteed to always be defined and work (for
> >>>> altrep and non-altrep INTSXPs, REALSXPs, LGLSXPs, etc.; we currently
> >>>> don't have region for STRSXP or VECSXP, I believe). If the altrep
> >>>> class does not provide them then default methods will be used, which
> >>>> may be inefficient in some cases but will work. Subset is currently a
> >>>> forward-looking stub, but once implemented, that will also be
> >>>> guaranteed to work for all valid ALTREP classes.
> >>>>
> >>>>
> >>>>>
> >>>>> 2. Given the diversity of ALTREP classes, what is the best way to
> >>>>> loop over an ALTREP object? I hope there can be an all-in-one
> >>>>> function which can get the values from a vector as long as at least
> >>>>> one of the above functions has been defined, so package developers
> >>>>> would not be bothered by tons of `if-else` statements if they want
> >>>>> their package to work with ALTREP. Since it seems no such function
> >>>>> exists, what would be the best way to do the loop under the current
> >>>>> R version?
> >>>>>
> >>>>
> >>>> The best way to loop over all SEXPs, supporting both altrep and
> >>>> non-altrep objects, is with the ITERATE_BY_REGION macro (which has
> >>>> been in R for a number of released versions, at least since 3.5.0 I
> >>>> think) and the much newer (devel only) ITERATE_BY_REGION_PARTIAL
> >>>> macro, both defined in R_ext/Itermacros.h.
> >>>>
> >>>> The meaning of the arguments for ITERATE_BY_REGION_PARTIAL is as
> >>>> follows (ITERATE_BY_REGION is the same except that it has no strt
> >>>> and nfull).
> >>>>
> >>>>
> >>>>   - sx - C level variable name of the SEXP to iterate over
> >>>>   - px - variable name to use for the pointer populated with data from
> >>>>   a region of sx
> >>>>   - idx - variable name to use for the "outer", batch counter in the
> >>>>   for loop. This will contain the 0-indexed start position of the
> >>>>   batch you're currently processing
> >>>>   - nb - variable name to use for the current batch size. When the
> >>>>   data is fetched region by region this will be GET_REGION_BUFFSIZE
> >>>>   (512), or the number of elements remaining in the vector, whichever
> >>>>   is smaller; when a data pointer is available, the macro as quoted
> >>>>   elsewhere in this thread covers everything in a single batch
> >>>>   - etype - element (C) type, e.g., int, double, of the data
> >>>>   - vtype - vector (access API) type, e.g., INTEGER, REAL
> >>>>   - strt - the 0-indexed position in the vector to start iterating
> >>>>   - nfull - the total number of elements to iterate over from the
> >>>>   vector
> >>>>   - expr - the code to process a single batch (Which will do things to
> >>>>   px, typically)
> >>>>
> >>>>
> >>>> So code to perform a deliberately naive (not a good idea) sum of
> >>>> REALSXP data might look like:
> >>>>
> >>>> double sillysum(SEXP x) {
> >>>>    double s = 0.0;
> >>>>
> >>>>    ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
> >>>>        {
> >>>>            /* nbatch is an R_xlen_t, so index with R_xlen_t */
> >>>>            for (R_xlen_t i = 0; i < nbatch; i++) { s = s + ptr[i]; }
> >>>>        });
> >>>>
> >>>>    return s;
> >>>> }
> >>>>
> >>>> For meatier examples of ITERATE_BY_REGION's use in practice you can
> >>>> grep the R sources. I know it is used in the implementations of the
> >>>> various C-level summaries (summary.c), print and formatting
> >>>> functions, and anyNA.
> >>>>
> >>>> Some things to remember
> >>>>
> >>>>   - If you have an inner loop like the one above, your total position
> >>>>   in the original vector is ind + i
> >>>>   - ITERATE_BY_REGION always processes the whole vector; if you need
> >>>>   to do only part of it you'll either need custom breaking for both
> >>>>   the inner and outer loops, or in R-devel you can use
> >>>>   ITERATE_BY_REGION_PARTIAL
> >>>>   - Don't use the variants ending in 0; all they do is skip over
> >>>>   things that are a good idea in the case of non-altreps (and some
> >>>>   very specific altreps).
> >>>>
> >>>> Hope that helps.
> >>>>
> >>>> Best,
> >>>> ~G
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> Best,
> >>>>> Jiefei
> >>>>>
> >>>>>        [[alternative HTML version deleted]]
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-devel using r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>
> >>>>
> >
>
>
