[Rd] [External] Re: ALTREP: Design concept of alternative string

介非王 szwjf08 sending from gmail.com
Mon May 13 18:08:17 CEST 2019


Thank you very much for your explanation! I'm looking forward to seeing the
changes to the R functions in a future release.

Best,
Jiefei

Tierney, Luke <luke-tierney using uiowa.edu> wrote on Fri, May 10, 2019 at 12:22 PM:

> On Fri, 10 May 2019, 介非王 wrote:
>
> > Hi Gabriel,
> >
> > Thanks for your explanation, I totally understand that it is almost
> > impossible to change the data structure of STRSXP. However, what I'm
> > proposing is not about changing the internal representation, but rather
> > about how we design and use the ALTREP API.
> >
> > I may not have stated the workarounds clearly, as English is not my first
> > language. Please let me explain them again in detail.
> >
> > 1. Update the existing R functions. When the ALTREP API Dataptr_or_null
> > returns NULL, use get_element instead (or as far as possible). I have seen
> > this pattern in some R functions, but some functions still do not
> > follow this rule. For example, the print function calls Dataptr
> > unconditionally (it does not even call Dataptr_or_null first), which
> > forces me to allocate a large chunk of memory in R. Updating these
> > functions would not completely solve the problem we are discussing, but
> > it would make it less serious.
>
> Fixing print() is pretty high priority (I thought we had done so for R
> 3.6.0 but apparently not). Others will come in over time; filing a
> request with bugzilla is one way to push up priority for a particular
> function or set of functions.
>
> Keep in mind that one option for your implementation is to signal an
> error if a data pointer is requested. You could make that dependent on
> some sort of option setting or make the error continuable by providing
> a restart.
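
The fallback pattern discussed above (use the data pointer only when Dataptr_or_null offers one, otherwise go through get_element) can be sketched in plain C without the R headers. Everything here is hypothetical and for illustration only: `str_vec`, `packed_strings`, `total_chars` and `make_packed` are not part of R's API, and plain `const char *` values stand in for CHARSXPs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ALTREP-style string vector: consumers reach
 * the data only through the two methods below. */
typedef struct str_vec {
    size_t length;
    /* Returns an array of string pointers, or NULL if none exists cheaply. */
    const char *const *(*dataptr_or_null)(const struct str_vec *v);
    /* Returns element i without materializing the whole vector. */
    const char *(*elt)(const struct str_vec *v, size_t i);
    const void *state; /* implementation-private data */
} str_vec;

/* Consumer: use the pointer array when offered, else fetch per element. */
size_t total_chars(const str_vec *v)
{
    assert(v != NULL);
    size_t total = 0;
    const char *const *p = v->dataptr_or_null(v);
    for (size_t i = 0; i < v->length; i++) {
        const char *s = (p != NULL) ? p[i] : v->elt(v, i);
        total += strlen(s);
    }
    return total;
}

/* Example backend: strings packed end to end, e.g. in a shared segment. */
typedef struct packed_strings {
    const char *bytes;     /* NUL-terminated strings laid end to end */
    const size_t *offsets; /* offsets[i] = start of string i in bytes */
} packed_strings;

static const char *const *packed_dataptr_or_null(const str_vec *v)
{
    (void)v;
    return NULL; /* no pointer array exists; building one would allocate */
}

static const char *packed_elt(const str_vec *v, size_t i)
{
    const packed_strings *ps = v->state;
    return ps->bytes + ps->offsets[i];
}

str_vec make_packed(const packed_strings *ps, size_t n)
{
    str_vec v = { n, packed_dataptr_or_null, packed_elt, ps };
    return v;
}
```

A consumer written this way never forces the backend to build a full pointer array. A consumer that asks for the data pointer unconditionally would, which is the cost described for print above.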
>
> > 2. Update the ALTREP API to return a vector of const char *, and internally
> > wrap them as CHARSXPs. This can be a way to "hack" the R data structure
> > with only the small cost of creating the CHARSXP headers.
>
> That doesn't seem feasible but I may not be understanding what you mean.
>
> > 3. Provide character ALTREP. Instead of using string ALTREP, we can define
> > an alternative CHARSXP. Doing so would completely solve the problem,
> > since the return value of the Dataptr of a CHARSXP is a const char*. We do
> > not have to change any internal representation of characters; it just
> > requires remapping the DATAPTR macro (or function?).
>
> Allowing ALTREP CHARSXP objects might be something to consider in the
> future, but the combination of caching and encoding issues makes that
> very complex. I'm not sure it would be a good idea or even
> feasible. In any case it won't happen anytime soon.
>
> Best,
>
> luke
>
> >
> > Again, I sincerely appreciate your time and the detailed explanation you
> > provided. I'm looking forward to seeing any method to solve this problem
> > in current and future R releases.
> >
> > Best,
> > Jiefei
> >
> > Gabriel Becker <gabembecker using gmail.com> wrote on Thu, May 9, 2019 at 2:07 PM:
> >
> >> Hi Jiefei,
> >>
> >> The issue here is that while the memory consequences of what you're
> >> describing may be true, this is simply how R handles character vector
> >> (what you're calling string) values internally. It doesn't actually have
> >> anything to do with ALTREP. Standard character vector SEXPs have an array
> >> of CHARSXP pointers in their payload (what is returned by DATAPTR) as well.
> >>
> >> As far as I know, this is important for string caching and is actually
> >> intended to save memory when the same string value appears many times in
> >> an R session (and takes up more bytes than a pointer), though I haven't
> >> dug around R's low-level string handling a ton. Either way, this would
> >> be a much, much larger change than just changing the ALTREP API (which,
> >> for things like this, explicitly and intentionally matches how the C API
> >> behaves for non-ALTREP SEXPs, for compatibility).
> >>
> >> Likewise, the reason that get_element is going to return a CHARSXP is
> >> that this is what STRING_ELT(x, i) returns (equivalent to
> >> ((SEXP *) DATAPTR(x))[i]), so I don't think that can be changed either.
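
The layout described above (a vector payload that is an array of pointers to cached strings, so that fetching an element is just an index into that array) can be illustrated with a toy interning table. This is only a sketch under simplifying assumptions: `intern`, `str_vector`, `str_elt`, `cached_string_count` and the fixed-size linear cache are made up for this example, and R's real CHARSXP cache, which also has to deal with encodings and garbage collection, is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_CAPACITY 64

/* Toy global string cache: each distinct string is stored exactly once. */
static char *cache[CACHE_CAPACITY];
static size_t cache_size = 0;

/* Return the cached copy of s, adding it if it is not there yet.
 * Returns NULL if the cache is full or memory cannot be allocated. */
const char *intern(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < cache_size; i++)
        if (strcmp(cache[i], s) == 0)
            return cache[i];
    if (cache_size == CACHE_CAPACITY)
        return NULL;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    cache[cache_size++] = copy;
    return copy;
}

/* A "string vector" is then just an array of pointers into the cache. */
typedef struct {
    size_t length;
    const char **payload;
} str_vector;

/* Element access is a plain index into the pointer payload. */
const char *str_elt(const str_vector *x, size_t i)
{
    return x->payload[i];
}

/* Number of distinct strings currently held in the cache. */
size_t cached_string_count(void)
{
    return cache_size;
}
```

With this layout a value repeated many times costs one pointer per occurrence plus a single stored copy, which is the memory saving mentioned above. It is also why the payload cannot simply be swapped for raw character data owned by another process.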
> >>
> >> One other thing to note, though, is that if you're asking for the dataptr
> >> (and it isn't read-only) then you're basically stepping out of ALTREP
> >> space anyway, so it makes sense that what comes back is a normally
> >> laid-out STRSXP (with its CHARSXP payload).
> >>
> >> Best,
> >> ~G
> >>
> >> On Thu, May 9, 2019 at 8:09 AM 介非王 <szwjf08 using gmail.com> wrote:
> >>
> >>> Hello from Bioconductor,
> >>>
> >>> I'm developing a package to share R objects across clusters using the
> >>> Boost library. The concept is similar to the mmap package:
> >>> https://cran.r-project.org/web/packages/mmap/index.html . However, I
> >>> ran into a problem when trying to write the Dataptr_method for the
> >>> alternative string.
> >>>
> >>> Based on my understanding, the return value of the Dataptr_method
> >>> function should be a vector of CHARSXP pointers. This design might be
> >>> problematic in two ways:
> >>>
> >>> 1. The behavior of the Dataptr_method function is inconsistent between
> >>> string and the other ALTREP types. For the other types we return a vector
> >>> of pure data in memory allocated outside of R, but for the string we
> >>> return a vector of R objects allocated by R.
> >>>
> >>> 2. It causes an unnecessary duplication of the data. In order to return
> >>> CHARSXPs to R, it forces me to allocate CHARSXPs and copy the entire data
> >>> into the R process. By contrast, for the other ALTREP types, say altreal,
> >>> I can just return the pointer to R if the data is in memory.
> >>>
> >>> The same problem occurs for Elt_method as well, but it is less serious
> >>> since only one CHARSXP is allocated. Because my package is designed for
> >>> sharing large R objects, allocating memory is undesirable, especially
> >>> when the code only reads the data (e.g. the print function). I'm not
> >>> sure whether any solution exists in the current R version, but I can
> >>> imagine three workarounds:
> >>>
> >>> 1. Change the behavior of the R functions to use the get_element function
> >>> instead of the Dataptr function. This would make them more
> >>> memory-friendly but would still cause allocation.
> >>>
> >>> 2. Return a vector of const char* in the Dataptr method. It would be very
> >>> efficient and consistent with the return values of the other ALTREP types.
> >>>
> >>> 3. Provide an alternative CHARSXP. This might be the best solution, since
> >>> a STRSXP behaves more like a list than a string, so an alternative
> >>> CHARSXP fits the concept of ALTREP better.
> >>>
> >>> Since I'm not an expert in R, I might be posting about an already solved
> >>> problem. I would be very happy to receive any suggestions regarding
> >>> this problem.
> >>>
> >>> Best,
> >>> Jiefei
> >>>
> >>> ______________________________________________
> >>> R-devel using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>
> >
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



-- 
Jiefei Wang
Room 2-501,Tangxuan,QilinGarden,NanshanDistrict,Shenzhen
Guangdong,China
Phone (+86)18312589584
szwjf8 using gmail.com
