[Rd] ALTREP: Design concept of alternative string

Gabriel Becker g@bembecker @end|ng |rom gm@||@com
Thu May 9 20:07:30 CEST 2019


Hi Jiefei,

The issue here is that while the memory consequences of what you're
describing may be true, this is simply how R handles character vector (what
you're calling string) values internally. It doesn't actually have anything
to do with ALTREP. Standard character vector SEXPs (STRSXPs) also have an
array of CHARSXP pointers as their payload (what is returned by DATAPTR).

As far as I know, this is important for string caching and is actually
intended to save memory when the same string value appears many times in an
R session (and takes up more bytes than a pointer), though I haven't dug
around R's low-level string handling a ton. Either way, though, this would
be a much, much larger change than just changing the ALTREP API (which, for
things like this, explicitly and intentionally matches how the C API behaves
for non-ALTREP SEXPs, for compatibility).
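
To make the caching point concrete, here is a minimal toy sketch in plain C.
It is not R's implementation: ToyChar, toy_mkchar and toy_cache_size are
hypothetical names that stand in for CHARSXP and the cache lookup, and the
linked list is only a stand-in for whatever lookup structure R really uses.
The only thing it tries to show is that equal strings end up sharing one
object, so a vector pays one pointer per repeat instead of one copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CHARSXP: one immutable, heap-allocated string. */
typedef struct ToyChar {
    char *bytes;
    struct ToyChar *next;          /* next entry in the toy cache */
} ToyChar;

static ToyChar *cache_head = NULL; /* toy global cache: a plain linked list */
static size_t cache_entries = 0;

/* Return the unique cached ToyChar for s, creating it on first use. */
const ToyChar *toy_mkchar(const char *s)
{
    assert(s != NULL);

    /* Reuse an existing entry if the same bytes were seen before. */
    for (ToyChar *p = cache_head; p != NULL; p = p->next)
        if (strcmp(p->bytes, s) == 0)
            return p;

    /* Otherwise allocate a new entry and copy the bytes exactly once. */
    ToyChar *entry = malloc(sizeof *entry);
    char *copy = malloc(strlen(s) + 1);
    if (entry == NULL || copy == NULL)
        abort();
    strcpy(copy, s);
    entry->bytes = copy;
    entry->next = cache_head;
    cache_head = entry;
    cache_entries++;
    return entry;
}

/* Number of distinct strings currently held by the toy cache. */
size_t toy_cache_size(void)
{
    return cache_entries;
}
```

With this, calling toy_mkchar twice on the same text hands back the same
pointer, and only distinct strings grow the cache.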

Likewise, the reason that get_element is going to return a CHARSXP is
that this is what STRING_ELT(x, i) returns (conceptually equivalent to
((SEXP *) DATAPTR(x))[i]), so I don't think that can be changed either.
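
As a rough illustration of that equivalence, here is a self-contained toy
model in plain C. It deliberately avoids R's headers: ToyChar, ToyStrVec,
toy_dataptr and toy_string_elt are hypothetical stand-ins for CHARSXP,
STRSXP, DATAPTR and STRING_ELT, and the real macros and functions do more
than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CHARSXP. */
typedef struct ToyChar {
    const char *bytes;
} ToyChar;

/* Hypothetical stand-in for a STRSXP: the payload is an array of
 * pointers to string objects, not the string bytes themselves. */
typedef struct ToyStrVec {
    size_t length;
    const ToyChar **payload;
} ToyStrVec;

/* Stand-in for DATAPTR: an untyped pointer to the start of the payload. */
void *toy_dataptr(const ToyStrVec *x)
{
    return (void *) x->payload;
}

/* Stand-in for STRING_ELT: index into the payload viewed as pointers. */
const ToyChar *toy_string_elt(const ToyStrVec *x, size_t i)
{
    assert(i < x->length);
    return ((const ToyChar **) toy_dataptr(x))[i];
}
```

The element accessor and the data pointer hand out the same kind of object,
which is why the two methods have to agree on CHARSXP.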

One other thing to note, though, is that if you're asking for the dataptr
(and it isn't read only) then you're basically stepping out of ALTREP space
anyway, so it makes sense that what you get back is a normally laid-out
STRSXP (with its CHARSXP payload).
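
Here is a toy sketch of that last point, again in plain C and not using the
ALTREP API. ToyAltString, toy_elt and toy_dataptr are hypothetical names, and
the "source" array just stands in for data that lives outside of R (shared
memory in your case). Element access can build one string object at a time,
but a data pointer request has to expand the whole vector into the ordinary
pointer-array layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CHARSXP. */
typedef struct ToyChar {
    char *bytes;
} ToyChar;

/* Hypothetical alternative string vector backed by external data. */
typedef struct ToyAltString {
    size_t length;
    const char *const *source; /* external data, e.g. in shared memory */
    ToyChar **expanded;        /* NULL until a data pointer is requested */
    size_t allocations;        /* ToyChar objects created so far */
} ToyAltString;

/* Build one string object by copying element i out of the source. */
static ToyChar *toy_make_char(ToyAltString *x, size_t i)
{
    size_t nbytes = strlen(x->source[i]) + 1;
    ToyChar *c = malloc(sizeof *c);
    char *copy = malloc(nbytes);
    if (c == NULL || copy == NULL)
        abort();
    memcpy(copy, x->source[i], nbytes);
    c->bytes = copy;
    x->allocations++;
    return c;
}

/* Element access: at most one new object per call. In this toy the
 * caller owns the result when the vector has not been expanded. */
ToyChar *toy_elt(ToyAltString *x, size_t i)
{
    assert(i < x->length);
    if (x->expanded != NULL)
        return x->expanded[i];
    return toy_make_char(x, i);
}

/* Data pointer access: expands every element once, then reuses the array. */
ToyChar **toy_dataptr(ToyAltString *x)
{
    if (x->expanded == NULL) {
        ToyChar **slots = malloc(x->length * sizeof *slots);
        if (slots == NULL)
            abort();
        for (size_t i = 0; i < x->length; i++)
            slots[i] = toy_make_char(x, i);
        x->expanded = slots;
    }
    return x->expanded;
}
```

In this model one toy_elt call costs one allocation, while the first
toy_dataptr call costs one allocation per element and later calls cost
nothing.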

Best,
~G

On Thu, May 9, 2019 at 8:09 AM 介非王 <szwjf08 using gmail.com> wrote:

> Hello from Bioconductor,
>
> I'm developing a package to share R objects across clusters using the Boost
> library. The concept is similar to the mmap package:
> https://cran.r-project.org/web/packages/mmap/index.html . However, I ran
> into a problem when trying to write Dataptr_method for the alternative
> string.
>
> Based on my understanding, the return value of the Dataptr_method function
> should be a vector of CHARSXP pointers. This design might be problematic in
> two ways:
>
> 1. The behavior of Dataptr_method function is inconsistent for string and
> the other ALTREP types. For the other types we return a vector of pure data
> in memory allocated outside of R, but for the string, we return a vector of
> R objects allocated by R.
>
> 2. It causes an unnecessary duplication of the data. In order to return
> CHARSXPs to R, it forces me to allocate CHARSXPs and copy the entire data
> into the R process. By contrast, for the other ALTREP types, say altreal, I
> can just return the pointer to R if the data is in memory.
>
> The same problem occurs for Elt_method as well, but it is less serious since
> only one CHARSXP is allocated. Because my package is designed for sharing
> a large R object, a memory allocation is undesirable, especially when
> the data is only read by the code (e.g. the print function). I'm not sure if
> any solution exists in the current R version, but I can imagine
> three workarounds:
>
> 1. Change the behavior of the R functions and use get_element function
> instead of Dataptr function. This would make the problem more
> memory-friendly but still cause the allocation.
>
> 2. Return a vector of const char* in Dataptr method. It would be very
> efficient and consistent with the return values of the other ALTREP types.
>
> 3. Provide an alternative CHARSXP. This might be the best solution since
> STRSXP behaves more like a list instead of a string, so an alternative
> CHARSXP fits the concept of ALTREP better.
>
> Since I'm not an expert in R, I might be posting a solved problem. I would
> be very happy with, and appreciate, any suggestions regarding this problem.
>
> Best,
> Jiefei
