[Rd] R strings, null-terminated or size delimited?
Guillaume Yziquel
guillaume.yziquel at citycable.ch
Sun Nov 22 00:31:24 CET 2009
Simon Urbanek wrote:
>
> On Nov 21, 2009, at 4:12 PM, Guillaume Yziquel wrote:
>
>> Hello.
>>
>> I've been looking at vecsexps for my binding.
>>
>> Concerning strings, I'm wondering: are they supposed to be
>> null-delimited?
>
> Yes, they are null-delimited when you create/access them.
OK. Fair enough. But is it guaranteed that the null terminator falls exactly
where the length stored in the vecsxp field of the *VECSEXP says the R vector
should end? Let me rephrase that:
-1- Should I consider it a bug if the two pieces of information differ?
-2- Which of the two is the "safest" one to rely on?
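
To make the question concrete, here is a minimal sketch in C of the check I
have in mind. Everything in it is a toy assumption: toy_string and
toy_lengths_agree are hypothetical names, and the struct is not R's actual
CHARSXP layout. With the real headers, I assume the comparison would be
between strlen(CHAR(x)) and LENGTH(x).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a length-prefixed string; NOT R's actual CHARSXP layout. */
typedef struct {
    size_t length;      /* number of payload bytes, excluding the terminator */
    const char *data;   /* payload followed by a terminating '\0' */
} toy_string;

/* Returns 1 when the first '\0' sits exactly at index `length`,
   i.e. when strlen() and the stored length describe the same string. */
static int toy_lengths_agree(const toy_string *s)
{
    assert(s != NULL && s->data != NULL);
    /* memchr reads at most length + 1 bytes, so it stays inside the buffer
       even when the payload contains no early '\0'. */
    const char *nul = memchr(s->data, '\0', s->length + 1);
    return nul != NULL && (size_t)(nul - s->data) == s->length;
}
```

In this sketch a string such as { 5, "ab\0cd" } is reported as a disagreement,
because the first null byte comes before the stored length. Scanning no
further than the stored length is the conservative choice here.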
>> Are they delimited by the info in the SEXPHEADER macro in Rinternals.h?
>
>
> You should not be touching or reading that.
I believe I should. I'd like the OCaml / R binding to be closely knit to
R internals. One reason is speed; the other is that I'd like to use camlp4
to write syntax extensions that mix OCaml and R syntax. It's therefore
important for me not to rely on the R interpreter being active when
building R values, or when marshaling R values via OCaml. There are
numerous other issues aside from this one.
I'm already using #define USE_RINTERNALS in my .c file to inspect R values.
>> Basically, what are the macros or functions to access the values of
>> the vecsexps?
>
> VECTOR_ELT and SET_VECTOR_ELT (assuming that you're referring to VECSXP
> which are generic vectors).
No. I'm referring to INTSXP for now. But I see what you mean:
> #define INTEGER(x) ((int *) DATAPTR(x))
> #define VECTOR_ELT(x,i) ((SEXP *) DATAPTR(x))[i]
VECTOR_ELT is not suitable for INTSXP arrays. I need to convert an
INTSXP array to an OCaml list / array.
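
The conversion I need boils down to a length-driven copy of the integer
payload. Here is a rough sketch in C under loud assumptions: toy_intvec and
toy_intvec_copy are hypothetical names, and the struct is not R's
VECTOR_SEXPREC. In the real binding the source would presumably be INTEGER(x)
with LENGTH(x) elements, and the destination an OCaml block rather than a
malloc'ed array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an integer vector: a length plus a payload pointer.
   This is NOT R's VECTOR_SEXPREC. */
typedef struct {
    size_t length;  /* number of ints in the payload */
    int *data;      /* the payload itself */
} toy_intvec;

/* Copies the payload into a freshly malloc'ed array that the caller owns.
   Returns NULL for an empty vector or when allocation fails. */
static int *toy_intvec_copy(const toy_intvec *v)
{
    assert(v != NULL);
    if (v->length == 0)
        return NULL;
    int *out = malloc(v->length * sizeof *out);
    if (out != NULL)
        memcpy(out, v->data, v->length * sizeof *out);
    return out;
}
```

Copying rather than sharing the pointer is a deliberate choice in this sketch:
the copy stays valid whatever later happens to the original vector.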
>> I'm thinking of CHARSXPs and INTSXPs for the moment...
>
> Those are entirely different - CHARSXP are not vectors but strings (see
> mkChar et al., CHAR, ...) and INTSXP are integer arrays (in C speak)
> accessed using INTEGER.
OK. They're not vectors. They're VECTOR_SEXPRECs.
> Please read R-exts - it's better than guessing.
Funny, I have R-exts.pdf and R-ints.pdf open. They're fine when it
comes to writing R extensions, but not when writing bindings that embed R
into OCaml so that you can beta-reduce isomorphically in R and OCaml.
> Cheers,
> Simon
I'm already using heretical features in OCaml (namely Obj.magic) in order
to do this binding. I do not mind using heretical features of the R API.
I do not mean to be a pain, but I have to do what needs to be done. If I
find along the way that #define USE_RINTERNALS is overkill, I'll gladly drop it.
For instance, here's one of my issues: I've extracted the R SEXP for the
"str" symbol. It's a promise. Now, how do I map such a SEXP to an OCaml
function? Haven't found that in R-ints.pdf or R-exts.pdf. There's talk
about functions, but promises are somewhat overlooked. However, such a
mapping is crucial to me.
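
As I understand it, a promise pairs an unevaluated expression and its
environment with a slot for the value, and forcing it evaluates the expression
once and caches the result. A toy sketch of that force-once idea in C follows.
All names (toy_promise, toy_force, toy_fetch) are hypothetical, and the struct
only mimics the idea; it is not R's PROMSXP.

```c
#include <assert.h>
#include <stddef.h>

/* Toy promise: a deferred computation plus a cache for its result.
   This only mimics the force-once idea; it is NOT R's PROMSXP. */
typedef struct toy_promise {
    int forced;                 /* 0 until the thunk has been run */
    int value;                  /* meaningful only when forced != 0 */
    int (*thunk)(void *env);    /* hypothetical deferred computation */
    void *env;                  /* data the thunk needs */
} toy_promise;

/* Runs the thunk on first use and returns the cached value afterwards. */
static int toy_force(toy_promise *p)
{
    assert(p != NULL);
    if (!p->forced) {
        assert(p->thunk != NULL);
        p->value = p->thunk(p->env);
        p->forced = 1;
        p->thunk = NULL;  /* the computation is no longer needed */
    }
    return p->value;
}

/* Example thunk: counts how often it runs and returns a fixed value. */
static int toy_fetch(void *env)
{
    int *calls = env;
    ++*calls;
    return 42;
}
```

Forcing the same toy_promise twice runs toy_fetch only once. That is the
behaviour I would want an OCaml lazy value, or a closure with a cache, to
reproduce on the OCaml side.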
I was not guessing when I was trying to look at the internal structure
of R data. Simply trying to get a grip on how to execute promises, and
therefore examining such a promise:
> # R.Internal.Pretty.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
> - : R.Internal.Pretty.t =
> PROMISE
>  {value = SYMBOL None;
>   expr =
>    CALL (SYMBOL (Some ("lazyLoadDBfetch", BUILTIN)),
>     [INT [105; 153119]; Unknown; Unknown; Unknown]);
>   env = Unknown}
Or, following structures in Rinternals.h:
> # R.Internal.C.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
> - : R.Internal.C.t =
> Val
>  {content =
>    PROMSXP
>     {prom_value =
>       Val
>        {content =
>          SYMSXP
>           {pname = Val {content = NILSXP};
>            sym_value = R.Internal.C.Recursive <lazy>;
>            internal = Val {content = NILSXP}}};
>      R.Internal.C.expr =
>       Val
>        {content =
>          LANGSXP
>           {carval =
>             Val
>              {content =
>                SYMSXP
>                 {pname = Val {content = CHARSXP "lazyLoadDBfetch"};
>                  sym_value = Val {content = BUILTINSXP 687};
>                  internal = Val {content = NILSXP}}};
>            cdrval =
>             Val
>              {content =
>                LISTSXP
>                 {carval = Val {content = INTSXP [105; 153119]};
>                  cdrval =
>                   Val
>                    {content =
>                      LISTSXP
>                       {carval =
>                         Val
>                          {content =
>                            SYMSXP
>                             {pname = Val {content = CHARSXP "datafile"};
>                              sym_value =
>                               Val
>                                {content =
>                                  SYMSXP
>                                   {pname = Val {content = NILSXP};
>                                    sym_value = R.Internal.C.Recursive <lazy>;
>                                    internal = Val {content = NILSXP}}};
>                              internal = Val {content = NILSXP}}};
>                        cdrval =
>                         Val
>                          {content =
>                            LISTSXP
>                             {carval =
>                               Val
>                                {content =
>                                  SYMSXP
>                                   {pname =
>                                     Val {content = CHARSXP "compressed"};
>                                    sym_value =
>                                     Val
>                                      {content =
>                                        SYMSXP
>                                         {pname = Val {content = NILSXP};
>                                          sym_value =
>                                           R.Internal.C.Recursive <lazy>;
>                                          internal = Val {content = NILSXP}}};
>                                    internal = Val {content = NILSXP}}};
>                              cdrval =
>                               Val
>                                {content =
>                                  LISTSXP
>                                   {carval =
>                                     Val
>                                      {content =
>                                        SYMSXP
>                                         {pname =
>                                           Val {content = CHARSXP "envhook"};
>                                          sym_value =
>                                           Val
>                                            {content =
>                                              SYMSXP
>                                               {pname = Val {content = NILSXP};
>                                                sym_value =
>                                                 R.Internal.C.Recursive <lazy>;
>                                                internal =
>                                                 Val {content = NILSXP}}};
>                                          internal = Val {content = NILSXP}}};
>                                    cdrval = Val {content = NILSXP};
>                                    tagval = Val {content = NILSXP}}};
>                              tagval = Val {content = NILSXP}}};
>                        tagval = Val {content = NILSXP}}};
>                  tagval = Val {content = NILSXP}}};
>            tagval = Val {content = NILSXP}}};
>      R.Internal.C.env = Val {content = ENVSXP}}}
> #
For instance, an issue I'd like advice on is: what does such a symbol mean?
> SYMSXP
>  {pname = Val {content = CHARSXP "datafile"};
>   sym_value =
>    Val
>     {content =
>       SYMSXP
>        {pname = Val {content = NILSXP};
>         sym_value = R.Internal.C.Recursive <lazy>;
>         internal = Val {content = NILSXP}}};
>   internal = Val {content = NILSXP}}};
And how is it treated when "str" is executed?
All the best.
--
Guillaume Yziquel
http://yziquel.homelinux.org/