[R] fast mkChar
Vadim Ogranovich
vograno at evafunds.com
Wed Jun 9 20:59:10 CEST 2004
Thank you for the lead, Peter. It may be useful for other packages I
write.
As to the strings, I think I have to take what is already there. I agree
that strings would be better managed in malloc-style fashion (probably
with a reference counter) and not by gc(). However, I don't want to have a
system with two different string classes; such close relatives seldom
coexist peacefully.
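
For concreteness, here is a rough sketch of what "malloc-style with a
reference counter" could look like in plain C. It is only an illustration
and does not use R's API: the names RcStr, StrVec, rcstr_new and so on are
made up for this example, and a real version would still need to be wrapped
(e.g. behind an external pointer) before R could see it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string: a small header followed by
 * the characters, all in one malloc'ed block. Not an R type. */
typedef struct {
    size_t refs;   /* number of owners */
    size_t len;    /* length, not counting the trailing NUL */
    char data[];   /* flexible array member holding the characters */
} RcStr;

/* Hypothetical "character vector": just a block of pointers. */
typedef struct {
    size_t n;
    RcStr **elt;
} StrVec;

/* Copy a C string into a new block owned by the caller (refs == 1). */
static RcStr *rcstr_new(const char *s) {
    size_t len = strlen(s);
    RcStr *r = malloc(sizeof *r + len + 1);
    if (r == NULL) return NULL;
    r->refs = 1;
    r->len = len;
    memcpy(r->data, s, len + 1);
    return r;
}

/* Register one more owner of the same string; no copy is made. */
static RcStr *rcstr_retain(RcStr *r) {
    if (r != NULL) r->refs++;
    return r;
}

/* Drop one owner; the block is freed when the last owner lets go. */
static void rcstr_release(RcStr *r) {
    if (r != NULL && --r->refs == 0) free(r);
}

/* Allocate a vector of n empty (NULL) slots. */
static StrVec *strvec_new(size_t n) {
    StrVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->elt = malloc((n ? n : 1) * sizeof *v->elt);
    if (v->elt == NULL) { free(v); return NULL; }
    for (size_t i = 0; i < n; i++) v->elt[i] = NULL;
    v->n = n;
    return v;
}

/* Put s into slot i; the vector takes over the caller's reference
 * and releases whatever string was stored there before. */
static void strvec_set(StrVec *v, size_t i, RcStr *s) {
    assert(i < v->n);
    rcstr_release(v->elt[i]);
    v->elt[i] = s;
}

/* Release every element, then the vector itself. */
static void strvec_free(StrVec *v) {
    for (size_t i = 0; i < v->n; i++) rcstr_release(v->elt[i]);
    free(v->elt);
    free(v);
}
```

Two slots can share one string via rcstr_retain(), and overwriting a slot
releases the old string immediately instead of waiting for a collector.
This is exactly the second string class I would rather not maintain next
to CHARSXP.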
BTW, the slowness of mkChar explains why R is so slow when it needs to
compute names for long vectors.
Thank you for an interesting discussion,
Vadim
> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> Sent: Tuesday, June 08, 2004 3:35 PM
> To: Vadim Ogranovich
> Cc: R-Help
> Subject: Re: [R] fast mkChar
>
> "Vadim Ogranovich" <vograno at evafunds.com> writes:
>
> > I am no expert in memory management in R so it's hard for me to tell
> > what is and what is not doable. From reading the code of allocVector()
> > in memory.c I think that the critical part is to vectorize
> > CLASS_GET_FREE_NODE and use the vectorized version along the lines of
> > the code fragment below (taken from memory.c).
> >
> >     if (node_class < NUM_SMALL_NODE_CLASSES) {
> >         CLASS_GET_FREE_NODE(node_class, s);
> >
> > If this is possible then the rest is just a matter of code refactoring.
> >
> > By vectorizing I mean writing a macro CLASS_GET_FREE_NODE2(node_class,
> > s, n) which in one go allocates n little objects of class node_class
> > and "inscribes" them into the elements of vector s, which is assumed
> > to be long enough to hold these objects.
> >
> > If this is doable then the only missing piece would be a new function
> > setChar(CHARSXP rstr, const char * cstr) which copies 'cstr' into 'rstr'
> > and (re)allocates the heap memory if necessary. Here setChar() is safe
> > since the s[i] are all brand new and thus are not shared with any
> > other object.
> I had a similar idea initially, but I don't think it can fly:
> First, allocating n objects at once is not likely to be much
> faster than allocating them one-by-one, especially when you
> consider the implications of having to deal with
> near-out-of-memory conditions.
> Second, you have to know the string lengths when allocating,
> since the structure of a vector object (CHARSXP) is a header
> immediately followed by the data.
>
> A more interesting line to pursue is that - depending on what
> it really is that you need - you might be able to create a
> different kind of object that could "walk and quack" like a
> character vector, but is stored differently internally. E.g.
> you could set up a representation that is just a block of
> pointers, pointing to strings that are being maintained in
> malloc-style.
>
> Have a look at External pointers and finalization.
>
>
> --
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
>
>
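
Peter's second point (the header is immediately followed by the data, so
the length must be known when allocating) can be illustrated with a toy
layout in plain C. This is a sketch under stated assumptions, not R's real
node layout: ToyCharNode, toy_alloc and toy_set_char are invented names,
and realloc() stands in for whatever R's allocator would have to do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a vector node: the header and the characters live
 * in the same allocation. This is NOT R's actual SEXPREC layout. */
typedef struct {
    size_t capacity;  /* characters that fit, not counting the NUL */
    char data[];      /* the string bytes follow the header directly */
} ToyCharNode;

/* The size of the block depends on the string length, so the length
 * has to be known before the node can be allocated. */
static ToyCharNode *toy_alloc(size_t capacity) {
    ToyCharNode *node = malloc(sizeof *node + capacity + 1);
    if (node == NULL) return NULL;
    node->capacity = capacity;
    node->data[0] = '\0';
    return node;
}

/* Hypothetical setChar(): copy cstr into the node. If cstr does not
 * fit, the whole node (header included) must be reallocated and may
 * move, so the caller has to use the returned pointer from then on.
 * Returns NULL on allocation failure, leaving the old node valid. */
static ToyCharNode *toy_set_char(ToyCharNode *node, const char *cstr) {
    assert(node != NULL);
    size_t need = strlen(cstr);
    if (need > node->capacity) {
        ToyCharNode *bigger = realloc(node, sizeof *bigger + need + 1);
        if (bigger == NULL) return NULL;
        node = bigger;
        node->capacity = need;
    }
    memcpy(node->data, cstr, need + 1);
    return node;
}
```

In this toy model a node pre-allocated with the wrong size cannot simply be
filled in later: growing it may change its address, and every slot pointing
at the old address would then have to be updated.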