[R] fast mkChar

Vadim Ogranovich vograno at evafunds.com
Wed Jun 9 20:59:10 CEST 2004


Thank you for the lead, Peter. It may be useful for other packages I
write.

As to the strings, I think I have to take what is already there. I agree
that strings would be better managed in malloc-style fashion (probably
with a reference counter) and not by gc(). However, I don't want to have
a system with two different string classes; such close relatives seldom
coexist peacefully.
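
For illustration, here is a minimal sketch in plain C of what "malloc-style
with a reference counter" could look like. It is not R code and not part of
R's API: the type rc_string and the functions rc_new, rc_retain and
rc_release are hypothetical names made up for this example. The header and
the characters live in one malloc block, so the length has to be known at
allocation time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string; not an R type. */
typedef struct rc_string {
    size_t refs;   /* number of current owners */
    size_t len;    /* length, not counting the terminating NUL */
    char   data[]; /* characters stored right after the header */
} rc_string;

/* Allocate header and characters in one block; returns NULL on failure. */
rc_string *rc_new(const char *cstr)
{
    size_t len = strlen(cstr);
    rc_string *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->refs = 1;
    s->len = len;
    memcpy(s->data, cstr, len + 1); /* copies the NUL as well */
    return s;
}

/* Share the string without copying: one more owner. */
rc_string *rc_retain(rc_string *s)
{
    s->refs++;
    return s;
}

/* Drop one owner; free the block when the last owner is gone.
 * Returns 1 if the string was freed, 0 otherwise. */
int rc_release(rc_string *s)
{
    assert(s->refs > 0);
    if (--s->refs == 0) {
        free(s);
        return 1;
    }
    return 0;
}
```

Each owner calls rc_retain() when it takes a reference and rc_release()
when it is done, so the memory is returned as soon as the last owner lets
go, with no need to wait for a gc() pass.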

BTW, the slowness of mkChar explains why R is so slow when it needs to
compute names for long vectors.

Thank you for an interesting discussion,
Vadim 

> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] 
> Sent: Tuesday, June 08, 2004 3:35 PM
> To: Vadim Ogranovich
> Cc: R-Help
> Subject: Re: [R] fast mkChar
> 
> "Vadim Ogranovich" <vograno at evafunds.com> writes:
> 
> > I am no expert in memory management in R so it's hard for me to tell
> > what is and what is not doable. From reading the code of allocVector()
> > in memory.c I think that the critical part is to vectorize
> > CLASS_GET_FREE_NODE and use the vectorized version along the lines of
> > the code fragment below (taken from memory.c).
> > 
> > 	if (node_class < NUM_SMALL_NODE_CLASSES) {
> > 	    CLASS_GET_FREE_NODE(node_class, s);
> > 
> > If this is possible then the rest is just a matter of code refactoring.
> > 
> > By vectorizing I mean writing a macro CLASS_GET_FREE_NODE2(node_class,
> > s, n) which in one go allocates n little objects of class node_class
> > and "inscribes" them into the elements of vector s, which is assumed
> > to be long enough to hold these objects.
> > 
> > If this is doable then the only missing piece would be a new function
> > setChar(CHARSXP rstr, const char * cstr) which copies 'cstr' into 'rstr'
> > and (re)allocates the heap memory if necessary. Here the setChar()
> > macro is safe since s[i]-s are all brand new and thus are not shared
> > with any other object.
> 
> I had a similar idea initially, but I don't think it can fly: 
> First, allocating n objects at once is not likely to be much 
> faster than allocating them one-by-one, especially when you 
> consider the implications of having to deal with 
> near-out-of-memory conditions.
> Second, you have to know the string lengths when allocating, 
> since the structure of a vector object (CHARSXP) is a header 
> immediately followed by the data.
> 
> A more interesting line to pursue is that - depending on what 
> it really is that you need - you might be able to create a 
> different kind of object that could "walk and quack" like a 
> character vector, but is stored differently internally. E.g. 
> you could set up a representation that is just a block of 
> pointers, pointing to strings that are being maintained in 
> malloc-style.
> 
> Have a look at External pointers and finalization.
> 
> 
> -- 
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
> 