[Rd] `dendrapply` Enhancements

Lakshman, Aidan H AHL27 using pitt.edu
Thu Mar 23 12:27:42 CET 2023


No problem! I know it's a lot of code and also not super high priority, haha; I appreciate the feedback and comments.

I had looked at precious multi-sets, but I wasn't quite sure how to implement them: the docs mentioned they were mainly intended for the interpreter, and I couldn't find a ton of code examples to work off of.

Another idea I had was to make use of R-level dendrogram indexing. I recently learned that dendrograms support vector indexing, so, for example, dend[[c(1,1)]] is equivalent to dend[[1]][[1]]. In theory, the linked list could store just the access vectors, and the R function could then be executed by taking the root and the access vector as input. That would solve a lot of these issues in principle, but I can see plenty of quirks causing problems.
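The access-vector idea can be illustrated with a minimal, R-free sketch in C. It assumes a hypothetical `Node` struct standing in for a dendrogram node and a hypothetical helper `node_at_path()`; neither is part of R's API or of the patch being discussed. Following a 1-based index path from the root reaches the same node as indexing one level at a time, which is the property that dend[[c(1,1)]] == dend[[1]][[1]] relies on. It does not model `[[.dendrogram` dispatch, attributes, or garbage-collector protection.

```c
#include <stddef.h>

/* Hypothetical stand-in for a dendrogram node; a leaf has no children. */
typedef struct Node {
    int id;                 /* illustrative label only */
    size_t nchildren;
    struct Node **children; /* children[0 .. nchildren-1] */
} Node;

/* Resolve a 1-based index path from the root, mirroring how
   dend[[c(i, j)]] is equivalent to dend[[i]][[j]] in R.
   Returns NULL if any index is out of range, so a bad path
   cannot walk off the tree. An empty path returns the root. */
Node *node_at_path(Node *root, const int *path, size_t depth)
{
    Node *cur = root;
    for (size_t k = 0; k < depth; k++) {
        int i = path[k];
        if (cur == NULL || i < 1 || (size_t)i > cur->nchildren)
            return NULL;
        cur = cur->children[i - 1]; /* convert 1-based to 0-based */
    }
    return cur;
}
```

For a tree ((1, 2), 3), the path {1, 1} resolves to leaf 1, exactly as resolving {1} and then {1} again does. Storing only such paths would, in principle, avoid keeping intermediate node objects alive between steps, at the cost of re-walking from the root on every lookup (O(depth) per access).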

> That's a clever solution! Can you profile the code to see if there are
> visible sources of slowdown? Maybe this can be salvaged.

Yep, I'll add that to my list. Salvaging the existing solution would definitely be a lot simpler for me than the other options mentioned here :D

When I get back to dendrapply (hopefully later today), I'll look at precious MSets, check whether vector indexing allows a simplification, and profile the existing code.

Thanks again for taking a look! I really appreciate it.

-Aidan

-----------------------

Aidan Lakshman (he/him)<https://www.ahl27.com/>

Doctoral Candidate, Wright Lab<https://www.wrightlabscience.com/>

University of Pittsburgh School of Medicine

Department of Biomedical Informatics

ahl27 using pitt.edu

(724) 612-9940



________________________________
From: Ivan Krylov <krylov.r00t using gmail.com>
Sent: Thursday, March 23, 2023 6:05:37 AM
To: Lakshman, Aidan H <AHL27 using pitt.edu>
Cc: R-devel using r-project.org <R-devel using r-project.org>
Subject: Re: [Rd] `dendrapply` Enhancements

Hello Aidan,

Sorry for dropping this for a while.

On Thu, 2 Mar 2023 21:03:59 +0000
"Lakshman, Aidan H" <AHL27 using pitt.edu> wrote:

> //after
> curnode = eval(lang3(R_Bracket2Symbol, parent->node, DEND_IND), env);

lang3() always constructs a new language object. If you do end up using
eval(), it may make sense to move lang3() out of the loop and reuse the
existing object by referring to the DEND_IND variable using its symbol,
like it's done in the lapply() implementation.

> The problem is, it seems like the returned value from `eval` is not
> protected, whereas the value from VECTOR_ELT is (if the source list
> is protected). My understanding is that this happens because
> VECTOR_ELT just copies the pointer, whereas eval(...) calls R code,
> which returns a copy of the object.

That's right, `[[.dendrogram` returns a new object which is not
protected, unlike the raw elements of node that you're keeping a
protected pointer to.

> This ends up being problematic, since it isn't feasible to protect
> all the nodes until we're done with them.

I see. It's not easy in R to unprotect arbitrary previously-allocated
objects in a safe way. Have you considered "precious multi-sets"
<https://blog.r-project.org/2018/12/10/unprotecting-by-value/index.html>?
They have R_ReleaseFromMSet(), making it possible to free arbitrary
objects from the set, and they automatically unprotect everything they
contain on destruction.

> It's also possible to use eval(...) to get the node, apply the function
> to it, save its class in the linked list, and then save the object
> using VECTOR_ELT. This way we get the benefits of `[[` dispatch,
> class preservation, and constant stack space. However, this ends up
> hurting performance significantly (about 4x slower than the current
> new version, making it around half the speed of the version in stats).

That's a clever solution! Can you profile the code to see if there are
visible sources of slowdown? Maybe this can be salvaged.
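To make the "constant stack space" property concrete, here is a minimal sketch of the general technique in plain C. It uses hypothetical names (`Node`, `Pending`, `apply_preorder()`) and is neither R's API nor the patch under discussion: recursion is replaced by a heap-allocated linked list of pending nodes, so C stack usage stays flat however deep the tree is. Like the R-level stats::dendrapply, it calls the function on a parent before descending into that parent's children. It does not model eval(), `[[` dispatch, class bookkeeping, or PROTECT.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dendrogram node. */
typedef struct Node {
    int value;
    size_t nchildren;
    struct Node **children;
} Node;

/* One entry in the work list: a node still waiting to be visited. */
typedef struct Pending {
    Node *node;
    struct Pending *next;
} Pending;

/* Push onto the front of the work list; returns 0 on allocation failure. */
static int push(Pending **head, Node *node)
{
    Pending *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->node = node;
    p->next = *head;
    *head = p;
    return 1;
}

/* Visit every node in pre-order (parent first, children left to right),
   calling fn on each. Pending nodes live on the heap, not in C stack
   frames, so stack usage does not grow with tree depth.
   Returns the number of nodes visited, or -1 on allocation failure. */
long apply_preorder(Node *root, void (*fn)(Node *))
{
    Pending *head = NULL;
    long visited = 0;
    if (root != NULL && !push(&head, root))
        return -1;
    while (head != NULL) {
        Pending *top = head;          /* pop the next node */
        Node *cur = top->node;
        head = top->next;
        free(top);
        fn(cur);
        visited++;
        /* Push children right to left so the leftmost is visited first. */
        for (size_t i = cur->nchildren; i > 0; i--) {
            if (!push(&head, cur->children[i - 1])) {
                while (head != NULL) {  /* release the rest of the list */
                    Pending *next = head->next;
                    free(head);
                    head = next;
                }
                return -1;
            }
        }
    }
    return visited;
}
```

The trade-off is one small heap allocation per node in exchange for bounded stack use; for a tree ((1, 2), 3) the visiting order is root, inner node, 1, 2, 3.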

--
Best regards,
Ivan



