[Rd] `dendrapply` Enhancements
Lakshman, Aidan H
AHL27 @end|ng |rom p|tt@edu
Thu Feb 23 22:52:08 CET 2023
Hi everyone,
My apologies if this isn't the right place to submit this; I'm new to the R-devel community and still figuring out what is where.
If people want to skip my writeup and just look at the code, I've made a repository for it here: https://github.com/ahl27/new_dendrapply/tree/master. I'm not quite sure how to integrate it into a fork of R-devel; the package structure is different from what I'm used to.
I had written a slightly improved version of `dendrapply` for one of my research projects, and my advisor encouraged me to submit it to the R project. It took me longer than I expected, but I've finally gotten my implementation to be a drop-in replacement for `stats::dendrapply`. The man page for `stats::dendrapply` says "The implementation is somewhat experimental and suggestions for enhancements (or nice examples of usage) are very welcome," so I figured this had the potential to be a worthwhile contribution. I wanted to send it out to R-devel to see if this was something worth pursuing as an enhancement to R.
My implementation is written in C, which I understand implies a greater maintenance burden than pure R code. However, it comes with the following benefits:
- Completely eliminates recursion, so no memory overhead from function calls or possibility of stack overflows (this was a major issue reported on some of the functions in one of our Bioconductor packages that previously used `dendrapply`).
- Modest runtime improvement, around 2x on my computer (2021 MBP, 32GB RAM). I'm relatively confident this could be optimized more.
- Seemingly significant reduction in memory usage; I am still working on a robust benchmark. Suggestions for the best way to do that are welcome.
- Support for applying functions with an inorder traversal (as in `stats::dendrapply`) as well as using a postorder traversal.
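To illustrate the first point, here is a minimal pure-R sketch of visiting every node with an explicit stack instead of recursion. It is not the C implementation from the repository: `count_nodes_iterative` is a hypothetical helper, and the `USArrests` subset is only an example input.

```r
# Sketch only: walk a dendrogram without recursion by keeping the nodes
# still to be visited in an explicit stack (a plain list).
# `count_nodes_iterative` is a hypothetical name used for illustration.
count_nodes_iterative <- function(dend) {
  stack <- list(dend)
  n_internal <- 0L
  n_leaves <- 0L
  while (length(stack) > 0L) {
    # Pop the node on top of the stack.
    node <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    if (is.leaf(node)) {
      n_leaves <- n_leaves + 1L
    } else {
      n_internal <- n_internal + 1L
      # Push the children in reverse so the first child is visited first.
      for (i in rev(seq_along(node))) {
        stack[[length(stack) + 1L]] <- node[[i]]
      }
    }
  }
  c(internal = n_internal, leaves = n_leaves)
}

# Example input (an assumption; any dendrogram would do).
dend <- as.dendrogram(hclust(dist(USArrests[1:10, ])))
counts <- count_nodes_iterative(dend)
```

Because the pending nodes live in an ordinary list, the depth of the tree no longer translates into nested function calls.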
This implementation was tested manually and by running all the unit tests in `dendextend`, which makes extensive use of `dendrapply`.
The postorder traversal would be significant new functionality for `dendrapply`, as it would allow functions that depend on already-processed child nodes to execute correctly. A toy example of this is something like:
```
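# Example dendrogram (any dendrogram works; this one is only for illustration)
dend <- as.dendrogram(hclust(dist(USArrests)))

exFunc <- function(x){
  # Tag every node with a new attribute
  attr(x, 'newA') <- 'a'
  # For internal nodes, print the attribute as seen on the two children
  if(is.null(attr(x, 'leaf'))){
    cat(attr(x[[1]], 'newA'), attr(x[[2]], 'newA'))
    cat('\n')
  }
  x
}
dendrapply(dend, exFunc)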
```
With the current version of `dendrapply`, this prints nothing but blank lines, because each node is processed before its children have been modified; the postorder traversal version will print "a" twice for each internal node. If this would be a worthwhile addition, I can refactor the code for brevity and add a `how = c("in.order", "post.order")` argument, with "in.order" as the default to maintain backwards compatibility. A preorder traversal version should also be possible; I just haven't gotten to it yet.
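For readers who want to see the postorder behavior without building the C code, the following is a rough pure-R stand-in. It is recursive, so it shows only the visiting order, not the stack-free design. `postorder_apply` and `seen` are hypothetical names, and the function records the children's attributes in a vector instead of printing them.

```r
# Sketch only: a recursive pure-R stand-in for post-order application.
# `postorder_apply` is a hypothetical name, not part of stats or the proposal.
postorder_apply <- function(node, f) {
  if (!is.leaf(node)) {
    # Process every child first, so f() later sees the updated children.
    for (i in seq_along(node)) {
      node[[i]] <- postorder_apply(node[[i]], f)
    }
  }
  f(node)
}

# Collects what each internal node finds on its two children.
seen <- character(0)
exFunc <- function(x) {
  attr(x, "newA") <- "a"
  if (!is.leaf(x)) {
    seen <<- c(seen, attr(x[[1]], "newA"), attr(x[[2]], "newA"))
  }
  x
}

# Example input (an assumption; any binary dendrogram would do).
dend <- as.dendrogram(hclust(dist(USArrests[1:5, ])))
res <- postorder_apply(dend, exFunc)
```

With five leaves there are four internal nodes, so `seen` ends up holding one entry per child of each internal node.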
I think the runtime could be optimized more as well.
Thank you in advance for looking at my code and offering feedback; I'm excited at the possibility of helping contribute to the R project! I'm happy to discuss more either here, on GitHub, or on the R Contributors Slack.
Sincerely,
Aidan Lakshman
-----------------------
Aidan Lakshman (he/him) <https://www.ahl27.com/>
Doctoral Candidate, Wright Lab <https://www.wrightlabscience.com/>
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ahl27 using pitt.edu
(724) 612-9940