[Rd] `dendrapply` Enhancements

Toby Hocking tdhock5 at gmail.com
Fri Feb 24 12:57:00 CET 2023


Hi Aidan, I think you are on the right email list.
I'm not R-core, but this looks like an interesting, meaningful, and significant
contribution to base R.
I'm not sure what the original dendrapply looks like in terms of code style
(variable names, whitespace formatting, etc.), but in my experience it is
important that your code contribution makes minimal changes in that area.
Have you heard about the R Project Sprint 2023?
https://contributor.r-project.org/r-project-sprint-2023/ Your work falls
into the "new developments" category, so I think you could apply for that
funding to participate.
Toby

On Fri, Feb 24, 2023 at 3:47 AM Lakshman, Aidan H <AHL27 using pitt.edu> wrote:

> Hi everyone,
>
> My apologies if this isn’t the right place to submit this—I’m new to the
> R-devel community and still figuring out what is where.
>
> If people want to skip my writeup and just look at the code, I’ve made a
> repository for it here:
> https://github.com/ahl27/new_dendrapply/tree/master. I’m not quite sure
> how to integrate it into a fork of R-devel; the package structure is
> different from what I’m used to.
>
> I had written a slightly improved version of dendrapply for one of my
> research projects, and my advisor encouraged me to submit it to the R
> project. It took me longer than I expected, but I’ve finally gotten my
> implementation to be a drop-in replacement for `stats::dendrapply`. The man
> page for `stats::dendrapply` says “The implementation is somewhat
> experimental and suggestions for enhancements (or nice examples of usage)
> are very welcome,” so I figured this had the potential to be a worthwhile
> contribution. I wanted to send it out to R-devel to see if this was
> something worth pursuing as an enhancement to R.
>
> The implementation I have is based in C, which I understand implies an
> increased burden of maintenance over pure R code. However, it does come
> with the following benefits:
>
> - Completely eliminates recursion, so there is no memory overhead from nested
> function calls and no possibility of stack overflow (this was a major issue
> reported for some of the functions in one of our Bioconductor packages that
> previously used `dendrapply`).
> - Modest runtime improvement: around 2x faster on my computer (2021 MBP,
> 32GB RAM). I’m relatively confident this could be optimized further.
> - Seemingly significant reduction in memory usage; I’m still working on a
> robust benchmark. Suggestions for the best way to do that are welcome.
> - Support for applying functions with an inorder traversal (as in
> `stats::dendrapply`) as well as using a postorder traversal.
>
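To illustrate the first point, here is a minimal sketch of how recursion can be removed from a post-order `dendrapply`. It is written in pure R rather than C and is not taken from the linked repository. An explicit stack records nodes in (node, last child, ..., first child) order, and reversing that order yields a post-order walk in which every child is transformed before its parent. The names `dendrapply_post_iter` and `count_leaves`, and the `"nleaf"` attribute, are hypothetical and chosen only for illustration.

```r
library(stats)  # as.dendrogram(), hclust(), is.leaf()

## Hypothetical helper: post-order dendrapply() without R-level recursion.
## A pure-R sketch of the idea only; not the C code from the repository.
dendrapply_post_iter <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  if (!inherits(X, "dendrogram")) stop("'X' is not a dendrogram")

  nodes    <- list(X)           # every node, indexed in discovery order
  children <- list(integer(0))  # indices of each node's children
  visit    <- integer(0)        # order in which nodes are popped
  stack    <- 1L                # explicit stack instead of the call stack

  ## Pass 1: pop a node, record it, push its children (first to last)
  while (length(stack) > 0L) {
    i <- stack[length(stack)]
    stack <- stack[-length(stack)]
    visit <- c(visit, i)
    d <- nodes[[i]]
    if (!is.leaf(d)) {
      idx <- length(nodes) + seq_along(d)
      for (j in seq_along(d)) nodes[[idx[j]]] <- d[[j]]
      children[idx] <- list(integer(0))
      children[[i]] <- idx
      stack <- c(stack, idx)
    }
  }

  ## Pass 2: the reversed pop order is a post-order walk, so every
  ## child has been transformed before its parent is handed to FUN
  result <- vector("list", length(nodes))
  for (i in rev(visit)) {
    d <- nodes[[i]]
    kids <- children[[i]]
    if (length(kids) > 0L) d[] <- result[kids]  # keeps d's attributes
    result[i] <- list(FUN(d, ...))              # list() so NULL is safe
  }
  result[[1L]]
}

## Usage: count leaves bottom-up, which needs the children's results first
dend <- as.dendrogram(hclust(dist(USArrests[1:5, ])))
count_leaves <- function(x) {
  if (is.leaf(x)) {
    attr(x, "nleaf") <- 1L
  } else {
    kids <- vapply(seq_along(x), function(j) attr(x[[j]], "nleaf"), integer(1))
    attr(x, "nleaf") <- sum(kids)
  }
  x
}
out <- dendrapply_post_iter(dend, count_leaves)
attr(out, "nleaf")  # 5 for this five-leaf dendrogram
```

In this sketch, tree depth is bounded by available memory rather than by R's nested-call limits, because the stack lives in ordinary R vectors. The trade-off is that `nodes` holds a reference to every node at once.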
> This implementation was tested manually, as well as by running all the unit
> tests in `dendextend`, which exercise `dendrapply` extensively.
>
> The postorder traversal would be a significant new feature for
> dendrapply, as it would allow functions that depend on already-processed
> child nodes to execute correctly. A toy example of this is something like:
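> ```
> # any dendrogram works; this one is built from a built-in dataset
> dend <- as.dendrogram(hclust(dist(USArrests[1:5, ])))
>
> exFunc <- function(x){
>   attr(x, 'newA') <- 'a'
>   # internal nodes have no 'leaf' attribute
>   if(is.null(attr(x, 'leaf'))){
>     # print the 'newA' attribute of both children
>     cat(attr(x[[1]], 'newA'), attr(x[[2]], 'newA'))
>     cat('\n')
>   }
>   x
> }
>
> dendrapply(dend, exFunc)
> ```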
>
> With the current version of dendrapply, this prints only empty lines, because
> `FUN` is called on each node before its children have been visited; the
> postorder traversal version prints ‘a’ twice for each internal node.
> If this would be a worthwhile addition, I can refactor the code for brevity
> and add a `how = c("in.order", "post.order")` argument, with “in.order” as
> the default to maintain backwards compatibility. A preorder traversal
> version should also be possible; I just haven’t gotten to it yet.
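A hypothetical sketch of the proposed `how` argument, assuming the semantics described above. "in.order" defers to the existing `stats::dendrapply`, which calls `FUN` on a node before recursing into its children. "post.order" transforms the children first and then passes the updated node to `FUN`. The wrapper name `dendrapply_how` is made up for illustration, and unlike the C implementation this version still uses R-level recursion.

```r
library(stats)  # hclust(), as.dendrogram(), dendrapply(), is.leaf()

## Hypothetical wrapper sketching a `how` argument for dendrapply().
## "in.order" (default) keeps the current stats::dendrapply() behaviour;
## "post.order" applies FUN to the children before their parent.
dendrapply_how <- function(X, FUN, ..., how = c("in.order", "post.order")) {
  how <- match.arg(how)
  FUN <- match.fun(FUN)
  if (!inherits(X, "dendrogram")) stop("'X' is not a dendrogram")
  if (how == "in.order") return(stats::dendrapply(X, FUN, ...))

  post <- function(d) {
    ## recurse into the children first (still R-level recursion)
    if (!is.leaf(d)) d[] <- lapply(d, post)
    ## FUN now sees children that have already been transformed
    FUN(d, ...)
  }
  post(X)
}

## The toy example from the message, with a concrete dendrogram
dend <- as.dendrogram(hclust(dist(USArrests[1:5, ])))
exFunc <- function(x) {
  attr(x, "newA") <- "a"
  if (!is.leaf(x)) {
    cat(attr(x[[1]], "newA"), attr(x[[2]], "newA"))
    cat("\n")
  }
  x
}

res_in   <- dendrapply_how(dend, exFunc)                      # empty lines only
res_post <- dendrapply_how(dend, exFunc, how = "post.order")  # "a a" per internal node
```

Because `match.arg()` takes the first element of the default vector, existing calls without `how` keep the current behaviour unchanged.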
>
> I think the runtime could be optimized more as well.
>
> Thank you in advance for looking at my code and offering feedback; I’m
> excited at the possibility of helping contribute to the R project! I’m
> happy to discuss more either here, on GitHub, or on the R Contributors
> Slack.
>
> Sincerely,
> Aidan Lakshman
>
> -----------------------
> Aidan Lakshman (he/him)<https://www.ahl27.com/>
> Doctoral Candidate, Wright Lab<https://www.wrightlabscience.com/>
> University of Pittsburgh School of Medicine
> Department of Biomedical Informatics
> ahl27 using pitt.edu
> (724) 612-9940
>
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
