[Rd] Improved Data Aggregation and Summary Statistics in R

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Wed Feb 27 12:48:53 CET 2019


On 26/02/2019 8:25 a.m., Sebastian Martin Krantz wrote:
> Dear Developers,
> 
> Having spent time developing and thinking about how data aggregation and
> summary statistics can be enhanced in R, I would like to present my
> ideas/efforts in the form of two commands:
> 
> The first, which for now I called 'collap', is an upgrade of aggregate that
> accommodates and extends the functionality of aggregate in various
> respects, most importantly to work with multilevel and multi-type data,
> multiple function calls, highly customized aggregation tasks, a much
> greater flexibility in the passing of inputs and tidy output.
> 
> The second function, 'qsu', is an advanced and flexible summary command for
> cross-sectional and multilevel (panel) data (i.e. it can provide overall,
> between and within entities statistics, and allows for grouping, custom
> functions and transformations). It also provides a quick method to compute
> and output within-transformed data.
> 
> Both commands are efficiently built from core R, but provide for optional
> integration with data.table, which renders them extremely fast on large
> datasets. An explanation of the syntax, a demonstration and benchmark
> results are provided in the attached vignette.
> 
> Since both commands accommodate existing functionality while adding
> significant basic functionality, I thought that their addition to the stats
> package would be a worthwhile consideration. I am happy for your feedback.

Generally the R Core group is reluctant to incorporate new functions 
into the base packages.  Each function that is added adds to their work, 
and they already have too much to do.  (I am no longer a member of R 
Core, but I don't think things have changed since I retired.)

It is much easier for them if volunteers publish functions themselves, 
via contributed packages.

Nowadays GitHub provides a very convenient platform on which you can 
develop a package containing your functions.  If other users find bugs 
or have suggested improvements, it's very easy for them to send those to 
you, and you can make the fixes available immediately.  Once you are 
satisfied that it is stable, you can submit it to CRAN, and anyone using 
R can easily install it.

If you find the prospect of writing a package daunting, you shouldn't. 
It's actually quite easy, especially if you are using RStudio or ESS (or 
some other helpful front-end).  Hadley Wickham's book
<http://r-pkgs.had.co.nz/> is a pretty accessible description of a 
development strategy.  (It's not the only strategy, but lots of people 
use it.)
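
As a rough illustration of how little is needed to get started, here is a
minimal sketch using package.skeleton() from the utils package that ships
with R.  The function group_means, the package name "mypkg" and the use of
a temporary directory are all placeholders invented for this example; they
are not the functions proposed above, and this is only one of several ways
to set up a package.

```r
# Hypothetical function standing in for whatever you want to share.
group_means <- function(data, by) {
  aggregate(data, by = by, FUN = mean)
}

# Put the function in its own environment so nothing else is picked up.
env <- new.env()
assign("group_means", group_means, envir = env)

# Build the skeleton in a temporary directory (an assumption for this demo;
# normally you would use your own project directory).
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)
utils::package.skeleton(name = "mypkg", list = "group_means",
                        environment = env, path = pkg_parent)

# Inspect what was generated: DESCRIPTION, NAMESPACE, R/ and man/ among others.
pkg_files <- list.files(file.path(pkg_parent, "mypkg"))
print(pkg_files)
```

After filling in the generated DESCRIPTION and help files, running
"R CMD build mypkg" and then "R CMD check" on the resulting tarball from a
shell is the usual next step before sharing the package.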

Duncan Murdoch


