[Rd] Improved Data Aggregation and Summary Statistics in R
Sebastian Martin Krantz
@eb@@t|@n@kr@ntz @end|ng |rom gr@du@te|n@t|tute@ch
Tue Feb 26 14:25:10 CET 2019
Dear Developers,
Having spent time developing and thinking about how data aggregation and
summary statistics can be enhanced in R, I would like to present my
ideas/efforts in the form of two commands:
The first, which for now I have called 'collap', is an upgrade of aggregate
that accommodates and extends its functionality in various respects. Most
importantly, it works with multilevel and multi-type data, supports multiple
function calls and highly customized aggregation tasks, allows much greater
flexibility in how inputs are passed, and produces tidy output.
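For readers without the vignette at hand, here is a minimal base R sketch of the kind of multi-type aggregation described above. It uses only aggregate() and merge(); it does not reproduce the 'collap' syntax, which is documented in the attached vignette. The data frame, the column names and the helper stat_mode are hypothetical and chosen purely for illustration.

```r
# Illustrative only: plain base R, not the proposed 'collap' interface.
# Hypothetical data with one numeric and one categorical column.
df <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, 3, 5),
  f = factor(c("u", "u", "v", "w"))
)

# Hypothetical helper: most frequent level (first one wins on ties).
stat_mode <- function(v) names(which.max(table(v)))

# Numeric column: group means.
num <- aggregate(df["x"], by = list(g = df$g), FUN = mean)

# Categorical column: group modes.
cats <- aggregate(df["f"], by = list(g = df$g), FUN = stat_mode)

# Base aggregate() applies one function to every column per call,
# so the pieces have to be merged back together by hand.
res <- merge(num, cats, by = "g")
print(res)
```

The two separate calls plus the merge are the manual steps that a single multi-type aggregation command would presumably absorb.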
The second function, 'qsu', is an advanced and flexible summary command for
cross-sectional and multilevel (panel) data (i.e. it can provide overall,
between-entity and within-entity statistics, and allows for grouping, custom
functions and transformations). It also provides a quick method to compute
and output within-transformed data.
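To make the overall / between / within distinction concrete, here is a small base R sketch on a hypothetical two-entity panel. It is not the 'qsu' interface. It assumes the within transformation is plain demeaning by entity; some conventions add the overall mean back afterwards, which shifts the values but leaves the standard deviation unchanged.

```r
# Illustrative only: plain base R, not the proposed 'qsu' interface.
# Hypothetical panel: two entities observed three times each.
panel <- data.frame(
  id = rep(c("A", "B"), each = 3),
  x  = c(1, 2, 3, 7, 8, 9)
)

# Entity means, repeated for every observation of that entity.
entity_mean <- ave(panel$x, panel$id, FUN = mean)

# Within-transformed data (assumption: simple demeaning by entity).
x_within <- panel$x - entity_mean

# Standard deviations at the three levels.
sd_overall <- sd(panel$x)
sd_between <- sd(tapply(panel$x, panel$id, mean))  # across entity means
sd_within  <- sd(x_within)                         # around entity means

print(c(overall = sd_overall, between = sd_between, within = sd_within))
```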
Both commands are efficiently built from core R, but provide for optional
integration with data.table, which renders them extremely fast on large
datasets. An explanation of the syntax, a demonstration and benchmark
results are provided in the attached vignette.
Since both commands accommodate existing functionality while adding
significant basic functionality, I thought that their addition to the stats
package would be worth considering. I would be happy to receive your feedback.
Best regards,
Sebastian Krantz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: collap & qsu vignette.pdf
Type: application/pdf
Size: 569278 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20190226/eb4dd92d/attachment.pdf>