[R] Descriptive Statistics: useful hacks
Leonard Mada
|eo@m@d@ @end|ng |rom @yon|c@eu
Sun Oct 3 00:00:15 CEST 2021
Dear R Users,
I have started to compile some useful hacks for the generation of nice
descriptive statistics. I hope that these functions & hacks are useful
to the wider R community. I hope that package developers also get some
inspiration from the code or from these ideas.
I have started to review various packages focused on descriptive
statistics - although I am still at the very beginning.
### Hacks / Code
- split table headers in 2 rows;
- split results over 2 rows: view.gtsummary(...);
- add abbreviations as footnotes: add.abbrev(...);
The results are exported as a web page (using shiny) and can be printed
as a PDF document. See the following PDF example:
https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.Example_1.pdf
### Example
# currently focused on package gtsummary
library(gtsummary)
library(xml2)

mtcars %>%
    # rename2():
    # - see file Tools.Data.R;
    # - behaves in most cases the same as dplyr::rename();
    rename2("HP" = "hp", "Displ" = disp, "Wt (klb)" = "wt", "Rar" = drat) %>%
    # as.factor.df():
    # - see file Tools.Data.R;
    # - encode as (ordered) factor;
    as.factor.df("cyl", "Cyl ") %>%
    # the Descriptive Statistics:
    tbl_summary(by = cyl) %>%
    modify_header(update = header) %>%
    add_p() %>%
    add_overall() %>%
    modify_header(update = header0) %>%
    # Hack: split long statistics !!!
    view.gtsummary(view=FALSE, len=8) %>%
    add.abbrev(
        c("Displ", "HP", "Rar", "Wt (klb)" = "Wt"),
        c("Displacement (in^3)", "Gross horsepower", "Rear axle ratio",
          "Weight (1000 lbs)"));
The required functions are on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.R
The functions rename2() & as.factor.df() are only data-helpers and can
also be found on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R
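To illustrate the general idea of adding abbreviation footnotes to generated
HTML, here is a minimal sketch using xml2. It is not the add.abbrev() function
from Tools.DescriptiveStatistics.R: the helper name add_footnotes() and the toy
HTML table are assumptions made purely for illustration.

```r
library(xml2)

# Toy HTML table standing in for the output of a table package (assumption)
html <- paste0(
    "<html><body><table>",
    "<tr><th>HP</th><th>Wt</th></tr>",
    "<tr><td>110</td><td>2.62</td></tr>",
    "</table></body></html>")
doc <- read_html(html)

# Hypothetical helper: appends one <p> per abbreviation to the <body>.
# xml2 documents are modified in place.
add_footnotes <- function(doc, abbr, full) {
    body <- xml_find_first(doc, "//body")
    for (i in seq_along(abbr)) {
        xml_add_child(body, "p", paste0(abbr[i], ": ", full[i]))
    }
    invisible(doc)
}

add_footnotes(doc, c("HP", "Wt"), c("Gross horsepower", "Weight (1000 lbs)"))

# Collect the footnote texts that were added
notes <- xml_text(xml_find_all(doc, "//body/p"))
print(notes)
```

Because the sketch only touches the HTML tree, the same approach should carry
over to any package that produces an HTML table.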
Note:
1.) The function add.abbrev() operates on the generated HTML code:
- the functionality is therefore more generic and could easily be used
with other packages that export web pages as well;
2.) Split statistics: this is an ugly hack. I plan to redesign the
functionality using XML technologies, but I already have too many
side-projects.
3.) as.factor.df(): traditionally, one would create derived data-sets or
add a new column with the variable as factor (as the user may need the
numeric values for further analysis). But it looked nicer as a single
block of code.
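For comparison, the traditional approach mentioned in note 3 could look roughly
like the following base-R sketch. The column name cyl_f and the "Cyl " label
prefix are assumptions chosen for illustration; the actual behaviour of
as.factor.df() may differ.

```r
# Work on a copy so that the original data set stays untouched
cars <- mtcars

# Add a new column with the variable encoded as an ordered factor;
# the numeric column cyl is kept for further analysis.
cyl_values <- sort(unique(cars$cyl))
cars$cyl_f <- factor(cars$cyl,
    levels = cyl_values,
    labels = paste("Cyl", cyl_values),
    ordered = TRUE)

# The numeric values are still available ...
mean_cyl <- mean(cars$cyl)

# ... while the factor can be used for grouping
print(table(cars$cyl_f))
```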
Sincerely,
Leonard