Getting summary statistics for a quantitative variable is a very common task in data analysis. Unfortunately, **R** makes it surprisingly difficult.

The `qstats` function is an attempt to rectify the situation by making it simple to get any number of descriptive statistics for a numeric variable and to break these statistics down by the levels of one or more categorical variables (groups).

The general format is

`qstats(data, variable, grouping variables, statistics, other options)`

Note that variable names do not have to be quoted.

By default, the *sample size*, *mean*, and *standard deviation* are provided. Let’s take a look at fuel efficiencies for 11,914 automobiles in the `cardata` data frame.

```
# simple summary statistics
qstats(cardata, highway_mpg)
#> n mean sd
#> 1 11914 26.64 8.86
# summary statistics by vehicle_size
qstats(cardata, highway_mpg, vehicle_size)
#> vehicle_size n mean sd
#> 1 Compact 4764 28.94 9.58
#> 2 Large 2777 22.42 7.37
#> 3 Midsize 4373 26.80 7.91
# summary statistics by vehicle_size and drive type
qstats(cardata, highway_mpg, vehicle_size, driven_wheels)
#> vehicle_size driven_wheels n mean sd
#> 1 Compact all wheel drive 646 26.88 4.77
#> 2 Compact four wheel drive 407 20.79 2.90
#> 3 Compact front wheel drive 2491 33.26 9.89
#> 4 Compact rear wheel drive 1220 23.94 7.50
#> 5 Large all wheel drive 438 26.00 12.84
#> 6 Large four wheel drive 737 19.57 2.66
#> 7 Large front wheel drive 389 25.78 2.46
#> 8 Large rear wheel drive 1213 21.78 6.73
#> 9 Midsize all wheel drive 1269 25.83 4.41
#> 10 Midsize four wheel drive 259 18.85 2.51
#> 11 Midsize front wheel drive 1907 30.24 9.46
#> 12 Midsize rear wheel drive 938 23.32 5.16
```
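For comparison, here is a rough sketch of how a similar grouped summary can be assembled in base R. It uses the built-in `mtcars` data frame (with `mpg` grouped by `cyl`) rather than `cardata`, so it runs without any package installed. The object name `stats_by_cyl` is made up for this illustration, and this is not how `qstats` is implemented.

```r
# Base R sketch of a grouped n / mean / sd summary.
# Assumption: mtcars stands in for cardata purely for illustration.
stats_by_cyl <- aggregate(
  mpg ~ cyl,
  data = mtcars,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in a single matrix column;
# do.call(data.frame, ...) spreads them into ordinary columns
stats_by_cyl <- do.call(data.frame, stats_by_cyl)
names(stats_by_cyl) <- c("cyl", "n", "mean", "sd")

print(round(stats_by_cyl, 2))
```

The extra flattening and renaming steps are the kind of bookkeeping that the one-line `qstats` calls above avoid.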

Use the `stats` argument to choose which statistics are reported. You can pass a single statistic name, or several as a character vector of names.

```
# single statistic
qstats(cardata, highway_mpg, vehicle_size, stats = "median")
#> vehicle_size median
#> 1 Compact 28
#> 2 Large 22
#> 3 Midsize 26
# multiple statistics
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("median", "min", "max"))
#> vehicle_size median min max
#> 1 Compact 28 12 111
#> 2 Large 22 13 107
#> 3 Midsize 26 12 354
```

User-defined functions can also be used as statistics. The only requirement is that the function returns a single number.

```
# custom statistics
p25 <- function(x) quantile(x, probs=.25)
p75 <- function(x) quantile(x, probs=.75)
# calling the built-in and custom statistics
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("min", "p25", "p75", "max"))
#> vehicle_size min p25 p75 max
#> 1 Compact 12 24 33 111
#> 2 Large 13 19 25 107
#> 3 Midsize 12 23 31 354
```
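Before passing a custom function to `qstats`, it can help to confirm that it really returns a single number. The sketch below does this on a small made-up vector. The function `cv` (coefficient of variation) and the vector `x` are hypothetical examples, not part of the package.

```r
# Hypothetical custom statistic: coefficient of variation (sd / mean)
cv <- function(x) sd(x) / mean(x)

# quantile() with a single probability also returns one value
p25 <- function(x) quantile(x, probs = 0.25)

# small made-up vector for checking the functions
x <- c(12, 24, 28, 33, 111)

# each function should return a numeric value of length one
length(cv(x)) == 1
length(p25(x)) == 1
```

A function that returns several values, such as `range` or `quantile` with more than one probability, would not meet the single-number requirement.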

Other options include

**na.rm** When TRUE, NAs are removed. Default is TRUE.

**digits** The number of decimal places to print. Default is 2.

```
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("n", "mean", "median", "sd"),
       na.rm = FALSE, digits = 2)
#> vehicle_size n mean median sd
#> 1 Compact 4764 28.94 28 9.58
#> 2 Large 2777 22.42 22 7.37
#> 3 Midsize 4373 26.80 26 7.91
```
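To see why `na.rm` matters, here is a small base R sketch of how a missing value affects a statistic. It uses a made-up vector `x` and base functions only; it does not call `qstats`, and it is assumed (not confirmed here) that the `na.rm` and `digits` options behave analogously.

```r
# made-up vector with one missing value
x <- c(28, NA, 22, 26)

# with the NA kept, the mean itself is NA
mean(x)

# with NAs removed, the mean is computed from the remaining values
mean(x, na.rm = TRUE)

# rounding to two decimal places, similar in spirit to digits = 2
round(mean(x, na.rm = TRUE), digits = 2)
```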