`library(moder)`

In addition to finding a vector’s modes, you might be interested in some metadata about them:

- A vector’s modal **count** is the number of its modes.
- A vector’s modal **frequency** is the number of times that any single mode appears in the vector.

This vignette lays out all the functions for modal metadata. At the
end, it discusses a special feature of these functions, the
`max_unique` argument.

`mode_count()` computes the number of modes:

```
mode_count(c(5, 5, 6))
#> [1] 1
mode_count(c(5, 5, 6, 6, 7))
#> [1] 2
```

Even with missing values, the number of modes is sometimes known. Here,
it can only be 1 because even if the `NA` is secretly `"b"`, `"b"` would
appear only twice, whereas `"a"` appears three times:

```
mode_count(c("a", "a", "a", "b", NA))
#> [1] 1
```

All of this only works if the full set of modes can be determined.
Below, `NA` could secretly be `7`, `8`, or any other value. If it’s `8`,
both numbers are equally frequent. Otherwise, `7` is the only mode.
Since we lack this information, the number of modes is unknown.

```
mode_count(c(7, 7, 7, 8, 8, NA))
#> [1] NA
```

Use `mode_count_range()` in such cases. It will determine the minimal
and maximal number of modes, never returning `NA`. For more on
`mode_count_range()`, see below, section *Maximal number of unique
values*.

```
mode_count_range(c(7, 7, 7, 8, 8, NA))
#> [1] 1 2
```
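
The reasoning about missing values in this section can be sketched in base R. The helper below, `mode_count_is_certain()`, is hypothetical and not part of moder, nor is it moder's implementation. It only illustrates the argument made above, assuming that each `NA` may stand for any value: the number of modes is certain if there is a single most frequent known value and the runner-up could not catch up even if every `NA` counted toward it.

```r
# Simplified check (hypothetical helper, not moder's implementation):
# can the number of modes be determined despite missing values?
mode_count_is_certain <- function(x) {
  n_na <- sum(is.na(x))
  if (n_na == 0L) {
    return(TRUE)
  }
  # Frequencies of the known values, highest first; `table()` drops `NA`s
  freqs <- sort(as.vector(table(x)), decreasing = TRUE)
  if (length(freqs) == 0L) {
    # No known values: only a single `NA` leaves no room for doubt
    return(n_na == 1L)
  }
  top <- freqs[1]
  runner_up <- if (length(freqs) > 1L) freqs[2] else 0L
  # Even if every `NA` counted toward the runner-up (or toward a value
  # not seen so far), it must still fall short of the top value
  runner_up + n_na < top
}

mode_count_is_certain(c("a", "a", "a", "b", NA))
#> [1] TRUE
mode_count_is_certain(c(7, 7, 7, 8, 8, NA))
#> [1] FALSE
```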

`mode_frequency()` counts the instances of a vector’s modes in the
vector:

```
mode_frequency(c(4, 4, 5))
#> [1] 2
mode_frequency(c(4, 4, 4, 5))
#> [1] 3
```

Missing values are an issue here, even if the mode is obvious. Each
`NA` might be another instance of the mode, so the frequency is unknown:

```
mode_frequency(c(1, 1, 1, 1, 2, NA, NA))
#> [1] NA
```

With `mode_frequency_range()`, at least the minimal and maximal
frequencies can be determined. It never returns `NA`. The minimum
frequency supposes that no `NA`s represent the mode; the maximum
frequency supposes that all of them do. In the example below, there are
four instances of `1` without counting the `NA`s, and six when counting
them:

```
mode_frequency_range(c(1, 1, 1, 1, 2, NA, NA))
#> [1] 4 6
```
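
The two bounds can also be reproduced by hand. The following base R sketch is not moder's implementation; `frequency_range_sketch()` is a hypothetical helper that assumes at least one known value. It takes the highest frequency among the known values as the minimum and adds the number of `NA`s to get the maximum.

```r
# Hypothetical helper (not part of moder): bounds on the modal frequency
frequency_range_sketch <- function(x) {
  # Assumption: at least one value is known
  stopifnot(any(!is.na(x)))
  n_na <- sum(is.na(x))
  # Highest frequency among the known values; `table()` drops `NA`s
  min_freq <- max(table(x))
  # Minimum: no `NA` is a mode. Maximum: every `NA` is the same mode.
  c(min_freq, min_freq + n_na)
}

frequency_range_sketch(c(1, 1, 1, 1, 2, NA, NA))
#> [1] 4 6
```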

Related to frequencies, `mode_is_trivial()` flags cases where the mode
is not meaningful. It returns `TRUE` if all values are equally frequent.
Modality is trivial in this case because it is a property of all values
taken together, not of some values over others.

```
mode_is_trivial(c("a", "b", "c"))
#> [1] TRUE
mode_is_trivial(c(1, 1, 2, 2, 3, 3))
#> [1] TRUE
mode_is_trivial(c(1, 1, 1, 2, 3))
#> [1] FALSE
```

The mode is clearly not a useful concept in the first two cases (cf. Härdle, Klinke, and Rönz 2015, 40). Some authors say that the mode is not defined if each value appears only once (Manikandan 2011, 214). However, it is certainly possible for the maximal frequency to be 1, so the only way for such distributions not to have any modal values would be a specific exception in the definition of the mode. The same applies to uniformly distributed data in general. No such exception appears in any definition that I am aware of. Even if it were to be suggested, I think the more elegant solution would be to accept all values of uniformly distributed data as trivially modal.
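
For data without missing values, the criterion that all values are equally frequent can be written in a few lines of base R. This is only a sketch of the idea, not moder's implementation, and `is_trivial_sketch()` is a hypothetical name. It deliberately rejects `NA`s, which require the more careful treatment described in the next section.

```r
# Hypothetical helper (not part of moder): are all values equally frequent?
is_trivial_sketch <- function(x) {
  # Assumption: no missing values
  stopifnot(!anyNA(x))
  freqs <- as.vector(table(x))
  # Trivial if every frequency equals the first one
  all(freqs == freqs[1])
}

is_trivial_sketch(c("a", "b", "c"))
#> [1] TRUE
is_trivial_sketch(c(1, 1, 2, 2, 3, 3))
#> [1] TRUE
is_trivial_sketch(c(1, 1, 1, 2, 3))
#> [1] FALSE
```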

All of moder’s functions for metadata, such as `mode_is_trivial()` and
`mode_count_range()`, have a `max_unique` argument. It allows you to
state how many unique values your data can have at most. Why is this
important? The two functions care about possible modes beyond the known
values. In other words, their results might depend on whether or not the
`NA`s can mask modal values that don’t even occur among the known
values! If that is possible, it presents an additional source of
uncertainty.

`max_unique` limits the possible number of such wildcard modes. Specify
it as an integer that is the maximal number of unique values. If there
can be no other values than those already known, specify `max_unique`
as `"known"` instead. With factor data, always use `"known"`; otherwise,
you will get a warning. (The idea behind factors is that all possible
values are known at the outset.)

Note that this argument does not represent an analytical decision but
simply conveys your knowledge of the data to the computer. There is no
meaningful choice to make: If the maximum number of unique values is
known, you must specify `max_unique`; if not, you must not do so.
Otherwise, you risk incorrect results if any values are missing. The
default is `NULL` because the baseline assumption is always that nothing
is known about missing values except for their number.

Below is an example. If two of the `NA`s represent `8` and the other
three stand for a third value, all values appear with the same
frequency. In this case, all values would trivially be modes in the
sense of `mode_is_trivial()`. This scenario is not certain at all, but
it can’t be ruled out either, so the function returns `NA`. As
`mode_count_range()` shows, there could be three modes at most. (The
minimum is always one if any values are missing.)

```
x1 <- c(7, 7, 7, 8, NA, NA, NA, NA, NA)
mode_is_trivial(x1)
#> [1] NA
mode_count_range(x1)
#> [1] 1 3
```

The picture is different if we know that each missing value must
represent a known value, i.e., `7` or `8`. Even if two `NA`s stand for
`8`, the other three can’t be evenly distributed across `7` and `8`, so
one of these values must be more frequent than the other one. This makes
the mode nontrivial. Also, there can only be one mode, so both the
minimal and maximal mode counts are `1`.

```
x1
#> [1]  7  7  7  8 NA NA NA NA NA
mode_is_trivial(x1, max_unique = "known")
#> [1] FALSE
mode_count_range(x1, max_unique = "known")
#> [1] 1 1
```
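
The argument for `x1` can be checked by brute force in base R, without relying on moder. The sketch below enumerates every way the five `NA`s could be split between `7` and `8` and confirms that the two frequencies never tie. It then fills in the `NA`s for the unrestricted scenario, using `9` as an arbitrary stand-in for the unknown third value.

```r
# Known part of `x1` and the number of missing values
known <- c(7, 7, 7, 8)
n_na <- 5

# If only known values are possible, each `NA` is either 7 or 8. Let `k`
# be the number of `NA`s that stand for 7; the other `n_na - k` are 8.
k <- 0:n_na
freq_7 <- sum(known == 7) + k
freq_8 <- sum(known == 8) + (n_na - k)

# No split of the `NA`s makes both values equally frequent
any(freq_7 == freq_8)
#> [1] FALSE

# Without the restriction, two `NA`s could be 8 and three could be a
# third value (9 is an arbitrary placeholder), so all frequencies are 3
x1_filled <- c(known, 8, 8, 9, 9, 9)
all(table(x1_filled) == 3)
#> [1] TRUE
```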

Three more functions have a `max_unique` parameter: `mode_count()`,
`mode_frequency()`, and `mode_frequency_range()`. However, this only
matters for corner cases. See this GitHub issue.

Härdle, Wolfgang Karl, Sigbert Klinke, and Bernd Rönz. 2015.
*Introduction to Statistics: Using Interactive MM\*Stat Elements*.
Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-17704-5.

Manikandan, S. 2011. “Measures of Central Tendency: Median and
Mode.” *Journal of Pharmacology and Pharmacotherapeutics*
2 (3): 214–15. https://doi.org/10.4103/0976-500X.83300.