[R] summary() and the mode
Marc Schwartz
marc_schwartz at me.com
Thu Jan 23 22:33:37 CET 2014
On Jan 23, 2014, at 2:27 PM, Ruhil, Anirudh <ruhil at ohio.edu> wrote:
> A student asked: Why does R's summary() command yield the Mean and the Median, quartiles, min, and max but was written to exclude the Mode?
>
> I said I had no clue, googled the question without much luck, and am now posting it to see if anybody knows why.
>
> Ani
This has been discussed a number of times over the years. Presuming that there is interest in knowing the mode, the problem is that how to estimate it depends upon the nature of the data.
That is, if the data are discrete (e.g., a factor), a simple tabulation using table() can yield the most frequently occurring value, or values if there are ties. In this case:
set.seed(1)
x <- sample(letters, 500, replace = TRUE)
tab <- table(x)
# Get the first value with the maximum count (later ties are not returned)
tab[which.max(tab)]
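Since which.max() returns only the first maximum, ties for the highest count would be missed. A minimal sketch of one way to get all of them, using a small made-up vector (y and tab_y are illustrative names, not part of the example above):

```r
# Made-up data with a two-way tie for the highest count
y <- c("a", "b", "b", "c", "c")
tab_y <- table(y)
# Keep every value whose count equals the maximum count
modes <- names(tab_y)[tab_y == max(tab_y)]
modes
```

Here both "b" and "c" occur twice, so both are returned.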
If the data are continuous, then strictly speaking the mode is not well defined, since exact values rarely repeat, and you need to use something along the lines of a kernel density estimate. In that case:
set.seed(1)
x <- rnorm(500)
# Get the density estimates
dx <- density(x)
# Which value is at the peak
dx$x[which.max(dx$y)]
Visual inspection is also helpful in this case:
plot(dx)
abline(v = dx$x[which.max(dx$y)])
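Keep in mind that the location of the peak depends on the bandwidth that density() uses, so the estimate is not unique. A rough sketch of how one might check that sensitivity, where peak() is a hypothetical helper defined here for illustration and not a base R function:

```r
set.seed(1)
x <- rnorm(500)
# Hypothetical helper: x value at which the kernel density estimate peaks;
# extra arguments are passed through to density()
peak <- function(x, ...) {
  d <- density(x, ...)
  d$x[which.max(d$y)]
}
peak(x)              # default bandwidth
peak(x, bw = "SJ")   # Sheather-Jones bandwidth selector
peak(x, adjust = 2)  # twice the default bandwidth, a smoother curve
```

If the three values differ noticeably, the mode estimate should be reported along with the bandwidth choice.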
See ?table, ?density and ?which.max
Regards,
Marc Schwartz