[Rd] Statistical mode
Arni Magnusson
arnima at hafro.is
Fri May 27 17:50:18 CEST 2011
Thank you, Kevin, for the feedback.
> 1. The mode is not so interesting for continuous data. I would much
> rather use something like density().
Absolutely. The help page for statmode() says it is for discrete data, and
points to density() for continuous data.
> 2. Both the iris and barley data sets are balanced (each factor level
> appears equally often), and the current output from the statmode
> function is misleading by only showing one level.
Try statmode(iris,TRUE). It points out that petal lengths 1.4 and 1.5 are
equally common in the data. I decided to make all=FALSE the default
behavior, but I'd be equally happy with all=TRUE as the default.
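The tie can also be seen with base R alone, by tabulating the column and keeping every value that reaches the maximum count. This is only a sketch of the idea behind all=TRUE, not the statmode() code itself; the variable names are made up for illustration:

```r
# Tabulate petal lengths in the built-in iris data
tab <- table(iris$Petal.Length)

# Keep every value whose count equals the maximum, so ties are not hidden
top <- tab[tab == max(tab)]

# The tied petal lengths, returned as character labels
names(top)
```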
As for the barley data, statmode(barley,TRUE) is just the honest answer.
The yield is continuous, so the discrete mode is not of interest, and the
factor levels are all equally common, as you point out.
> 3. I think the describe() function in the Hmisc package is much more
> useful and informative, even for introductory stat classes. I always
> use describe() after importing data into R.
The describe() function is a verbose summary, usually of a data frame. The
statmode() function is the discrete mode, usually of a vector.
Importantly, describe(faithful$waiting) points out the mean, median and
range, but not the mode.
---
Allow me to include two more valid comments, from Sarah Goslee and David
Winsemius, respectively:
> 4. The 'modeest' package does this and more, see for example mfv().
I think core R should come with a basic function to get the mode of a
discrete vector. One option would be to lift mfv() into the 'stats'
package, but something like statmode() could also cover factors and
strings. Might as well provide all=TRUE/FALSE functionality, too, and
retain integers as integers.
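To make the wish list concrete, here is a minimal sketch of a function that handles integers, strings and factors alike and keeps the input type. The name discrete_mode() is hypothetical, and this is not the statmode() implementation; dropping NAs and reporting ties in order of first appearance are assumptions of the sketch:

```r
# Hypothetical helper for illustration only (not statmode() or mfv())
discrete_mode <- function(x, all = FALSE) {
  x <- x[!is.na(x)]                  # assumption: missing values are ignored
  if (length(x) == 0) return(x)      # nothing to count
  ux <- unique(x)                    # distinct values, same type as x
  counts <- tabulate(match(x, ux), nbins = length(ux))
  modes <- ux[counts == max(counts)] # all values tied for the top count
  if (all) modes else modes[1]       # assumption: first tie wins by default
}

discrete_mode(c(3L, 1L, 3L, 2L))                 # stays an integer
discrete_mode(c("a", "b", "b", "a"), all = TRUE) # both tied strings
discrete_mode(factor(c("x", "y", "y")))          # stays a factor
```

Subsetting unique(x) rather than reading names off a table is what keeps integers as integers and factors as factors.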
It's common to find rudimentary functionality in the 'stats' package, and
dedicated packages for more details; time series models and robust
statistics come to mind. The 'modeest' package is impressive indeed.
> 5. Isn't this just table(Vec)[which.max(table(Vec))]?
Yes it is, only less cumbersome, much like sd(Vec) is less cumbersome than
sqrt(var(Vec)). Moreover, I find it confusing to see the count as well,
table(volcano)[which.max(table(volcano))]
# 110
# 177
although this can be debated. Finally, I think the examples
statmode(mtcars)
statmode(mtcars, TRUE)
demonstrate practical functionality beyond
table(Vec)[which.max(table(Vec))].
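For comparison, this is roughly what it takes to apply the table() idiom to every column of a data frame while dropping the counts. The wrapper name table_mode() is hypothetical and exists only for this sketch:

```r
# Wrap the base-R idiom so it returns the value only, without the count
table_mode <- function(v) {
  tab <- table(v)
  names(tab)[which.max(tab)]   # which.max() keeps the first maximum only
}

# One mode per column of the built-in mtcars data
res <- sapply(mtcars, table_mode)
res
```

Two limitations show up right away: the values come back as character labels rather than numbers, and ties are silently reduced to the first level.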
The mean, median, and mode are often mentioned together as fundamental
descriptive statistics, and I just find it odd that statmode() is not
already in core R. Sure, we could get by without the sd() function in core
R, but why should we?
All the best,
Arni