--- title: "dfba_beta_descriptive" output: bookdown::html_document2: base_format: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{dfba_beta_descriptive} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(DFBA) ``` # Overview An important probability model in both theoretical and applied statistics is the beta distribution. It is an especially important distribution in Bayesian models of categorical data, which are associated with a number of the nonparametric procedures in the `DFBA` package. The beta is a univariate continuous probability distribution on the $[0,\,1]$ interval. The probability density is $f(x)$, and it is a function of two non-negative finite shape parameters, which we will denote as $a$ and $b$. These shape parameters can be integers or non-integer real values provided that they are greater than zero and finite. The probability density function for a beta distribution is \begin{equation} f(x) = \begin{cases} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}, & 0 \le x \le 1, a>0, b>0 \\ 0 & elsewhere \end{cases} (\#eq:betadensity) \end{equation} For a given beta distribution, the $a$ and $b$ parameters are fixed values, so the term $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$ is a normalization constant that assures that the cumulative probability (*i.e.*, $F(x)=\int_{0}^{x}f(x)\,dx$) over all values for $x$ is $1$.^[The gamma function $\Gamma(x)$ is the generalization of the factorial to real, nonnegative values. If $x$ is an integer, then $\Gamma(x)=(x-1)!$.] The mean of the beta distribution is equal to $\frac{a}{a+b}$. The mode of the distribution is $\frac{a-1}{a+b-2}$, so long as $a>1$ and $b>1$ (Johnson, Kotz, & Balakrishnan,1995). When either (1) $a = b = 1$, (2) $a < 1$, or (3) $b < 1$, the mode is undefined. The variance of the distribution is $\frac{ab}{(a+b)^2)(a+b+1)}$. The purpose of the `dfba_beta_descriptive()` function is to provide centrality and interval estimates as well as to provide an easy way to see displays of both the probability density function and the cumulative probability function. The function provides information on properties of the beta distribution that are important for doing Bayesian inference, and supplements the `dbeta()`, `pbeta()`, `qbeta()`, and `rbeta()` functions included in the `stats` package. The `dfba_beta_descriptive()` function is also called by several of the other functions in the `DFBA` package. The `dfba_beta_descriptive()` function provides the *mean*, *median*, *mode*, and *variance* estimates for a beta variate in terms of the two shape parameters for the beta distribution. The mean and median of the beta distribution are always provided, but, as noted above, there are conditions under which the mode is not defined. For example when $a=b=1$, the beta distribution is a flat density function on the $[0,~1]$ interval, so there is no mode. Another case when there is not a proper mode is when either $0<a<1$, $0<b<1$ or when both shape parameters are less than $1$, which results in the density function that diverges at end points. The `dfba_beta_descriptive()` function reports the modal value as `NA` whenever the mode is not properly defined. In addition to centrality and variance estimates, the `dfba_beta_descriptive()` function provides two interval estimates for the beta variate. Each of the interval estimates captures a set proportion of the distribution where a given probability lies within the limits. For both estimates, the default value is ($95\%$). One interval estimate has *equal-tail probabilities* (*i.e.*, the probability below the lower limit is equal to the probability above the upper limit). The other interval estimate is the most compact interval that contains the stipulated probability; this interval estimate is called the *highest-density interval*. The `dfba_beta_descriptive()` function has three arguments: * `a` * `b` * `prob_interval` # Examples ## Example 1 The first example employs the default value of $.95$ for the `prob_interval` argument, and it examines the case where the first and second beta shape parameters are, respectively, $17$ and $3$: The code for this example is ```{r} dfba_beta_descriptive(a = 17, b = 3) ``` Note that because $a>b$, the distribution has central point estimates greater than $.5$. Also note that the two $95$-percent interval estimates are different. The highest-density interval is a more compressed interval because it is not constrained to have equal probabilities of $.025$ outside each limit. The `plot()` method generates plots of the probability density function and the cumulative probability function: ```{r fig.retina=12, fig.width = 7} plot(dfba_beta_descriptive(a = 17, b = 3)) ``` The `dfba_beta_descriptive()` object list also contains a dataframe of $x$, $f(x)$, $F(x)$ should the user wish to create alternative displays: ```{r} x<- dfba_beta_descriptive(a = 17, b = 3)$outputdf head(x) ``` ## Example 2 Consider the case of a user who is interested in finding the $90\%$ highest-density interval for a beta distribution where the shape parameters are $31$ and $20$: ```{r} x <- dfba_beta_descriptive(a = 31, b = 20, prob_interval = .90) hdi <- c(x$hdi_lower, x$hdi_upper) hdi ``` # References Johnson, N. L., Kotz S., and Balakrishnan, N. (1995). *Continuous Univariate Distributions, Vol. 1*, New York: Wiley.