--- title: "mut_info()" output: rmarkdown::html_vignette: fig_width: 5 fig_height: 4 self_contained: false vignette: | %\VignetteIndexEntry{mut_info()} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r echo = FALSE, warning=FALSE} library(YEAB) ``` ## Introduction Behavioral analysis is a field of study in which the necessity often arise to understand the way in which different variables relate to each other and the meaning behind such relationships. One useful and powerful tool used to address this questions is the concept of mutual information, rooted in information theory. Information theory provides an ideal framework for quantifying the amount of information related between variables. In this scenario, mutual information allows to measure the degree of dependency between two variables by assessing how much knowing one variable reduces uncertainty about the other. This metric is particularly valuable in the realm of behavioral analysis as it allows researchers to discern connections and dependencies between experimental variables and the behaviors observed. Here we introduce two functions to calculate Mutual Information using a wide variety of methods. The general definition for calculating Mutual Information is as follows: $$I(X;Y) = \sum_{i=1}^{n} \sum_{j=1}^{m}P(x_i,y_j)log \frac{P(x_i,y_j)}{P(x_i)P(y_j)} $$ Where $X$ and $Y$ are discrete random variables and $P(x_i)$ and $P(y_j)$ are the probabilities of every possible state of $X$ and $Y$ respectively. The `mut_info_discrete()` function calculates Mutual Information of continuous variables using discretization through the `discretize()` function from the `infotheo` package. The function takes the following parameters: - `x` a numeric vector representing random variable $X$. - `y` a numeric vector of equal or different size as `x` representing random variable $Y$. - `method` the method to calculate Mutual Information. The default is `emp` for empirical estimation. Other options are`shrink` for the shrinkage estimator, `mm` for the Miller-Madow estimator, and `sg` for the Schurmann-Grassberger estimator. With `mut_info_knn()` we can calculate Mutual Information of continuous variables using the K-Nearest Neighbors method. The function takes the following parameters: - `x` a numeric vector representing random variable $X$. - `y` a numeric vector of equal or different size as `x` representing random variable $Y`. - `k` the number of nearest neighbors to consider. The default is 5. - `direct` Logical; if `TRUE`, mutual information is calculated directly using the method proposed by Kraskov et al. (2004); if `FALSE`, it is computed indirectly via entropy estimates as $(I(X;Y) = H(X) + H(Y) - H(X,Y))$. Default is `TRUE`. 
## Example

```{r mutual-information-example}
set.seed(123)
x <- rnorm(1000)
y <- rnorm(1000)
plot(x, y, main = "Independent Variables")

# Mutual information is close to 0 when the variables are independent
mi_discrete_independent <- mut_info_discrete(x, y)
mi_knn_independent <- mut_info_knn(x, y, k = 2, direct = TRUE)
cat("Mutual Information (Discrete) for independent variables:", mi_discrete_independent, "\n")
cat("Mutual Information (KNN) for independent variables:", mi_knn_independent, "\n")

y <- 100 * x + rnorm(length(x), 0, 12)
plot(x, y, main = "Dependent Variables")

# Mutual information is far from 0 when the variables are dependent
mi_discrete_dependent <- mut_info_discrete(x, y)
mi_knn_dependent <- mut_info_knn(x, y, k = 2, direct = TRUE)
cat("Mutual Information (Discrete) for dependent variables:", mi_discrete_dependent, "\n")
cat("Mutual Information (KNN) for dependent variables:", mi_knn_dependent, "\n")

# Simulate a sine function with noise
set.seed(123)
x <- seq(0, 5, 0.1)
y <- 5 * sin(x * pi)
y_with_noise <- y + rnorm(length(x), 0, 0.5)
plot(x, y_with_noise, main = "Sine Function with Noise")
lines(x, y, col = 2)
# Add a regression line (fit to the noisy data); it is nearly flat
abline(lm(y_with_noise ~ x))

# The correlation coefficient is close to 0 for nonlinear relationships
correlation <- cor(x, y_with_noise)
cat("Correlation coefficient for sine function with noise:", correlation, "\n")

# Mutual information, in contrast, can detect nonlinear dependencies
mi_discrete_sine <- mut_info_discrete(x, y_with_noise)
mi_knn_sine <- mut_info_knn(x, y_with_noise, k = 2, direct = TRUE)
cat("Mutual Information (Discrete) for sine function with noise:", mi_discrete_sine, "\n")
cat("Mutual Information (KNN) for sine function with noise:", mi_knn_sine, "\n")
```
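The estimators listed in the introduction can yield somewhat different values on the same data. As an illustrative sketch (reusing `x` and `y_with_noise` from the chunk above, and using only the arguments documented earlier), the chunk below compares the `method` options of `mut_info_discrete()` and the indirect, entropy-based route of `mut_info_knn()`.

```{r estimator-comparison}
# Compare the documented estimators of mut_info_discrete()
# on the noisy sine data from the previous chunk.
sapply(
  c("emp", "mm", "shrink", "sg"),
  function(m) mut_info_discrete(x, y_with_noise, method = m)
)

# Indirect KNN estimate via I(X;Y) = H(X) + H(Y) - H(X,Y),
# i.e., direct = FALSE as described for mut_info_knn() above.
mut_info_knn(x, y_with_noise, k = 5, direct = FALSE)
```

In practice, the estimates should broadly agree; large discrepancies usually point to too few observations or, for the KNN estimator, an unsuitable choice of `k`.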