--- title: "Model estimation" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Model estimation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: ref.bib link-citations: true --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.dim = c(10,6), out.width = "80%", fig.align = 'center', fig.path = "fHMM-" ) library("fHMM") ``` The `{fHMM}` package fits a hidden Markov model via the maximum-likelihood method, i.e. by numerically maximizing the likelihood function. This vignette^[This vignette was build using R `r paste(R.Version()[c("major", "minor")], collapse = ".")` with the `{fHMM}` `r utils::packageVersion("fHMM")` package.] derives the likelihood formula, discusses challenges associated with the likelihood maximization and presents the model estimation workflow. ## Likelihood evaluation using the forward algorithm Deriving the likelihood function of an hidden Markov model is part of the hierarchical case, hence the following only discusses the general case. An HHMM can be treated as an HMM with two conditionally independent data streams; the coarse-scale observations on the one hand and the corresponding chunk of fine-scale observations connected to a fine-scale HMM on the other hand. To derive the likelihood of an HHMM, we start by computing the likelihood of each chunk of fine-scale observations being generated by each fine-scale HMM. To fit the $i$-th fine-scale HMM, with model parameters $\theta^{*(i)}=(\delta^{*(i)}, \Gamma^{*(i)},(f^{*(i,k)})_k)$ to the $t$-th chunk of fine-scale observations, which is denoted by $(X_{t,t^*})_{t^*}$, we consider the fine-scale forward probabilities \begin{align*} \alpha^{*(i)}_{k,t^*}=f^{*(i)}(X^*_{t,1},\dots,X^*_{t,t^*}, S^*_{t,t^*}=k), \end{align*} where $t^*=1,\dots,T^*$ and $k=1,\dots,N^*$. Using the fine-scale forward probabilities, the fine-scale likelihoods can be obtained from the law of total probability as \begin{align*} \mathcal{L}^\text{HMM}(\theta^{*(i)}\mid (X^*_{t,t^*})_{t^*})=\sum_{k=1}^{N^*}\alpha^{*(i)}_{k,T^*}. \end{align*} The forward probabilities can be calculated in a recursively as \begin{align*} \alpha^{*(i)}_{k,1} &= \delta^{*(i)}_k f^{*(i,k)}(X^*_{t,1}), \\ \alpha^{*(i)}_{k,t^*} &= f^{*(i,k)}(X^*_{t,t^*})\sum_{j=1}^{N^*}\gamma^{*(i)}_{jk}\alpha^{*(i)}_{j,t^*-1}, \quad t^*=2,\dots,T^*. \end{align*} The transition from the likelihood function of an HMM to the likelihood function of an HHMM is straightforward: Consider the coarse-scale forward probabilities \begin{align*} \alpha_{i,t}=f(X_1,\dots,X_t,(X^*_{1,t^*})_{t^*},\dots,(X^*_{t,t^*})_{t^*}, S_t=i), \end{align*} where $t=1,\dots,T$ and $i=1,\dots,N$. The likelihood function of the HHMM results as \begin{align*} \mathcal{L}^\text{HHMM}(\theta,(\theta^{*(i)})_i\mid (X_t)_t,((X^*_{t,t^*})_{t^*})_t)=\sum_{i=1}^{N}\alpha_{i,T}. \end{align*} The coarse-scale forward probabilities can be calculated similarly by applying the recursive scheme \begin{align*} \alpha_{i,1} &= \delta_i \mathcal{L}^\text{HMM}(\theta^{*(i)}\mid (X^*_{1,t^*})_{t^*})f^{(i)}(X_1), \\ \alpha_{i,t} &= \mathcal{L}^\text{HMM}(\theta^{*(i)}\mid (X^*_{t,t^*})_{t^*}) f^{(i)}(X_t)\sum_{j=1}^{N}\gamma_{ji}\alpha_{j,t-1}, \quad t=2,\dots,T. \end{align*} ## Challenges associated with the likelihood maximization To account for parameter constraints associated with the transition probabilities (and potentially the parameters of the state-dependent distributions), we use parameter transformations. To ensure that the entries of the t.p.m.s fulfill non-negativity and the unity condition, we estimate unconstrained values $(\eta_{ij})_{i\neq j}$ for the non-diagonal entries of $\Gamma$ and derive the probabilities using the multinomial logit link \begin{align*} \gamma_{ij}=\frac{\exp[\eta_{ij}]}{1+\sum_{k\neq i}\exp[\eta_{ik}]},~i\neq j \end{align*} rather than estimating the probabilities $(\gamma_{ij})_{i,j}$ directly. The diagonal entries result from the unity condition as \begin{align*} \gamma_{ii}=1-\sum_{j\neq i}\gamma_{ij}. \end{align*} Furthermore, variances are strictly positive, which can be achieved by applying an exponential transformation to the unconstrained estimator. When numerically maximizing the likelihood using some Newton-Raphson-type method, we often face numerical under- or overflow, which can be addressed by maximizing the logarithm of the likelihood and incorporating constants in a conducive way, see @zuc16 and @oel21 for details. As the likelihood is maximized with respect to a relatively large number of parameters, the obtained maximum can be a local rather than the global one. To avoid this problem, it is recommended to run the maximization multiple times from different, possibly randomly selected starting points, and to choose the model that corresponds to the highest likelihood, again see @zuc16 and @oel21 for details. ## Application For illustration, we fit a 3-state HMM with state-dependent t-distributions to the DAX data that we prepared in the previous vignette on data management: ```{r data preparation} controls <- list( states = 3, sdds = "t", data = list(file = system.file("extdata", "dax.csv", package = "fHMM"), date_column = "Date", data_column = "Close", from = "2000-01-01", to = "2021-12-31", logreturns = TRUE), fit = list("runs" = 100) ) controls <- set_controls(controls) data <- prepare_data(controls) ``` The `data` object can be directly passed to the `fit_model()` function that numerically maximizes the model's (log-) likelihood function `runs = 100` times.^[For convenience, the `controls` object is saved in `data`.] This task can be parallelized by setting the `ncluster` argument.^[If you avoid any problems with clustering on your OS, use `ncluster = 1` which avoids any dependency on packages that provide clustering.] ```{r fit, eval=FALSE} dax_model_3t <- fit_model(data, seed = 1, verbose = FALSE) ``` The estimated model is saved in the `{fHMM}` package and can be accessed as follows: ```{r access model} data(dax_model_3t) ``` Calling the `summary()` method provides an overview of the model fit:^[Alternatively, `coef()` only returns the estimated model coefficients] ```{r summarize model} summary(dax_model_3t) ``` The estimated state-dependent distributions can be plotted as follows: ```{r plot-sdds} plot(dax_model_3t, plot_type = "sdds") ``` As mentioned above, the HMM likelihood function is prone to local optima. This effect can be visualized by plotting the log-likelihood value in the different optimization runs, where the best run is marked in red: ```{r plot-lls} plot(dax_model_3t, plot_type = "ll") ``` ## References