---
title: "dlmwwbe: Dynamic Linear Model for Wastewater-based epidemiology with missing values"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{my-vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  comment = "#>",
  fig.width = 7.2,
  fig.height = 4.8,
  fig.align = "center"
)
```

The **dlmwwbe** package (Dynamic Linear Model for Wastewater-based Epidemiology with Missing Data) provides two main functions: **pdlm()** (Predictive Dynamic Linear Model) and **dllm()** (Dynamic Local Level Model). **pdlm()** fits a dynamic linear model that forecasts clinical positive cases (or similar data) from lagged clinical and wastewater data. **dllm()** fits a local level model that smooths noisy wastewater data. For more details, see **papers** here.

```{r setup}
knitr::opts_chunk$set(echo = TRUE)
library(dlmwwbe)
data(wastewater)
data(wastewaterhealthworker)
```

## Dynamic Local Level Model

First, we apply **dllm()** to the wastewater data collected between 2022 and 2024 in the Twin Cities metro area in Minnesota, United States. For details of the data, see **papers**. There are two possible structures:

1. All wastewater series share a single latent state (*S = 'univariate'*).
2. Each wastewater series has its own latent state (*S = 'kvariate'*).

For a better model fit, we encourage log-transforming the original wastewater data by setting the argument *log10 = TRUE*, because the transformed data better approximate the normality assumption in practice. Other transformations might be necessary depending on the nature of the data. **summary()** provides information about the fitted model.

Consider the case where the two wastewater series each have their own latent state. The average of the smoother is provided.
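
For orientation, the two structures can be sketched with the textbook local level model. The notation below ($y_{k,t}$ for the $k$-th wastewater series at time $t$, $\mu$ for the latent state, $\sigma^2$ and $\tau^2$ for the observation and state variances) is illustrative and is not taken from the package documentation, so the exact parameterization used by **dllm()** may differ.

With a shared latent state (*S = 'univariate'*):

$$
y_{k,t} = \mu_t + \varepsilon_{k,t}, \quad \varepsilon_{k,t} \sim N(0, \sigma_k^2), \qquad
\mu_t = \mu_{t-1} + \eta_t, \quad \eta_t \sim N(0, \tau^2).
$$

With one latent state per series (*S = 'kvariate'*):

$$
y_{k,t} = \mu_{k,t} + \varepsilon_{k,t}, \quad \varepsilon_{k,t} \sim N(0, \sigma_k^2), \qquad
\mu_{k,t} = \mu_{k,t-1} + \eta_{k,t}, \quad \eta_{k,t} \sim N(0, \tau_k^2).
$$

Judging by their names, *equal.obs.var* and *equal.state.var* would constrain the $\sigma_k^2$ and the $\tau_k^2$, respectively, to be equal across series; this reading is an assumption. The fit that follows uses the second structure.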
```{r}
# Keep the Twin Cities ("TC") records and parse the sampling dates
data_TC <- wastewater[wastewater$Code == "TC", ]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)

# Fit a local level model with one latent state per wastewater series
fit <- dllm(
  equal.state.var = FALSE,
  equal.obs.var = FALSE,
  log10 = TRUE,
  data = data_TC,
  date = "SampleDate",
  obs_cols = c("ORFlab", "Nlab"),
  S = "kvariate"
)
summary(fit)
plot(fit, type = "smoother", conf.int = TRUE)
```

## Predictive Dynamic Linear Model

Next, we apply **pdlm()** to the clinical and wastewater data, demonstrating different numbers of lags. For a better model fit, we encourage log-transforming the original data by setting the argument *log10 = TRUE* (with $1$ added to the positive case counts so that the transformation is valid). **summary()** provides information about the fitted model. Here, we consider $0$ and $2$ lags and plot the fits along with the observed data on the original scale.

```{r}
data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC", ]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)

# No lags: equal state variances, unequal observation variances
fit <- pdlm(
  data = data_TC,
  formula = HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags = 0,
  log10 = TRUE,
  date = NULL,
  equal.state.var = TRUE,
  equal.obs.var = FALSE,
  auto_init = TRUE,
  control = list(maxit = 100)
)
summary(fit)
plot(fit, conf.int = TRUE)
```

```{r}
data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC", ]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)

# Two lags: unequal state variances, equal observation variances
fit <- pdlm(
  data = data_TC,
  formula = HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags = 2,
  log10 = TRUE,
  date = NULL,
  equal.state.var = FALSE,
  equal.obs.var = TRUE,
  auto_init = TRUE,
  control = list(maxit = 100)
)
summary(fit)
plot(fit, conf.int = TRUE)
```
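
To make the log transformation and the notion of a lag concrete, here is a minimal base R sketch on a made-up count vector. The vector `counts` and the helper names are hypothetical, and **pdlm()** may handle the transformation, missing values, and lag construction differently internally; this only illustrates the idea.

```{r}
# Hypothetical toy counts (not from the package data), with a zero and a missing value
counts <- c(0, 3, 12, 7, NA, 25)

# Add 1 before taking log10 so that zero counts stay finite
log_counts <- log10(counts + 1)

# Back-transform to the original scale
back <- 10^log_counts - 1

# A lag-2 copy of the series: shift down two positions and pad with NA
lag2 <- c(rep(NA, 2), head(log_counts, -2))

cbind(counts, log_counts, back, lag2)
```

Missing values pass through both transformations as `NA`, so they remain visible to whatever model is fitted afterwards.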