**EpiLPS** (Gressani et al. 2022) is the acronym
for **Epi**demiological modeling (tool) with
**L**aplacian-**P**-**S**plines.
It proposes a new Bayesian methodology for estimating (key)
epidemiological parameters. This vignette focuses on the estimation of
the time-varying reproduction number \(\mathcal{R}_t\), i.e. the average number of
secondary cases generated by an infectious agent at time \(t\); a key metric for assessing the
transmission dynamics of an infectious disease and a useful reference
for guiding interventions of governmental institutions in a public
health crisis. The EpiLPS project builds upon two strong pillars in the
statistical literature, namely Bayesian P-splines and Laplace
approximations, to deliver a fast and flexible methodology for inference
on \(\mathcal{R}_t\). EpiLPS requires
two (external) inputs: (1) a time series of incidence counts and (2) a
(discrete) serial interval distribution.

The underlying model for smoothing incidence counts is based on the negative binomial distribution to account for possible overdispersion in the data. EpiLPS has a two-phase engine for estimating \(\mathcal{R}_t\). First, Bayesian P-splines are used to smooth the epidemic curve and to compute estimates of the mean incidence count of the susceptible population at each day of the outbreak. Second, in the philosophy of Fraser (2007), the renewal equation is used as a bridge to link the estimated mean incidence and the reproduction number. As such, the second phase delivers a closed-form expression of \(\mathcal{R}_t\) as a function of the B-spline coefficients and the serial interval distribution.

Another key strength of EpiLPS is that two different strategies can
be used to estimate \(\mathcal{R}_t\).
The first approach called LPSMAP is completely sampling-free and fully
relies on Laplace approximations and maximum *a posteriori* (MAP)
computation of model hyperparameters for estimation. Routines for
Laplace approximations and B-splines evaluations are typically the ones
that are computationally most intensive and are therefore coded in C++
and integrated in R via the **Rcpp** package, making them
time irrelevant. The second approach is called LPSMALA
(Laplacian-P-splines with a Metropolis-adjusted Langevin algorithm) and
is fully stochastic. It samples the posterior of the model by using
Langevin diffusions in a Metropolis-within-Gibbs framework. Of course,
LPSMAP has a computational advantage over LPSMALA. Thanks to the
lightning fast implementation of Laplace approximations, LPSMAP
typically delivers estimates of \(\mathcal{R}_t\) in a fraction of a
second.

The package has a modular architecture with simple routines aiming at
an intuitive command input for users. Kernel routines (the machinery
responsible for the hard computational work behind EpiLPS) are not
directly surfaced to the user, i.e. they are not directly available in
the package documentation but can be accessed by typing
`ls(getNamespace("EpiLPS"), all.names = TRUE)`

in the R
console. All the routines starting with *“Ker”* are Kernel
routines. These routines include (among others) likelihood evaluations,
gradient and Hessian evaluations and posterior distribution
evaluations.

The `estimR()`

and `estimRmcmc()`

routines can
be used to estimate the time-varying reproduction number using
Laplacian-P-splines with the MAP approach and MALA approach,
respectively. The `epicurve()`

function can be used to obtain
a graphical representation of the epidemic curve (and its smoothed
version), while the `Idist()`

routine is meant to compute the
(discrete) distribution of a disease interval, for instance the serial
interval distribution to be used as an input in `estimR()`

.
Finally, there are some S3 methods available to summarize estimation
results and an `episim()`

routine to simulate time series of
incidence data based on different functional patterns of the
reproduction number over time.

The package has an associated website https://epilps.com/ that includes documentation and a Shiny application on COVID-19 data. The associated GitHub repository https://github.com/oswaldogressani/EpiLPS hosts the in-development version before the associated stable version is released on CRAN. To install the GitHub version of EpiLPS (with devtools) type the following lines in the R console:

To estimate \(\mathcal{R}_t\),
EpiLPS requires a (discrete) serial interval distribution, i.e. a
discrete probability distribution and a time series of incidence data.
The `Idist()`

function computes the probability density
function and probability mass function for a disease interval based on
the mean and standard deviation of the disease interval (expressed in
days). The code below is used to obtain a (discrete) serial interval
`si`

for a given mean and standard deviation. By default
these values are obtained by assuming a Gamma distribution for the
disease interval, but the Weibull or LogNormal distributions can also be
specified in the `dist=`

option.

```
si_spec <- Idist(mean = 2.7, sd = 1.3, dist = "gamma")
si <- si_spec$pvec
si
#> [1] 0.1683155677 0.3309269357 0.2641722153 0.1416603809 0.0609478350
#> [6] 0.0228109990 0.0077559604 0.0024585172 0.0007387518 0.0002128371
plot(si_spec, titlesize = 12)
```

By calling `plot()`

on the `si_spec`

object, a
plot of the serial interval is created with the discrete and continuous
distribution. On top of that, the `si_spec`

object also
returns the (fitted) parameters for the chosen distribution of the
disease interval that is also reflected in the plot title. Now, using
the above serial interval, we can call the `episim()`

routine
to generate a time series of incidence data (say for an epidemic lasting
40 days). The data generating process (DGP) requires the specification
of a functional form for the reproduction number (here we choose
`Rpattern = 5`

among the six available patterns). A renewal
equation model is used to generate the case incidence data based on a
Poisson or negative binomial process (to be chosen by the user). Full
details on the DGP can be found here https://doi.org/10.1371/journal.pcbi.1010618.s002.

```
set.seed(123)
datasim <- episim(si = si, Rpattern = 5, endepi = 40, dist = "negbin", overdisp = 15, plotsim = TRUE)
```

```
incidence <- datasim$y
incidence
#> [1] 10 6 16 26 25 26 39 28 40 19 12 6 3 6 4 5 5 7 12
#> [20] 13 26 28 44 85 120 110 151 219 326 262 255 328 139 200 129 141 135 209
#> [39] 241 296
```

By default, the number of cases at time \(t=1\) is fixed at 10. Specifying option
`plotsim = TRUE`

gives a summary plot of the generated data
and the right panel gives the true underlying functional form of the
reproduction number over the 40 days (as specified by
`Rpattern = 5`

). We now have a serial interval distribution
and a time series of incidence data that can be used as inputs in the
`estimR()`

and `estimRmcmcm()`

routines.

The `estimR()`

and `estimRmcmcm()`

are
relatively simple to use with only a few inputs and intuitive outputs.
Basically, both routines require a vector containing the incidence time
series, a discrete serial interval distribution, a number of B-splines
to be used in the basis (default is 30) and a vector of dates
(optional). Another added value of these routines is that they also
allow to compute the \(\mathcal{R}_t\)
estimates using the Cori et al. (2013) and Wallinga and Teunis
(2004) method, respectively. The code below defines an object called
`LPSfit`

from the `estimR()`

routine. Actually,
the latter object is a list with many different components summarizing
the results of the fit. Among this list, the `RLPS`

component
is of particular interest as it gives a summary of the estimated
reproduction number (point estimates and selected quantiles).

```
LPSfit <- estimR(incidence = incidence, si = si)
class(LPSfit)
#> [1] "Rt"
knitr::kable(tail(LPSfit$RLPS[,1:8]))
```

Time | R | Rsd | Rq0.025 | Rq0.05 | Rq0.25 | Rq0.50 | Rq0.75 | |
---|---|---|---|---|---|---|---|---|

35 | 35 | 0.6419727 | 0.0324669 | 0.5815008 | 0.5908240 | 0.6204838 | 0.6419727 | 0.6642058 |

36 | 36 | 0.7212346 | 0.0376810 | 0.6511724 | 0.6619592 | 0.6963118 | 0.7212346 | 0.7470496 |

37 | 37 | 0.9116863 | 0.0455660 | 0.8267645 | 0.8398637 | 0.8815202 | 0.9116863 | 0.9428847 |

38 | 38 | 1.2018582 | 0.0529034 | 1.1026540 | 1.1180325 | 1.1667500 | 1.2018582 | 1.2380228 |

39 | 39 | 1.4957960 | 0.0603576 | 1.3821874 | 1.3998527 | 1.4556826 | 1.4957960 | 1.5370147 |

40 | 40 | 1.6824906 | 0.0903072 | 1.5148264 | 1.5406092 | 1.6227950 | 1.6824906 | 1.7443822 |

The `estimR()`

routine generates an object of class
`Rt`

and there are two S3 methods associated with an object
of that class, namely a `summary()`

method and a
`plot()`

method. The former gives:

```
summary(LPSfit)
#> Estimation of the reproduction number with Laplacian-P-splines
#> --------------------------------------------------------------
#> Total number of days: 40
#> Routine time (seconds): 0.1
#> Method: Maximum a posteriori (MAP)
#> Hyperparam. optim method: Nelder-Mead
#> Hyperparam. optim convergence: TRUE
#> Mean reproduction number: 1.346
#> Min reproduction number: 0.312
#> Max reproduction number: 2.651
#> --------------------------------------------------------------
```

This table give basic summary statistics related to the LPS fit (note how fast the routine is). Calling the S3 plot method yields a plot of the estimated reproduction number and associated 95% credible interval.

The `estimRmcmcm()`

routine works similarly. By default it
draws 5000 MCMC samples and uses a burn-in of size 2000. Being a fully
stochastic approach, the latter routine is slower than
`estimR()`

.

```
summary(LPSfitmcmc)
#> Estimation of the reproduction number with Laplacian-P-splines
#> --------------------------------------------------------------
#> Total number of days: 40
#> Routine time (seconds): 1.72
#> Method: MCMC (with Langevin diffusion)
#> Hyperparam. optim method: Nelder-Mead
#> Hyperparam. optim convergence: TRUE
#> Mean reproduction number: 1.359
#> Min reproduction number: 0.314
#> Max reproduction number: 2.715
#> --------------------------------------------------------------
```

The same exercise, but adding the Cori et al. (2013) EpiEstim
fit. By specifying the `addfit = "Cori"`

option in the S3
plot method, we can overlay both estimates.

```
LPSfit2 <- estimR(incidence = incidence, si = si, CoriR = TRUE)
knitr::kable(tail(LPSfit2$RCori[,1:8]))
```

t_start | t_end | Mean(R) | Std(R) | Quantile.0.025(R) | Quantile.0.05(R) | Quantile.0.25(R) | Median(R) | |
---|---|---|---|---|---|---|---|---|

28 | 29 | 35 | 1.0269038 | 0.0253576 | 0.9777996 | 0.9855533 | 1.0096885 | 1.0266951 |

29 | 30 | 36 | 0.8799678 | 0.0230693 | 0.8353282 | 0.8423690 | 0.8642998 | 0.8797662 |

30 | 31 | 37 | 0.8153488 | 0.0223740 | 0.7720809 | 0.7788991 | 0.8001483 | 0.8151442 |

31 | 32 | 38 | 0.8335788 | 0.0232811 | 0.7885677 | 0.7956581 | 0.8177601 | 0.8333621 |

32 | 33 | 39 | 0.8342298 | 0.0241325 | 0.7875957 | 0.7949362 | 0.8178283 | 0.8339971 |

33 | 34 | 40 | 1.0094248 | 0.0274527 | 0.9563291 | 0.9646975 | 0.9907751 | 1.0091760 |

Of course, the estimated reproduction number can also be potted by
extracting values from the `LPSfit2`

object in a traditional
way as shown below. The interested reader can consult Gressani et
al. (2022) for a detailed explanantion of the main differences
between EpiEstim and EpiLPS.

```
tt <- seq(8, 40, by = 1)
Rtrue <- sapply(tt, datasim$Rtrue)
plot(tt, Rtrue, type = "l", xlab = "Time", ylab = "R", ylim = c(0,4), lwd = 2)
lines(tt, LPSfit2$RLPS$R[-(1:7)], col = "red", lwd = 2)
lines(tt, LPSfit2$RCori$`Mean(R)`, col = "blue", lwd = 2)
legend("topright", col = c("black","red","blue"),
c("True R","EpiLPS","EpiEstim"), bty = "n", lty = c(1,1,1))
```

EpiLPS uses **ggplot2** to generate beautiful graphics
and plots of the estimated reproduction number can be customized in
various ways.

```
gridExtra::grid.arrange(
plot(LPSfit, col = "firebrick", legendpos = "top", cicol = "orange"),
plot(LPSfit, col = "forestgreen", legendpos = "none", cicol = "green",
theme = "light", title = "Reproduction number"),
plot(LPSfit, col = "darkblue", legendpos = "none", cicol = "orchid", theme = "classic"),
plot(LPSfit, col = "white", legendpos = "none", cicol = "gray",
theme = "dark"), nrow = 2, ncol = 2)
```

To illustrate EpiLPS on real data, we investigate the daily incidence
of Zika virus disease in Girardot, Colombia from the
**outbreaks** package. Incidence data is available from
October 2015 to January 2016. First, the data is loaded and the epidemic
curve is visualized with the `epicurve()`

routine.

```
# Loading the data
data("zika2015")
zika <- zika2015
plotIncidence <- epicurve(zika$incidence, dates = zika$dates, datelab = "14d", title = "Zika incidence", xtickangle = 70)
plotIncidence
```

A serial interval distribution of mean 7 days (SD=1.5 days) is
specified and the `estimR()`

routine is used to estimate the
reproduction number.

```
# Specification of the serial interval
si <- Idist(mean = 7, sd = 1.5)
siplot <- plot(si, titlesize = 11)
epifit <- estimR(zika$incidence, dates = zika$dates, si = si$pvec)
summary(epifit)
#> Estimation of the reproduction number with Laplacian-P-splines
#> --------------------------------------------------------------
#> Total number of days: 93
#> Routine time (seconds): 0.06
#> Method: Maximum a posteriori (MAP)
#> Hyperparam. optim method: Nelder-Mead
#> Hyperparam. optim convergence: TRUE
#> Mean reproduction number: 1.355
#> Min reproduction number: 0.177
#> Max reproduction number: 5.047
#> --------------------------------------------------------------
```

Next, the estimation results are summarized into a single plot.

```
# Plot the smoothed epidemic curve
plotsmooth <- epicurve(zika$incidence, dates = zika$dates, datelab = "14d",
smooth = epifit, smoothcol = "orange",
title = "Zika incidence (smoothed)",
xtickangle = 70)
# Plot of the estimated reproduction number
Rplot <- plot(epifit, datelab = "7d", xtickangle = 70, legendpos = "none", col = "forestgreen")
# Show all plots together
gridExtra::grid.arrange(plotIncidence, plotsmooth, siplot, Rplot, nrow = 2, ncol = 2)
```

```
data(Measles1861)
measlesDAT <- Measles1861
measles_incid <- measlesDAT$incidence
measles_si <- measlesDAT$si_distr
epifit_measles <- estimR(measles_incid, si = measles_si, CoriR = T)
epicurve_measles<- epicurve(measles_incid, datelab = "1d", title = "Measles, Hagelloch, Germany, 1861",
col = "lightblue3", smooth = epifit_measles, smoothcol = "dodgerblue4")
Rplot_measles <- plot(epifit_measles, timecut = 6, legendpos = "none")
Rplot_measles2 <- plot(epifit_measles, addfit = "Cori", timecut = 6, legendpos = "top")
gridExtra::grid.arrange(epicurve_measles, Rplot_measles, Rplot_measles2, nrow = 2, ncol = 2)
```

```
data(Flu1918)
fluDAT <- Flu1918
flu_incid <- fluDAT$incidence
flu_si <- fluDAT$si_distr[-1]
epifit_flu <- estimR(flu_incid, si = flu_si, CoriR = T)
epicurve_flu <- epicurve(flu_incid, datelab = "7d", title = "Influenza, Baltimore, 1918",
col = "orange", smooth = epifit_flu, smoothcol = "firebrick")
Rplot_flu <- plot(epifit_flu, legendpos = "none")
Rplot_flu2 <- plot(epifit_flu, addfit = "Cori", legendpos = "top")
siplot_flu <- plot(Idist(probs = flu_si), barcol = "indianred1")
gridExtra::grid.arrange(epicurve_flu, Rplot_flu, Rplot_flu2,
siplot_flu, nrow = 2, ncol = 2)
```

```
data("SARS2003")
sarsDAT <- SARS2003
sars_incid <- sarsDAT$incidence
sars_si <- sarsDAT$si_distr[-1]
epifit_sars <- estimR(sars_incid, si = sars_si, CoriR = T)
epicurve_sars <- epicurve(sars_incid, datelab = "7d", title = "SARS, Hong Kong, 2003",
col = "ivory4", smooth = epifit_sars, smoothcol = "darkviolet")
Rplot_sars <- plot(epifit_sars, legendpos = "none")
Rplot_sars2 <- plot(epifit_sars, addfit = "Cori", legendpos = "top")
gridExtra::grid.arrange(epicurve_sars, Rplot_sars, Rplot_sars2, nrow = 2, ncol = 2)
```

Assessing (statistical) performance is a crucial part of model
validation, especially when a new methodology is presented. The
`perfRestim()`

routine can be used to measure the performance
of the `estimR()`

and `estimRmcmc()`

routines for
estimation of the time-varying reproduction number. After simulating a
specific number of epidemics via the `episim()`

function, the
Bias, MSE, coverage probability (CP) and width of 90% and 95% credible
intervals for \(R_t\) are computed. The
latter performance metrics are available for each time point \(t\) of the epidemic and the reported values
correspond to the average over days \(t=8,\dots,T\), where \(T\) is the last day of the simulated
epidemic. The data generating mechanism (see Gressani et
al. 2022) allows to specify among an influenza, a SARSCoV- 1 and a
MERS-CoV like serial interval. Below, we simulate 100 epidemics
(default) of 60 days with a MERS-CoV like serial interval, where the
true reproduction number has a wiggly pattern as specified by scenario 5
in the routine option [Note: This actually corresponds to scenario 9 in
Gressani et al. (2022)].

```
perfcheck <- perfRestim(si = "mers", scenario = 5, days = 60, seed = 1905, overdisp = 50)
perfcheck$LPS
```

Performance metrics (and associated trajectories of estimated \(R_t\)) can be accessed as follows.

Gressani, O., Wallinga, J., Althaus, C. L., Hens, N. and Faes, C.
(2022). EpiLPS: A fast and flexible Bayesian tool for estimation of the
time-varying reproduction number. *PLoS Comput Biol*
**18**(10): e1010618. https://doi.org/10.1371/journal.pcbi.1010618

Fraser C (2007) Estimating Individual and Household Reproduction
Numbers in an Emerging Epidemic. *PLoS ONE*
**2**(8): e758. https://doi.org/10.1371/journal.pone.0000758

Cori, A., Ferguson, N.M., Fraser, C., Cauchemez, S. (2013) A new
framework and software to estimate time-varying reproduction numbers
during epidemics, *American Journal of Epidemiology*,
**178**(9), 1505–1512. https://doi.org/10.1093/aje/kwt133

Wallinga, J., & Teunis, P. (2004). Different epidemic curves for
severe acute respiratory syndrome reveal similar impacts of control
measures. *American Journal of Epidemiology*,
**160**(6), 509-516.