The aim of the function `remstimate()`

is to find the set
of model parameters that optimizes either: (1) the likelihood function
given the observed data or (2) the posterior probability of the
parameters given the prior information on their distribution and the
likelihood of the data. Furthermore, the likelihood function may differ
depending on the modeling framework used, be it tie-oriented or
actor-oriented:

the

**tie-oriented**framework, which models the occurrence of dyadic events as realization of ties along with their waiting time (Butts 2008);the

**actor-oriented**framework, which models the occurrence of the dyadic events as a two-steps process (Stadtfeld and Block 2017):- in the first step, it models the propensity of any actor to initiate any form of interaction as a sender and the waiting time to the next interaction (sender rate model);
- in the second step, it models the probability of the sender at 1. choosing the receiver of its future interaction (receiver choice model).

The mandatory input arguments of `remstimate()`

are:

`reh`

, which is a`remify`

object of the processed relational event history (processing function`remify`

available with the package`remify`

,`install.packages("remify")`

);`stats`

, a`remstats`

object (the`remstats`

function for calculating network statistics is available with the package`remstats`

,`install.packages("remstats")`

). A linear predictor for the model is specified via a formula object (for the actor-oriented framework, up to two linear predictors can be specified) and supplied to the function`remstats::remstats()`

along with the`remify`

object. When`attr(reh,"model")`

is`"tie"`

,`stats`

consists of only one array of statistics; if`attr(reh,"model")`

is`"actor"`

,`stats`

consists of a list that can contain up to two arrays named`"sender_stats"`

and`"receiver_stats"`

. The first array contains the statistics for the sender model (event rate model), the second array for the receiver model (multinomial choice model). Furthermore, it is possible to calculate the statistics for only the sender model or only the receiver model, by supplying the formula object only for one of the two models. For more details on the use of`remstats::remstats()`

, see`help(topic=remstats,package=remstats)`

.

Along with the two mandatory arguments, the argument
`method`

refers to the optimization routine to use for the
estimation of the model parameters and its default value is
`"MLE"`

. Methods available in `remstimate`

are:
**Maximum Likelihood Estimation** (`"MLE"`

),
**Adaptive Gradient Descent** (`"GDADAMAX"`

),
**Bayesian Sampling Importance Resampling**
(`"BSIR"`

), **Hamiltonian Monte Carlo**
(`"HMC"`

).

In order to optimize model parameters, the available approaches
resort to either the Frequentist theory or the Bayesian theory. The
`remstimate`

package provides several optimization methods to
estimate the model parameters:

- two Frequentist approaches such as Maximum Likelihood Estimation
(
`"MLE"`

) and Adaptive Gradient Descent (`"GDADAMAX"`

) which are respectively second-order and first-order optimization algorithms; - two Bayesian approaches such as Bayesian Sampling Importance
Resampling (
`"BSIR"`

) and Hamiltonian Monte Carlo (`"HMC"`

).

To provide a concise overview of the two approaches, consider a time-ordered sequence of \(M\) relational events, \(E_{t_M}=(e_1,\ldots,e_M)\), the array of statistics (explanatory variables) \(X\), and \(\boldsymbol{\beta}\) the vector of model parameters describing the effect of the explanatory variables, which we want to estimate. The explanation that follows below is valid for both tie-oriented and actor-oriented modeling frameworks.

The aim of the *Frequentist* approaches is to find the set of
parameters \(\boldsymbol{\hat{\beta}}\)
that maximizes the value of the likelihood function \(\mathscr{L}(\boldsymbol{\beta};
E_{t_M},X)\), that is

\[ \boldsymbol{\hat{\beta}}=\mathop{\mathrm{arg\,max}}_{\boldsymbol{\beta}}\{\mathscr{L}(\boldsymbol{\beta};E_{t_M},X)\} \]

Whereas, the aim of the *Bayesian* approaches is to find the
set of parameters \(\boldsymbol{\hat{\beta}}\) that maximizes
the posterior probability of the model which is proportional to the
likelihood of the observed data and the prior distribution assumed over
the model parameters,

\[p(\boldsymbol{\beta}|E_{t_M},X) \propto \mathscr{L}(\boldsymbol{\beta}; E_{t_M},X) p(\boldsymbol{\beta})\]

where \(p(\boldsymbol{\beta})\) is the prior distribution of the model parameters which, for instance, can be assumed as a multivariate normal distribution,

\[ \boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{\mu_{0}},\Sigma_{0}) \]

with parameters \((\boldsymbol{\mu_{0}},\Sigma_{0})\) summarizing the prior information that the researcher may have on the distributon of \(\boldsymbol{\beta}\).

`remstimate`

package)Before starting, we want to first load the `remstimate`

package. This will laod in turn `remify`

and
`remstats`

, which we need respectively for processing the
relational event history and for calculating/processing the statistics
specified in the model:

`library(remstimate)`

In this tutorial, we are going to use the main functions from
`remify`

and `remstats`

by setting their arguments
to the default values. However, we suggest the user to read through the
documentation of the two packages and their vignettes in order to get
familiar with their additional functionalities (e.g., possibility of
defining time-varying risk set, calculating statistics for a specific
time window, and many others).

For the tie-oriented modeling, we refer to the seminal paper by Butts (2008), in which the author introduces the likelihood function of a relational event model (REM). Relational events are modeled in a tie-oriented approach along with their waiting time (if measured). When the time variable is not available, then the model reduces to the Cox proportional-hazard survival model (Cox 1972).

Consider a time-ordered sequence of \(M\) relational events, \(E_{t_M}=(e_1,\ldots,e_M)\), where each event \(e_{m}\) in the sequence is described by the 4-tuple \((s_{m},r_{m},c_{m},t_{m})\), respectively sender, receiver, type and time of the event. Furthermore,

- \(N\) is the number of actors in the network. For simplicity, we assume here that all actors in the network can be the sender or the receiver of a relational event;
- \(C\) is the number of event types, which may describe the sentiment of an interaction (e.g., a praise to a colleague, or a conflict between countries). We set \(C=1\) for simplicity, which also means that we work with events without sentiment (at least not available in the data);
- \(P\) is the number of sufficient statistics (explanatory variables);

The likelihood function that models a relational event sequence with a tie-oriented approach is, \[ \mathscr{L}(\boldsymbol{\beta}; E_{t_M},X)= \prod_{m=1}^{M}{\Bigg[\lambda(e_{m},t_{m},\boldsymbol{\beta})\prod_{e\in \mathcal{R}}^{}{\exp{\left\lbrace-\lambda(e,t_{m},\boldsymbol{\beta})\left(t_m-t_{m-1}\right)\right\rbrace} }}\Bigg] \]

where:

- \(\lambda(e,t_{m},\boldsymbol{\beta})\) is
the rate of occurrence of the event \(e\) at time \(t_{m}\). The event rate describes the
istantaneous probability of occurrence of an event at a specific time
point and it is modeled as \(\lambda(e,t_{m},\boldsymbol{\beta}) =
\exp{\left\lbrace
\boldsymbol{\beta}^{T}X_{[m,e,.]}\right\rbrace}\) where:
- \(\boldsymbol{\beta}\) is the vector of parameters of interest. Such parameters describe the effect that the sufficient statistics (explanatory variables) have on the event rate;
- \(\boldsymbol{\beta}^{T}X_{[m,e,.]} =
\sum_{p=1}^{P}{\beta_p X_{[m,e,p]}}\) is the linear predictor of
the event \(e\) at time \(t_{m}\). The object \(X\) is a three dimensional array with
number of rows equal to the number of unique time points (or events) in
the sequence (see
`vignette(package="remstats")`

for more information about statistics calculated*per unique time point*or*per observed event*), number of columns equal to number of dyadic events (\(D\)), (see`vignette(topic="remify",package="remify")`

for more information on how to quantify the number of dyadic events), number of slices equal to the number of variables in the linear predictor (\(P\)).

- \(e_m\) refers to the event occurred at time \(t_m\) and \(e\) refers to any event at risk at time \(t_m\);
- \(\mathcal{R}\) describes the set
of events at risk at each time point (including also the occurring
event). In this case, the risk set is assumed to be the
*full*risk set (see`vignette(topic="riskset",package="remify")`

for more information on alternative definitions of the risk set), which means that all the possible dyadic events are at risk at any time point; - \((t_{m}-t_{m-1})\) is the waiting time between two subsequent events.

If the time of occurrence of the events is not available and we know only their order of occurrence, then the likelihood function reduces to the Cox proportional-hazard survival model (Cox 1972),

\[ \mathscr{L}(\boldsymbol{\beta}; E_{t_M},X)= \prod_{m=1}^{M}{\Bigg[\frac{\lambda(e_{m},t_{m},\boldsymbol{\beta})}{\sum_{e\in \mathcal{R}}^{}{\lambda(e,t_{m},\boldsymbol{\beta})}}}\Bigg] \]

In order to get started with the optimization methods available in
`remstimate`

, we consider the data `tie_data`

,
that is a list containing a simulated relational event sequence where
events were generated by following a tie-oriented process. We are going
to model the event rate \(\lambda\) of
any event event \(e\) at risk at time
\(t_{m}\) as:

\[\begin{align}\lambda(e,t_{m},\boldsymbol{\beta}) = \exp{\left\lbrace\beta_{intercept} + \beta_{\text{indegreeSender}}\text{indegreeSender}(s_e,t_{m}) + \\ +\beta_{\text{inertia}}\text{inertia}(s_e,r_e,t_{m}) + \beta_{\text{reciprocity}}\text{reciprocity}(s_e,r_e,t_{m})\right\rbrace}\end{align}\]

Furthermore, we know that the *true* parameters quantifying
the effect of the *statistics* (name in subscript next to \(\beta\)) and used in the generation of the
event sequence are: \[\begin{bmatrix}
\beta_{intercept} \\ \beta_{\text{indegreeSender}} \\
\beta_{\text{inertia}} \\ \beta_{\text{reciprocity}} \end{bmatrix} =
\begin{bmatrix} -5.0 \\ 0.01 \\ -0.1 \\ 0.03\end{bmatrix}\] The
parameters are also available as object within the list,
`tie_data$true.pars`

.

```
# setting `ncores` to 1 (the user can change this parameter)
ncores <- 1L
# loading data
data(tie_data)
# true parameters' values
tie_data$true.pars
```

```
## intercept indegreeSender inertia reciprocity
## -5.00 0.01 -0.10 0.03
```

```
# processing the event sequence with 'remify'
tie_reh <- remify::remify(edgelist = tie_data$edgelist, model = "tie")
# summary of the (processed) relational event network
summary(tie_reh)
```

```
## Relational Event Network
## (processed for tie-oriented modeling):
## > events = 100
## > actors = 5
## > riskset = full
## > directed = TRUE
## > ordinal = FALSE
## > weighted = FALSE
## > time length ~ 807
## > interevent time
## >> minimum ~ 0.1832
## >> maximum ~ 63.6192
```

`remstimate()`

in 3 stepsThe estimation of a model can be summarized in three steps:

- First, we define the linear predictor with the variables of
interest, using the statistics available within
`remstats`

(statistics calculated by the user can be also supplied to`remstats::remstats()`

).

```
# specifying linear predictor (with `remstats`) using a 'formula'
tie_model <- ~ 1 + remstats::indegreeSender() +
remstats::inertia() + remstats::reciprocity()
```

- Second, we calculate the statistics defined in the linear predictor
with the function
`remstats::remstats()`

from the`remstats`

package.

```
# calculating statistics (with `remstats`)
tie_stats <- remstats::remstats(reh = tie_reh, tie_effects = tie_model)
# the 'tomstats' 'remstats' object
tie_stats
```

```
## Relational Event Network Statistics
## > Model: tie-oriented
## > Computation method: per time point
## > Dimensions: 100 time points x 20 dyads x 4 statistics
## > Statistics:
## >> 1: baseline
## >> 2: indegreeSender
## >> 3: inertia
## >> 4: reciprocity
```

- Finally, we are ready to run any of the optimization methods with
the function
`remstimate::remstimate()`

.

```
# for example the method "MLE"
remstimate::remstimate(reh = tie_reh,
stats = tie_stats,
method = "MLE",
ncores = ncores)
```

```
## Relational Event Model (tie oriented)
##
## Coefficients:
##
## baseline indegreeSender inertia reciprocity
## -4.91045403 0.04349023 -0.20150573 -0.05213701
##
## Null deviance: 1216.739
## Residual deviance: 1210.625
## AIC: 1218.625 AICC: 1219.046 BIC: 1229.045
```

In the sections below, we show the estimation of the parameters using
all the methods available and we also show the usage and output of the
methods available for a `remstimate`

object.

```
tie_mle <- remstimate::remstimate(reh = tie_reh,
stats = tie_stats,
ncores = ncores,
method = "MLE",
WAIC = TRUE, # setting WAIC computation to TRUE
nsimWAIC = 100) # number of draws for the computation of the WAIC set to 100
```

```
# printing the 'remstimate' object
tie_mle
```

```
## Relational Event Model (tie oriented)
##
## Coefficients:
##
## baseline indegreeSender inertia reciprocity
## -4.91045403 0.04349023 -0.20150573 -0.05213701
##
## Null deviance: 1216.739
## Residual deviance: 1210.625
## AIC: 1218.625 AICC: 1219.046 BIC: 1229.045 WAIC: 1219.022
```

```
# summary of the 'remstimate' object
summary(tie_mle)
```

```
## Relational Event Model (tie oriented)
##
## Call:
## ~1 + remstats::indegreeSender() + remstats::inertia() + remstats::reciprocity()
##
##
## Coefficients (MLE with interval likelihood):
##
## Estimate Std. Err z value Pr(>|z|) Pr(=0)
## baseline -4.910454 0.187555 -26.181372 0.0000 <2e-16
## indegreeSender 0.043490 0.036449 1.193170 0.2328 0.8307
## inertia -0.201506 0.088154 -2.285831 0.0223 0.4231
## reciprocity -0.052137 0.098237 -0.530728 0.5956 0.8968
## Null deviance: 1216.739 on 100 degrees of freedom
## Residual deviance: 1210.625 on 96 degrees of freedom
## Chi-square: 6.11449 on 4 degrees of freedom, asymptotic p-value 0.1907597
## AIC: 1218.625 AICC: 1219.046 BIC: 1229.045 WAIC: 1219.022
```

Under the column named \(Pr(>|z|)\), the usual test on the
`z value`

for each parameter. As to the column named \(Pr(=0|x)\), an approximation of the
posterior probability of each parameter being equal to 0.

```
# aic
aic(tie_mle)
```

`## [1] 1218.625`

```
# aicc
aicc(tie_mle)
```

`## [1] 1219.046`

```
# bic
bic(tie_mle)
```

`## [1] 1229.045`

```
#waic
waic(tie_mle)
```

`## [1] 1219.022`

```
# diagnostics
tie_mle_diagnostics <- diagnostics(object = tie_mle, reh = tie_reh, stats = tie_stats)
```

```
# plot
plot(x = tie_mle, reh = tie_reh, diagnostics = tie_mle_diagnostics)
```

```
tie_gd <- remstimate::remstimate(reh = tie_reh,
stats = tie_stats,
ncores = ncores,
method = "GDADAMAX",
epochs = 200L, # number of iterations for the Gradient-Descent algorithm
WAIC = TRUE, # setting WAIC computation to TRUE
nsimWAIC = 100) # number of draws for the computation of the WAIC set to 100
# print
tie_gd
```

```
## Relational Event Model (tie oriented)
##
## Coefficients:
##
## baseline indegreeSender inertia reciprocity
## -4.95818977 0.03592763 -0.15966468 -0.04526381
##
## Null deviance: 1216.739
## Residual deviance: 1210.901
## AIC: 1218.901 AICC: 1219.322 BIC: 1229.322 WAIC: 1219.431
```

```
# diagnostics
tie_gd_diagnostics <- diagnostics(object = tie_gd, reh = tie_reh, stats = tie_stats)
# plot
# plot(x = tie_gd, reh = tie_reh, diagnostics = tie_gd_diagnostics) # uncomment to use the plot function
```

```
library(mvnfast) # loading package for fast simulation from a multivariate Student t distribution
priormvt <- mvnfast::dmvt # defining which distribution we want to use from the 'mvnfast' package
tie_bsir <- remstimate::remstimate(reh = tie_reh,
stats = tie_stats,
ncores = ncores,
method = "BSIR",
nsim = 200L, # 200 draws from the posterior distribution
prior = priormvt, # defining prior here, prior parameters follow below
mu = rep(0,dim(tie_stats)[3]), # prior mu value
sigma = diag(dim(tie_stats)[3])*1.5, # prior sigma value
df = 1, # prior df value
log = TRUE, # requiring log density values from the prior,
seed = 23029, # set a seed only for reproducibility purposes
WAIC = TRUE, # setting WAIC computation to TRUE
nsimWAIC = 100 # number of draws for the computation of the WAIC set to 100
)
# summary
summary(tie_bsir)
```

```
## Relational Event Model (tie oriented)
##
## Call:
## ~1 + remstats::indegreeSender() + remstats::inertia() + remstats::reciprocity()
##
##
## Posterior Modes (BSIR with interval likelihood):
##
## Post.Mode Post.SD Q2.5% Q50% Q97.5% Pr(=0|x)
## baseline -4.854046 0.183316 -5.233761 -4.870497 -4.5387 <2e-16
## indegreeSender 0.056785 0.033818 -0.014111 0.037372 0.0985 0.7095
## inertia -0.247864 0.084423 -0.363385 -0.198756 -0.0552 0.1184
## reciprocity -0.082518 0.095099 -0.227214 -0.044827 0.1031 0.8728
## Log posterior: -615.9479 WAIC: 1219.636
```

```
# diagnostics
tie_bsir_diagnostics <- diagnostics(object = tie_bsir, reh = tie_reh, stats = tie_stats)
# plot
# plot(x = tie_bsir, reh = tie_reh, diagnostics = tie_bsir_diagnostics) # uncomment to use the plot function
```

```
tie_hmc <- remstimate::remstimate(reh = tie_reh,
stats = tie_stats,
method = "HMC",
ncores = ncores,
nsim = 200L, # 200 draws to generate per each chain
nchains = 4L, # 4 chains to generate
burnin = 200L, # burnin length is 200
thin = 2L, # thinning size set to 2 (the final length of the chains will be 100)
seed = 23029, # set a seed only for reproducibility purposes
WAIC = TRUE, # setting WAIC computation to TRUE
nsimWAIC = 100 # number of draws for the computation of the WAIC set to 100
)
# summary
summary(tie_hmc)
```

```
## Relational Event Model (tie oriented)
##
## Call:
## ~1 + remstats::indegreeSender() + remstats::inertia() + remstats::reciprocity()
##
##
## Posterior Modes (HMC with interval likelihood):
##
## Post.Mode Post.SD Q2.5% Q50% Q97.5%
## baseline -4.89176157 0.10099429 -5.13676387 -4.91842709 -4.7228
## indegreeSender 0.04217948 0.02128160 -0.00011272 0.04485975 0.0848
## inertia -0.20584323 0.05187471 -0.29431758 -0.19863423 -0.0960
## reciprocity -0.05412095 0.05676166 -0.15335677 -0.05330342 0.0670
## Pr(=0|x)
## baseline < 2.2e-16
## indegreeSender 0.583822
## inertia 0.003795
## reciprocity 0.863895
## Log posterior: -605.3315 WAIC: 1213.595
```

`Q2.5%`

, `Q50%`

and `Q97.5%`

are
respectively the percentile 2.5, the median (50th percentile) and the
percentile 97.5.

```
# diagnostics
tie_hmc_diagnostics <- diagnostics(object = tie_hmc, reh = tie_reh, stats = tie_stats)
# plot (histograms and trace plot have highest posterior density intervals dashed lines in blue and posterior estimate in red)
plot(x = tie_hmc, reh = tie_reh, diagnostics = tie_hmc_diagnostics)
```