The theory of the online update algorithms

Dawid Kałędkowski



Problem of sport matchups fits into subject of paired comparison modeling and choice modeling. Estimating player skills is equivalent to estimating preference of choice between two alternatives. Just as one product is more preferred over another to buy, similarly, better player is more preferred to win over worst. As player/event and alternative/experiment can be used interchangeably, for consistency with package name, sport nomenclature is adapted.

Bayesian update rule

Methods implemented in a sport package performs similarly, as all using Bayesian Approximation Method. Algorithms works as follows:

At some moment player i competes with player j while both have prior \(R_i\) and \(R_j\) ratings. Prior to event, probability that player i win over player j is \(\hat{Y_{ij}}\). After event occurs, true result \(Y_{ij}\) is observed, and initial believes about rating is changed \(R_i^{'} \leftarrow R_j\) according to the prediction error \((Y_{ij} - \hat{Y_{ij}} )\) and some constant \(K\). Updates are summed as player can compete with more than one player in particular event.

\[\large R_i^{'} \leftarrow R_i + \sum_{j \neq i}{ K * (Y_{ij} - \hat{Y_{ij}}}) \quad (1)\]


Outcome probability function is based on extended Bradley-Terry model where \(\pi_i = \frac{R_i}{RD_{ij}}\) (\(RD\) stands for rating deviation). However, expected outcome functions slightly differs in details, which can be noticed in formulas presented further in this document.

For multi-player matchups where output is a ranking, sport package uses the same data transformation as in exploded logit - ranking is then transformed to a combination of all possible pairs competing within same event, which is reflected in above formula where update for \(i\)th player is \(\Omega_{i} = \sum_{j \neq i} \Omega_{ij}\).

Players nested within team

In some cases individual results are not observed when individuals (players) competes in the teams. In team sports, results are reported on the team level and players contribution is implicit, so that \(Y_{ij}\) refers only to the teams. This means that update rule described in (1) impacts directly team aggregated rating, and players true abilities need to be estimated in another step.

Consider that \(t\)th team has \(k_t\) players with ratings \(N(R_{ti}, RD^2_{ti})\). We assume that team abilities is sum of players skills weighted by known \(s_{ti}\), so:

\[R_t = \sum_{i=1}^{k_t}R_{ti} * s_{ti}\]

\[RD_t^2 = \sum_{i=1}^{k_t}RD_{ti}^2 * s_{ti}\]

In above formula \(s_{ti}\) denotes knows measurable share in team total efforts - for example share of time spend on the field by player \(ti\) in total time.

Depending on algorithm type, update (\(\Omega\), \(\Delta\)) are computed as usual on the team level, and then parameters change is distributed proportionally by to the players.

\[R_{ti}^{'} \leftarrow R_{ti} * \Omega_i * s_{ti} \frac{RD_{ti}^2}{RD_t^2}\] \[RD_{ti}^{'} \leftarrow RD_{ti} * (1 - \Delta * s_{ti} \frac{RD_{ti}^2}{RD_t^2})\]

Glicko rating system

Glicko is the first bayesian online update algorithm incorporating rating volatility to rating and outcome computation. Glicko system is not balanced, and sum of rating changes across all players are not zero. In one 2-players event, reward of player i differs from reward of player i as it depends on their individual ratings deviation. Rating values oscillates around \(R \sim 1500\) with max deviation \(rd \leq 350\).

##        A        B        C        D 
## 1464.297 1396.039 1606.521 1674.836

Output probability and update formulas looks as follows:

\[\hat{Y_{ij}} = P(X_i>X_j) = \frac{1} { 1 + 10^{-g(RD_{ij}) * (R_i-R_j)/400}}\]

\[{R'}_i \leftarrow R_i + \frac{1}{\frac{1}{{RD}^2_{i}} + \frac{1}{d^2_i}} * \sum_{j \neq i}{g(RD_j) * (Y_{ij} - \hat{Y_{ij}})}\]

\[{RD'}_i \leftarrow \sqrt{(\frac{1}{{RD}^2_{i}} + \frac{1}{d^2_i}})^{-1}\]

For more details read Mark E. Glickman (1999).

Glicko2 rating system

Glicko2 improved predecessor by adding volatile parameter \(\sigma_i\) which increase/decrease rating deviation in periods when player performance differs from expected. Sigma is estimated iteratively using Illinois algorithm, which converges quickly not affecting computation time. Ratings in Glicko2 are scaled (\(\mu = \frac{R}{c}\), \(\phi = \frac{RD}{c}\)) but final output remains consistent with Glicko and values revolves around \(R\sim1500\) with max deviation \(RD \leq 350\).

##        A        B        C        D 
## 1468.919 1396.836 1606.055 1601.161

Output probability and update formulas looks as follows:

\[ \hat{Y_{ij}} = \frac{1}{1 + e^{-g(\phi_{ij})*(\mu_i - \mu_j)} }\]

\[ {\phi'}_i \leftarrow \frac{1}{\sqrt{ \frac{1}{ { {\phi_i}^2 + {\sigma'_i}^2}} + \frac{1}{v} }}\]

\[ {\mu'_i} \leftarrow \mu_i + {\phi'}_i * \sum_{j \neq i}{g(\phi_j) * (Y_{ij} - \hat{Y_{ij}})} \]

For more details read Mark E. Glickman (2013)

Bayesian Bradley Terry

The fastest algorithm with simple formula. Original BT formula lacks variance parameter, and this method incorporates rating deviation into model (as other models). BBT also prevents against fast RD decline to zero using gamma.

\[\hat{Y_{ij}} = P(X_i>X_j) = \frac{e^{R_i/c_{ij}}} {e^{R_i / c_{ij}} + e^{R_j / c_{ij}}} \]

\[{R'}_i \leftarrow R_i + \sum_i{\frac{RD_i^2}{c_{ij}}*(Y_{ij} - \hat{Y_{ij}})}\]

\[{RD'}_i \leftarrow RD_i * (1 - \sum_{j \neq i}{ \gamma_j * (\frac{RD_i}{c_{ij}})^2 * \hat{Y_{ij}} \hat{Y_{ji}}} )\]

##        a        b        c        d 
## 23.98191 23.20410 27.05331 29.30904

For more details read Ruby C. Weng and Chih-Jen Lin (2011)

Dynamic Bayesian Logit

Following algorithm gives some advantages over mentioned rating systems, allowing to add other factors to the model. Algorithm perform better in disciples where other variables can make a difference in result eg. home field advantage.

DBL implements Extended Kalman Filter learning rule, and allows to estimate multiple parameters in addition to player ratings. DBL is a dynamic logit extended to usage in pairwise comparisons by modeling differences of parameters instead of parameters itself. For sake of consistency with other algorithms \(R\) and \(RD\) is used instead of \(\beta\) and \(\sigma\) (in EKF \(\omega\) and \(\Sigma\)). In DBL \(R\) denotes weights/parameters not only player ratings and \(RD\) is parameter standard deviation.

\[\hat{Y_{ij}} = \frac{e^{-K(s_t) (\pi_{it} - \pi_{jt})}} {1 + e^{-K(s_t) (\pi_{it}-\pi_{jt})}}\]

where: * \(\pi_{it} = R_i x_{it}\) - \(i\)-th player strength in \(t\)-th matchup composed of player raw skills and other factors affecting his abilities.

Parameters for player i competing with player j are estimated using EKF update rule. \[\hat{R}^{'}_{i} \leftarrow \hat{R}_{i} + \frac{RD^2_{i}} {1 + \hat{Y_{ij}} (1 - \hat{Y_{ij}})} * x_i (Y_{ij} - \hat{Y_{ij}})\]

\[RD^{'2}_{i} \leftarrow RD^2_{i} - \frac{\hat{Y_{ij}}(1 - \hat{Y_{ij}})} {1+\hat{Y_{ij}} (1-\hat{Y_{ij}})s^2} * (RD^2_i x_i)(RD^2_i x_i)^T\]

##         name=A         name=B         name=C         name=D           gate 
##   -0.001984127   -0.052900327    0.041183715    0.013700739    0.070569320 
##      factor1=a      factor1=b factor1=a:gate factor1=b:gate 
##   -0.054884454    0.054884454   -0.107784781    0.178354100

For more details read Stephen J. Roberts, William Penny (2011)

Additional controls


\(RD\) estimation is based only on previous estimate and difference between expected matchup output and real result. Sometimes scientist can have prior knowledge about particular player or event which are characterized by higher volatility. lambda proportionally changes prior RD before update, to increase or decrease uncertainty of challenge. lambda can be used differently in several situations:

  • Increasing lambda for all players flatten probabilities of winning challenge equalizing competitors chances. Rational when level of competition is different than usual e.g. derby, knock-out phase, decisive matches, exhibitions and friendlies.

  • Increasing lambda for one player makes his matchup more uncertain, and rating change to be higher than usual. Applicable when some player has not played for a longer period of time. Lambda should be empirically set by researcher as it’s not optimized by algorithms.

\[RD_i = RD_i * lambda_{i}\] ### kappa

Depending on the frequency of events, \(RD\) tends to decrease quickly to near zero, which results in freezing \(R\) in time (as \(R\) update size depends on \(RD\) size). To keep \(RD\) change below some (proportional) size we use \(\kappa\), which is expressed as follows:

\[RD_i^{'} \leftarrow RD_i - pmin(\Delta_i, RD_i (1 - \kappa))\]


Events can differ in importance, for all teams and for particular players. Weight can directly impact update size:

\[RD_i^{'} \leftarrow RD_i \pm \Delta_i * \omega_i\] \[R_i^{'} \leftarrow R_i \pm \Omega_i * \omega_i\]