The Elo rating system is used to calculate relative skill levels between individuals or teams. It can be applied to many types of games and sports, but it considers only wins and losses.

Is there a variation of this method that incorporates the score by which a game was won? That is, a method that takes into account whether a game was won by a score of `10-1` or by a much closer score of `10-9`.

**Answer**

The way to generalize Elo is to consider it in the broader context of paired comparison models. These models were first developed in psychology to model rankings and preferences of participants over decisions and options. Classic Elo can be seen as a discretized dynamic approximation of the Bradley-Terry model, which is among the earliest and best-known models in the paired comparison literature, and can be formulated as follows:

\begin{align}
P(i > j) = \frac{1}{1 + \exp(\beta_j - \beta_i)}
\end{align}

That is, we model the probability of player $i$ beating player $j$ (or, in general, object $i$ ranking higher than $j$) as a function of the difference in the players' relative strengths, $\beta_i - \beta_j$. In the frequentist context we can't identify the absolute values of the $\beta$s, only their values relative to one another (usually a constraint is placed on the $\beta$s, e.g. that they sum to a fixed constant, to achieve identifiability if desired; in the Bayesian context, a prior on the $\beta$s achieves identifiability in the Bayesian sense).
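The formula above is easy to sketch directly. This is a minimal illustration (the strength values are made up for the example):

```python
import math

def bt_win_prob(beta_i, beta_j):
    """Bradley-Terry probability that player i beats player j."""
    return 1.0 / (1.0 + math.exp(beta_j - beta_i))

# Equal strengths give a 50% win probability; a stronger player i wins more often.
print(bt_win_prob(0.0, 0.0))  # 0.5
print(bt_win_prob(1.0, 0.0))  # ~0.731
```

Note that only the difference $\beta_i - \beta_j$ matters, which is exactly why the individual $\beta$s are not identifiable without a constraint or a prior.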

So essentially, the Bradley-Terry model is just a logistic regression (technically, classic Elo is a dynamic approximation to one; for more precision about the relationship see https://www.stat.berkeley.edu/~aldous/Papers/me150.pdf), where the regressors are +1/-1 indicators for which players are playing in a given game.
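To make the logistic-regression view concrete, here is a small sketch that fits Bradley-Terry strengths by maximum likelihood with plain gradient ascent. The game results are invented for the example, and the sum-to-zero centering is one arbitrary identifiability choice:

```python
import numpy as np

# Hypothetical results: each row is (winner_index, loser_index) among 3 players.
games = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (0, 1)]
n_players = 3

# Design matrix with +1 in the winner's column and -1 in the loser's;
# the response is always 1 because the first-listed player won.
X = np.zeros((len(games), n_players))
for row, (w, l) in enumerate(games):
    X[row, w], X[row, l] = 1.0, -1.0

beta = np.zeros(n_players)
for _ in range(2000):                 # gradient ascent on the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (1.0 - p)
beta -= beta.mean()                   # identifiability: center strengths at zero

print(np.round(beta, 3))              # player 0 has the best record here
```

In practice one would use an off-the-shelf logistic regression routine (without an intercept) on the same design matrix rather than hand-rolled gradient ascent.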

Classic Elo is restricted to 0/1 outcomes because of its likelihood, which models binary outcomes. The solution to accommodating different outcomes is therefore to modify the likelihood. There is a huge literature adapting paired comparison models to nearly every relevant case seen in sports. At the end of this post I will put a few examples of papers.

In American football this was done by modelling the difference in scores as normally distributed. Say that $S_A, S_B$ are the scores of teams A and B respectively (typically A denotes the home team, and a home advantage is modelled).

$$S_A - S_B \sim N(f(\beta_A - \beta_B), \sigma)$$

We again model score differences as some function of the difference in strengths. The variance $\sigma$ can be modelled in a number of ways, including as a function of score variances for each team and/or as something that varies over time. See https://www.researchgate.net/publication/2244176_A_State-Space_Model_for_National_Football_League_Scores for this example; it is one of the most influential papers in the literature. Notice that in a previous paper the authors mention spending time validating the assumption that observed score differences in American football are approximately normal. In lower-scoring sports this assumption is unlikely to hold. In a really low-scoring sport, say European football, one could model the outcomes as an ordered logit or probit, for example. Different sports will require different likelihoods, approaches, and validation to capture the underlying winning process.
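Under a normal score-difference model, the implied win probability is just the probability that the margin is positive. This is a sketch, not the paper's exact specification: the identity link for $f$, the home-advantage value, and the margin standard deviation are all illustrative assumptions:

```python
import math

def win_prob_margin_model(beta_a, beta_b, home_adv=2.5, sigma=13.5):
    """P(team A wins) when the margin S_A - S_B ~ N(beta_a - beta_b + home_adv, sigma).
    home_adv and sigma are illustrative placeholder values, not fitted estimates."""
    mu = beta_a - beta_b + home_adv
    # P(margin > 0) via the standard normal CDF, written with math.erf
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

# Evenly matched teams: the home side is favoured only through home_adv.
print(round(win_prob_margin_model(0.0, 0.0), 3))
```

The richer likelihood is what buys the extra information: a `28-3` win moves the strength estimates more than a `20-17` win would, which binary Elo cannot distinguish.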

An extremely clever paper attacked the problem of finding a unifying approach across different sports by modelling betting log-odds directly as normally distributed (https://arxiv.org/pdf/1701.05976.pdf). The idea is that while game scores in different sports have completely different properties, betting markets are the same across many sports. Betting lines imply market probabilities, and log-odds are approximately normal quantities. This is a good way of extracting more information than simple binary win/loss if you can get your hands on betting information.

So in short, the solution is to consider Elo in the context of paired comparison models. This framework is richer and more flexible, allowing for different likelihood specifications, and it can easily accommodate ratings that vary over time (in the Bayesian context at least). It is also easier to accommodate covariates in this framework: most Elo-type models can handle home advantage, but rarely more covariates. The main advantage of Elo-type models themselves is that they are easy to calculate dynamically, which is a really useful property if the goal is to create rankings for, say, online chess or video games. Only some paired comparison models have been turned into Elo-like models, but this is a growing literature; Microsoft TrueSkill is one example. If the goal is, say, betting, or a small non-dynamic data set, this shouldn't be much of a drawback. There are also many existing packages for paired comparison models.
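The dynamic-calculation property mentioned above is exactly the classic Elo update, which needs only the two current ratings and the result. For reference, a standard version (with the conventional K-factor of 32, a common but tunable choice):

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One classic Elo update. score_a is 1 for an A win, 0 for a loss,
    0.5 for a draw. k=32 is a common convention, not a universal constant."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

print(elo_update(1500.0, 1500.0, 1.0))  # -> (1516.0, 1484.0)
```

Contrast this constant-time update with refitting a full paired comparison model after every game, and the appeal of Elo for large online rating pools is clear, even though the richer models extract more information per game.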

Additional papers

Modelling sports with rankings (e.g. track and field): http://www.glicko.net/research/multicompetitor.pdf

Stochastic and Dynamic Model: http://www.glicko.net/research/dpcmsv.pdf

**Attribution**
*Source: Link, Question Author: Figaro, Answer Author: Tyrel Stokes*