# Combining multiple metrics to provide comparisons/ranking of k objects [Question and Reference Request]

Collecting $n$ metrics about $k$ objects

Suppose I collect $n$ metrics about $k$ objects. I am looking into valid ways to compare the $k$ objects so they can be “ranked”. I think this may be well-trodden ground (sports statistics like total quarterback rating etc.) but I am unfamiliar with this area.

I want to answer the question: which object is best?

For each metric $m_i$, where $1 \leq i \leq n$, the score lies in $[0, r_i]$. Note that some of these metrics will have theoretical maximums like $100\%$; other $r_i$’s will just be the maximum collected score in the sample (e.g. top speed, height etc.).

Normalising/Standardising the metric scores

My intuition is to first normalise all these scores between $[0,1]$, so that each score contributes equally to the overall score, to be calculated later.

That is, for each metric $m_i$ the score for that metric would be $\frac{m_i}{\text{max}(r_i)}$, where $\text{max}(r_i)$ is the maximum score for that metric in the sample. My intuition doesn’t allow me to be confident that this is valid, so that is my Question 1: is this normalisation procedure valid?
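As a minimal sketch of the normalisation described above, in Python (the function name `normalise_by_max` and the sample values are made up for illustration, and scores are assumed to be non-negative):

```python
def normalise_by_max(scores):
    """Divide every score for one metric by the largest score in the sample."""
    top = max(scores)
    if top == 0:
        # All scores are zero, so there is nothing to scale.
        return [0.0 for _ in scores]
    return [s / top for s in scores]


# Hypothetical top speeds for three objects.
top_speeds = [120.0, 90.0, 60.0]
normalised = normalise_by_max(top_speeds)
print(normalised)
```

Note that the best object in the sample always gets exactly $1$ under this scheme, so the result depends on which objects happen to be in the sample.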

Also, for each question the implicit follow-up is: I am probably completely wrong, so what resources and topics should I be studying?

Weighting the metrics for my overall comparison

Let us further suppose that I wish to weight some metrics over others. There seem to be a few approaches, but I will outline the one I am trying to approximate.

I was thinking one possible method would be to do a pairwise comparison for each metric, and ask of each comparison: If I were to see a $10\%$ reduction in metric $m_i$, how much of an increase in metric $m_j$ would compensate for that reduction? If the pairs have no real influence over each other I could score this as a $0$ perhaps?

I would end up with a table of values for my weightings, filled with pairwise comparisons of this nature. Question 2: Would I have to be consistent when I compare $m_i$ v $m_j$ and $m_j$ v $m_i$? Or could they be non-symmetric? That is if I say a $10\%$ reduction in $m_i$ needs to be accounted for by a $20\%$ increase in $m_j$, can I say a $10\%$ reduction in $m_j$ needs to be accounted for by a $50\%$ increase in $m_i$? Would this be valid?

Perhaps I could take an average of each column and have that as my weighting for the metric?

It would seem to me that a weighting system such as this would quantitatively say things like “for me to value object $a$ over object $b$, when $b$’s metric $m_i$ is 10% less than $a$’s $m_i$, I need to see at least a $20\%$ gain in metric $m_j$”.

Question 3: What if I were to start to include more complex considerations so that the comparisons, or compensations, would be nonlinear? Or multivariable comparisons? Perhaps some scores should be negative etc.?

The Essential Question

Really, I would like to know what topics and books I should be reading to be able to answer this type of question.

Thank you

Awesome question.

Question 1:

I approach this problem using standard deviations ($n\sigma$) to create a standardized scale where $n$ is the number of standard deviations from the mean ($\mu$) and $\sigma$ is the standard deviation.

I will use an example of a call center agent making calls. Here is a possible way to define the scale using $n$:

• $m_+ = n$: Metrics you want to maximize. Direct relationship so as $n$ increases so does the score, as $n$ decreases the score goes down. Example: number of sales.
• $m_- = -n$: Metrics you want to minimize. Inverse relationship so as $n$ decreases the score increases. Example: number of mistakes made in a call.
• $m_{\mu} = -\lvert n\rvert$: Metrics you want as close to the mean as possible. As $n$ gets farther from the mean in either direction the score goes down. A perfect score is 0, at the mean. Example: number of voicemails / hangups / do-not-call requests an agent received (should be equally distributed).

Then you have a scale that is independent of units of measure, size / amplitude, etc. You can then easily normalize the scale above to $[0, 1]$ where $0$ is always the worst and $1$ is always the best. So each normalized metric becomes: $\overline m_+$, $\overline m_-$, and $\overline m_{\mu}$.

So the simple solution ($f_s$) becomes:
$$f_s = \sum \overline m_+ + \sum \overline m_- + \sum \overline m_{\mu}$$
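A rough sketch of how $f_s$ could be computed in Python. The helper names, the `kind` labels, and the sample data are all hypothetical, and two choices here are assumptions rather than part of the method above: the population standard deviation is used, and the step to $[0, 1]$ is done with min-max rescaling.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Number of standard deviations n of each value from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def directional_scores(values, kind):
    """Apply m_+ = n, m_- = -n, or m_mu = -|n| depending on the metric kind."""
    n = z_scores(values)
    if kind == "maximize":
        return n
    if kind == "minimize":
        return [-x for x in n]
    if kind == "near_mean":
        return [-abs(x) for x in n]
    raise ValueError(f"unknown metric kind: {kind}")


def rescale_01(scores):
    """Min-max rescale so the worst score is 0 and the best is 1 (assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def simple_score(metrics):
    """metrics is a list of (values, kind) pairs; returns f_s for each object."""
    columns = [rescale_01(directional_scores(v, k)) for v, k in metrics]
    n_objects = len(columns[0])
    return [sum(col[i] for col in columns) for i in range(n_objects)]


# Hypothetical data for three agents: sales (maximize), mistakes (minimize).
f_s = simple_score([([1, 5, 12], "maximize"), ([4, 2, 0], "minimize")])
print(f_s)
```

With this data the third agent has both the most sales and the fewest mistakes, so it receives the highest $f_s$.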

Question 2

With the above solution for $f_s$, adding asymmetric weights ($W$) gives us the weighted solution $f_w$. Each metric is weighted by multiplying it by its weight:
$$f_w = (W_{+1} \overline m_{+1} + W_{+2} \overline m_{+2} + \dots + W_{+j_+} \overline m_{+j_+}) + (W_{-1} \overline m_{-1} + W_{-2} \overline m_{-2} + \dots + W_{-j_-} \overline m_{-j_-}) + (W_{\mu 1} \overline m_{\mu 1} + W_{\mu 2} \overline m_{\mu 2} + \dots + W_{\mu j_\mu} \overline m_{\mu j_\mu})$$

Or more succinctly:
$$f_w = \sum\limits_{j_+} W_{+j_+} \overline m_{+j_+} + \sum\limits_{j_-} W_{-j_-} \overline m_{-j_-} + \sum\limits_{j_\mu} W_{\mu j_\mu} \overline m_{\mu j_\mu}$$

Now you have a score that can take into account individual weights, minimizing metrics, maximizing metrics and metrics you want close to the mean.
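A small Python sketch of the weighted sum, assuming every metric has already been normalized to $[0, 1]$ with $1$ as the best value. The function name `weighted_score`, the weights, and the sample columns are illustrative only.

```python
def weighted_score(normalised_metrics, weights):
    """Return f_w per object: the sum over metrics of weight * normalized score.

    normalised_metrics holds one list per metric, each with one entry per object.
    """
    if len(normalised_metrics) != len(weights):
        raise ValueError("need exactly one weight per metric")
    n_objects = len(normalised_metrics[0])
    return [
        sum(w * column[i] for w, column in zip(weights, normalised_metrics))
        for i in range(n_objects)
    ]


# Two hypothetical normalized metrics for three objects;
# the first metric counts twice as much as the second.
columns = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
f_w = weighted_score(columns, weights=[2.0, 1.0])
print(f_w)
```

Because the direction of each metric is already folded into the normalized values, the three sums in the formula collapse into a single loop over metrics.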

Question 3

# Example

Now that you have your score $f_w$ you can again normalize it and multiply it by a weight if you want a unit of measure. Following the call center agent example:
* agent 1: 1 sale, logged in for 30 min
* agent 2: 5 sales, logged in for 5 hours
* agent 3: 12 sales, logged in for 4 hours

Picking your metrics is very, very important.

$\mu = \frac{1 + 5 + 12}{3} = 6$

$Var(f_w) = \frac{(-5)^2 + (-1)^2 + 6^2}{3} = \frac{62}{3} \approx 20.67$

$\sigma = \sqrt{Var(f_w)} = \sqrt{\frac{62}{3}} \approx 4.55$

So now we need the number of standard deviations:

• $a_1 \approx -1.10 \sigma$ away from the mean
• $a_2 \approx -0.22 \sigma$ away from the mean
• $a_3 \approx +1.32 \sigma$ away from the mean

So the stack ranking of best to worst would be $a_3$, $a_2$, $a_1$. The problem is that agent 2 has been payable / billable for much longer and is really the worst. So you need to be careful when crafting the metrics to make sure that they have the desired effect. In the above example it would be better to take a sales / hour approach as your metric and then multiply at the end by how long the agent has been making calls.
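To see how much the choice of metric matters, here is a short Python sketch that ranks the same three agents by raw sales and by sales per hour (the dictionary layout and variable names are just for illustration):

```python
# (sales, hours logged in) for each agent, taken from the example above.
agents = {"a1": (1, 0.5), "a2": (5, 5.0), "a3": (12, 4.0)}

# Rank best to worst by raw number of sales.
by_sales = sorted(agents, key=lambda name: agents[name][0], reverse=True)

# Rank best to worst by sales per hour.
by_rate = sorted(
    agents, key=lambda name: agents[name][0] / agents[name][1], reverse=True
)

print(by_sales)
print(by_rate)
```

Agent 2 moves from second place to last once time logged in is taken into account.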

## Better Example: Sales / Hour

Sales Per Hour:

• $a_1 = \frac{1}{\frac{1}{2}} = 2$
• $a_2 = \frac{5}{5} = 1$
• $a_3 = \frac{12}{4} = 3$

$\mu = \frac{ 2 + 1 + 3}{3} = 2$

$Var(f_w) = \frac{0 + 1 + 1}{3} = \frac{2}{3}$

$\sigma = \sqrt{\frac{2}{3}} \approx 0.82$

So in this case, the agents have the following $n$ standard deviations away from $\mu$:

• $a_1 = 0 \sigma$
• $a_2 \approx -1.2 \sigma$
• $a_3 \approx 1.2 \sigma$

So now they are in the proper order but we still don’t have a gauge on how much of a problem it is that agent $a_2$ is lagging. It is more of a problem because that agent has been logged in for a longer period of time. So adding a weight $W$ of the login time gives you the following:

• $a_1 = 0$
• $a_2 \approx -1.22 \times 5 \approx -6.12$
• $a_3 \approx 1.22 \times 4 \approx 4.90$
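The sales-per-hour calculation and the login-time weighting can be reproduced with a few lines of Python. This is only a sketch: the variable names are made up, and it uses the population standard deviation, which is what the variance of $\frac{2}{3}$ above corresponds to.

```python
from statistics import mean, pstdev

# (sales, hours logged in) for each agent, taken from the example.
agents = {"a1": (1, 0.5), "a2": (5, 5.0), "a3": (12, 4.0)}

# Sales per hour for each agent.
rates = {name: sales / hours for name, (sales, hours) in agents.items()}

mu = mean(list(rates.values()))
sigma = pstdev(list(rates.values()))  # population standard deviation

# Number of standard deviations from the mean, then weighted by hours logged in.
z = {name: (rate - mu) / sigma for name, rate in rates.items()}
weighted = {name: z[name] * agents[name][1] for name in agents}

for name in agents:
    print(name, round(z[name], 2), round(weighted[name], 2))
```

Using the unrounded $n$ gives about $-6.12$ for agent 2 and about $4.90$ for agent 3, matching the values above.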

You can now normalize these numbers to compare to other metrics in the same way. As you can see, $a_1$ is right at the average, $a_3$ is doing the best and $a_2$ in this scenario is lagging way behind. These are the results you would expect. This number now portrays:

1. quality (of the agent)