Benchmarking your counterparty credit risk ratings

The use of counterparty credit ratings is standard practice for evaluating the creditworthiness of counterparties.

Benchmarking internally modelled ratings can be valuable as an additional aid to remove biases from risk assessments and to refine in-house risk models. Assigning a risk rating to a counterparty allows one to translate its credit risk into credit risk metrics, such as the probability of default (PD) and loss given default (LGD). Such metrics are key inputs in determining the credit risk of the entire portfolio.

Many financial institutions (FIs) and larger corporates have either in-house developed risk rating models or vendor models in place. As these models play a key role in credit decision making and capital adequacy assessments, it is important to periodically assess the performance of these models and adjust them if necessary – no one wants to consistently underpredict or overpredict the credit risk stemming from a loan portfolio.

A challenge that FIs experience on low-risk portfolios, as well as on smaller, specialized portfolios, is that they observe hardly any defaults in a given year. A traditional back-test against realized defaults is then of limited value. In this situation, model benchmarking is a useful alternative, especially if the model you are benchmarking against is based on a larger dataset.

What is model benchmarking?

In a benchmarking exercise, all counterparties are rated by a second rating model. The outcomes are then compared to the results of the model you currently use. In its simplest form, only the final ratings are compared against each other. This already provides important insights into the overall performance of the models, as it identifies strong model biases.

A more advanced analysis, however, also looks at the structure and the underlying risk factors, including the assigned risk scores for all individual risk drivers. Such a detailed analysis uncovers aspects where the model might deviate from general industry practice and highlights implicit assumptions made in the model. Based on these findings, one can decide either to recalibrate or refine the model, or to keep using it as it is with its efficacy confirmed.

Performance measures

Credit benchmarking involves a variety of metrics that measure the extent to which the benchmark model agrees with the model in use. In the end, these performance measures give objective insight into two key questions:

  1. Do the two models agree on the rank ordering of the counterparties?
  2. Do the two models agree on the overall riskiness of the portfolio?

Ranking the risk ratings

Credit ratings express the creditworthiness of counterparties in a relative rank order. This ranking is an ordinal measure of the credit risk and is in itself not a reflection of the probability of default.

Rating models often assign a rating based on a score that is derived from a combination of features of the counterparty itself and of its environment. A proper rating model should be able to identify differences in relative creditworthiness based on these features, and these differences should be expressed by the model score and the resulting credit rating.

Different performance measures exist to determine the performance of the model in terms of the ranking. The most straightforward ones are the average number of ‘rating steps’ difference between the two models, and the percentage of counterparties that the benchmark model rates worse or better than the model in use.
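
As an illustration, these simple measures can be sketched in a few lines of Python. The rating scale, the function name `step_statistics` and the sample ratings below are hypothetical and only serve the example; the sketch also assumes that both models use the same rating scale.

```python
# Hypothetical rating scale, ordered from strongest to weakest credit quality.
SCALE = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC"]
NOTCH = {rating: position for position, rating in enumerate(SCALE)}


def step_statistics(internal, benchmark):
    """Compare two lists of ratings assigned to the same counterparties."""
    # A positive difference means the benchmark model sees more risk.
    diffs = [NOTCH[b] - NOTCH[i] for i, b in zip(internal, benchmark)]
    n = len(diffs)
    return {
        "mean_abs_steps": sum(abs(d) for d in diffs) / n,
        "pct_benchmark_worse": sum(d > 0 for d in diffs) / n,
        "pct_benchmark_better": sum(d < 0 for d in diffs) / n,
    }


# Illustrative ratings for five counterparties (made-up data).
internal = ["A", "BBB", "BB", "A", "B"]
benchmark = ["A", "BB", "BB", "AA", "CCC"]
stats = step_statistics(internal, benchmark)
print(stats)
```

In this made-up sample the benchmark rates two of the five counterparties worse and one better, with an average absolute difference of 0.6 rating steps.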


Two additional important measures are Goodman and Kruskal’s gamma (gamma correlation) and Kendall’s tau. Both are rank correlation metrics that quantify the association between two ordinal variables.

They measure the strength of the association between the two variables: a value of 1 means perfect agreement in rank order, 0 means no correlation at all, and -1 means a perfectly reversed rank order.

The computation of the rank correlations depends mainly on the number of concordant and discordant pairs. A concordant pair is a pair of observations, each on two variables, (X1, Y1) and (X2, Y2), with the property that sign(X2 - X1) = sign(Y2 - Y1) and neither difference is zero. The sign function returns -1 if the number is negative, 1 if it is positive, and 0 if it is zero. A discordant pair is a pair for which the two signs are opposite. A pair is tied if Xi = Xj or Yi = Yj.

The formulas for both measures are as follows:

$$\gamma = \frac{n_c - n_d}{n_c + n_d}, \qquad \tau = \frac{n_c - n_d}{n(n-1)/2}$$

with n_c, n_d and n respectively the number of concordant pairs, the number of discordant pairs and the total number of observations. The gamma correlation does not consider tied pairs, while Kendall’s tau keeps them in the denominator, so that ties pull the statistic towards zero. This results in a statistic that is more ‘conservative’ (typically smaller in absolute value) than gamma.
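
The pair counting can be made concrete with a minimal Python sketch. It assumes gamma = (n_c - n_d) / (n_c + n_d) and the tau-a variant tau = (n_c - n_d) / (n(n-1)/2), in line with the definitions of n_c, n_d and n given here. The function names and the integer-coded ratings are hypothetical.

```python
from itertools import combinations


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def rank_correlations(x, y):
    """Return (gamma, tau) for two equally long lists of ordinal ratings."""
    n = len(x)
    n_c = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = sign(x2 - x1) * sign(y2 - y1)
        if product > 0:
            n_c += 1  # concordant: both models order the pair the same way
        elif product < 0:
            n_d += 1  # discordant: the models order the pair differently
        # product == 0: tied pair, counted in neither n_c nor n_d
    total_pairs = n * (n - 1) / 2
    # Gamma is undefined when every pair is tied.
    gamma = (n_c - n_d) / (n_c + n_d) if n_c + n_d else float("nan")
    tau = (n_c - n_d) / total_pairs
    return gamma, tau


# Made-up ratings coded as integers (1 = strongest credit quality).
internal = [1, 2, 2, 3, 4]
benchmark = [2, 1, 3, 3, 5]
gamma, tau = rank_correlations(internal, benchmark)
print(gamma, tau)  # 0.75 0.6
```

The sample contains seven concordant, one discordant and two tied pairs, which shows how ties lower tau relative to gamma. Note that library routines may implement a different variant: `scipy.stats.kendalltau`, for example, computes tau-b by default, which adjusts the denominator for ties and will generally not match the tau-a value above.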

Example 1: good rank correlation: both models identify similar companies as weak and strong.

Example 2: poor rank correlation: there is no agreement between the models on which counterparties are weak and strong.

A practical advantage of these rank correlation measures is that they can compare risk ratings even if the two models use different rating scales.

Calibration quality

Besides providing insight into the relative creditworthiness of counterparties, a proper rating model should be able to determine an accurate overall riskiness of the portfolio.

Risk management processes often apply ratings by mapping them directly to a so-called through-the-cycle (TTC) PD. These PDs are set such that they match the PD that is historically observed or expected for a particular rating over a longer time period.

Mapping each rating to a PD allows users to calculate the notional-weighted average PD of the portfolio.
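
A minimal sketch of this calculation is shown below. The rating-to-PD mapping `TTC_PD`, the function name and the exposures are purely illustrative assumptions, not calibrated values.

```python
# Hypothetical through-the-cycle PD per rating (illustrative numbers only).
TTC_PD = {"A": 0.0005, "BBB": 0.002, "BB": 0.01, "B": 0.04}


def weighted_average_pd(exposures, pd_map):
    """Notional-weighted average PD for a list of (rating, notional) pairs."""
    total_notional = sum(notional for _, notional in exposures)
    weighted_sum = sum(pd_map[rating] * notional for rating, notional in exposures)
    return weighted_sum / total_notional


# Made-up portfolio: (rating, notional in millions).
portfolio = [("A", 50.0), ("BBB", 30.0), ("BB", 15.0), ("B", 5.0)]
average_pd = weighted_average_pd(portfolio, TTC_PD)
print(f"{average_pd:.4%}")
```

For this made-up portfolio the weighted average PD comes out at roughly 0.435%, dominated by the small exposures in the weakest rating classes.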

For a proper rating model, the portfolio average PD should be close to the long-term average observed or expected portfolio PD (the “central tendency”). If the number of observations is high enough, this assessment can be done separately for every rating class.

During a benchmark exercise, the difference between the average PDs resulting from the two models can be investigated. A large difference might suggest that one of the models is biased, leading to under- or overestimation of the risk of the entire portfolio.
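
The comparison could be sketched as follows, assuming each model comes with its own rating scale and its own rating-to-PD mapping. All names, mappings and numbers are hypothetical, and what counts as a "large" gap is a judgment call that this sketch does not settle.

```python
def average_pd(ratings, notionals, pd_map):
    """Notional-weighted average PD for one model's ratings."""
    weighted_sum = sum(pd_map[r] * w for r, w in zip(ratings, notionals))
    return weighted_sum / sum(notionals)


# Hypothetical PD mappings; the two models use different rating scales.
INTERNAL_PD = {1: 0.001, 2: 0.005, 3: 0.02}
BENCHMARK_PD = {"A": 0.001, "BBB": 0.003, "BB": 0.01, "B": 0.04}

# Made-up portfolio of three counterparties, rated by both models.
notionals = [40.0, 40.0, 20.0]
internal_ratings = [1, 2, 3]
benchmark_ratings = ["A", "BB", "B"]

pd_internal = average_pd(internal_ratings, notionals, INTERNAL_PD)
pd_benchmark = average_pd(benchmark_ratings, notionals, BENCHMARK_PD)

# Negative gap: the internal model sees less risk than the benchmark.
relative_gap = pd_internal / pd_benchmark - 1.0
print(f"internal {pd_internal:.2%}, benchmark {pd_benchmark:.2%}, gap {relative_gap:+.0%}")
```

In this made-up case the internal average PD is roughly half the benchmark figure, which would be a reason to investigate whether the internal model underestimates portfolio risk or the benchmark overestimates it.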

[Figure: benchmark rating example 3]

Benchmarking can provide great insight into the performance of your ratings by using relatively simple techniques. The main challenge is finding a proper benchmark model. Zanders has multiple proprietary credit rating and LGD models available that allow you to start benchmarking right away.