Explain Artificial Intelligence for Credit Risk Management - Deloitte

Explain Artificial Intelligence
for Credit Risk Management
After the 2008 crisis, Paul Wilmott and Emanuel Derman pointed out one of the major challenges facing financial institutions in the Modeler’s Hippocratic Oath: “Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.” Beyond the need for transparency, this emphasizes the ethical dimension of model use and understanding. With the development of computational methods, the need to understand the output provided by complex models is becoming stronger than ever.

After the financial crisis, regulators have put a great focus on risk management supervision and expect financial institutions to have transparent, auditable risk measurement frameworks, depending on portfolio characteristics, for regulatory, financial or business decision-making purposes. In addition, start-ups and fintechs are developing AI components very quickly, powered by digital growth and the increasing amount of available data.

To align with those new agents, banks must develop more reliable models in order to reduce decision time and develop better business.

Quantitative modeling techniques are used to get more insights from data, reduce cost and increase overall profitability. Every model contains inherent deficiencies and it is important to keep the focus on reducing model errors.

At the same time, the development of so-called machine-learning methods is skyrocketing. Progress in computer processing power allows the use of methods such as Deep Learning, tree-based algorithms (Decision Trees, Random Forest and Gradient-Boosting Machines such as the most recent and powerful XGBoost and LightGBM), and ensembling techniques that combine the outputs of machine-learning models (such as Stacking).

In the credit risk industry, the use of Machine Learning techniques for model development faces skepticism, notably for regulatory purposes, because of the lack of transparency and the known “black box” effect of these techniques.
Artificial Intelligence for Credit Risk Management

Although Artificial Intelligence can help model developers to reduce model risk and improve general model predictive power, a large part of the financial industry remains cautious about the explainability barrier faced by machine learning techniques. Indeed, the progress observed in the accuracy of models is often made at the cost of their explainability. Moreover, this lack of explanation constitutes both a practical and an ethical issue for credit professionals, as noted by Guidotti et al.

As pointed out by the latest reports produced by the World Economic Forum and the French prudential authority (ACPR), Artificial Intelligence as a topic has reached an inflection point. In the short term, it seems important that the development of Artificial Intelligence in the banking and insurance sectors satisfies minimum criteria for governance and control. This reflection should cover the proof of the reliability of the algorithms used (with a view to their internal as well as external auditability), models’ explainability and interactions between humans and intelligent algorithms.

Artificial Intelligence is already transforming the financial ecosystem, offering a wide range of opportunities and challenges across different sectors (deposit and lending, insurance, investment management, etc.); the definition of AI model governance is therefore becoming a key concern. As a consequence, understanding and explaining the output of machine learning is becoming a top priority for banks and regulators.

The toolkit opening the black-box
Deloitte has designed the Zen Risk platform, which gives its users access to the most advanced and modern tools at each modelling and validation step. Its objective is to provide pre-approval teams with tools enabling them to automate a part of their work, to compare their model with different approaches and, finally, to give them the keys to integrate transparent rules identified by Artificial Intelligence into existing models.

Originally designed for challenging internal models, this approach could be adapted to underwriting and preapproval credit processes, building eligibility scores or underwriting scorecards. This black-box opening approach can also be extended to collections and recoveries and to loss management, including litigation recovery models, recovery forecasting models or restructuring and discount models.

The Zen Risk toolbox aims to explain both an isolated observation and the overall decisions taken by the algorithms. These techniques are instrumental in an era of increasingly precise models whose precision comes to the detriment of their interpretability.

The approach flows smoothly and gradually through many dimensions (model explanation, important features, outcome global explanation, individual forecast explanation).

Machine learning modelling
The process starts with an advanced data pre-processing step, where a wide range of machine learning tools are used to improve data quality. The process includes data imputation (when relevant), data filtering for very sparse data, and detection and management of outliers. The choice of the techniques used is left to the modeller, with the possibility to visualize in real time the effect of the methods used on the dataset.
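As an illustration of the pre-processing steps listed above (sparse-data filtering, imputation, outlier management), here is a minimal sketch in plain Python. It is not the Zen Risk implementation: the function name `preprocess`, the 50% sparsity threshold, median imputation and interquartile-range capping are all assumptions chosen for the example.

```python
import statistics

def preprocess(rows, max_missing_share=0.5, iqr_factor=1.5):
    """Clean a small tabular dataset.

    rows: list of dicts mapping a column name to a float, or None if missing.
    All thresholds are illustrative assumptions, not values from the article.
    """
    columns = list(rows[0].keys())
    n = len(rows)

    # 1) Data filtering: drop columns that are too sparse to be useful.
    kept = [c for c in columns
            if sum(r[c] is None for r in rows) / n <= max_missing_share]
    cleaned = [{c: r[c] for c in kept} for r in rows]

    for c in kept:
        # Assumes at least two observed values per kept column.
        observed = [r[c] for r in cleaned if r[c] is not None]

        # 2) Imputation: replace missing values with the column median.
        median = statistics.median(observed)

        # 3) Outliers: cap values outside the interquartile-range fences.
        q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
        low = q1 - iqr_factor * (q3 - q1)
        high = q3 + iqr_factor * (q3 - q1)

        for r in cleaned:
            value = median if r[c] is None else r[c]
            r[c] = min(max(value, low), high)
    return cleaned
```

Each step is a simple, swappable choice; in practice the modeller would compare several imputation and outlier rules and inspect their effect on the dataset before settling on one.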


The solution then explores many models, from the well-known logistic regression to the most recent boosting methods (LightGBM, XGBoost) and neural networks.

The table below sums up some of the most widely used methods:

Algorithms considered:

Classifier family            Description                                                                       Algorithms
Individual classifiers       The eventual scorecard consists of a single classification model.                 Logistic Regression (LR), Support Vector Machines (SVM), Neural Network (NN), Decision Tree (CART)
Homogeneous classifiers      Combines predictions from multiple similar models. Diversity through sampling.    Random Forest (RF)
Heterogeneous classifiers    Combines predictions from various types of models.                                Hill Climbing algorithm (HCES)

For these complex models, which need a high degree of parameter tuning, the use of hyperparameter optimization algorithms such as genetic algorithms is necessary to reach the best performance.

Indeed, the number of hyperparameters is too large to test every possible combination. Borrowed from biostatistics, genetic algorithms are inspired by the concept of natural selection and help to address this issue.

Generally, genetic algorithms consist in keeping the best-performing parent models. Crossing over parents and allowing genetic mutation creates further generations of models. The process is then iterated until satisfactory results are reached. At the cost of a complex implementation, using the right parameters leads to an increase in performance metrics (ROC AUC, F1 score, …) in a reasonable amount of time. Indeed, in one case of PD model development, optimizing hyperparameters could lead to a 10% increase in AUC compared to default parameters.

But if a machine-learning model performs well, why do we not just trust the model and ignore why it made a certain decision? Are we ready to believe the outputs with a high degree of confidence, without taking ethical considerations into account? Of course, the answer is no, and it becomes necessary to open the black-box!

The development of methods for opening the black box has increased considerably in recent years.
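The genetic search for hyperparameters described above (keep the best parents, cross them over, mutate, iterate) can be sketched as follows. This is a minimal illustration under stated assumptions: the search space and its parameter names are hypothetical, and `toy_score` is a stand-in for what would in practice be a cross-validated metric such as ROC AUC of a trained model.

```python
import random

# Hypothetical search space for a boosting model (illustrative names and values).
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "max_depth": [2, 3, 4, 6, 8],
    "n_estimators": [100, 200, 400, 800],
}

def toy_score(params):
    """Stand-in for a cross-validated AUC; highest at (0.1, 4, 400)."""
    penalty = (abs(params["learning_rate"] - 0.1)
               + 0.02 * abs(params["max_depth"] - 4)
               + 0.0001 * abs(params["n_estimators"] - 400))
    return 0.80 - penalty

def random_individual(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def crossover(parent_a, parent_b, rng):
    # Each hyperparameter is inherited from one of the two parents.
    return {name: rng.choice([parent_a[name], parent_b[name]])
            for name in SEARCH_SPACE}

def mutate(individual, rng, rate=0.2):
    # With probability `rate`, a hyperparameter is redrawn at random.
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in individual.items()}

def genetic_search(score, generations=15, population_size=12, n_parents=4, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the best-scoring individuals as parents.
        parents = sorted(population, key=score, reverse=True)[:n_parents]
        # Reproduction: children come from crossover followed by mutation.
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(population_size - n_parents)]
        # Parents survive to the next generation, so the best score never drops.
        population = parents + children
    return max(population, key=score)
```

Calling `genetic_search(toy_score)` returns the best configuration found. With a real model, each call to `score` trains and validates a model, so the population size and number of generations directly drive the computing cost.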

[Figure: timeline of methods for opening the black box, spanning April 2001 to December 2018. Methods shown: PDP (Friedman), ICE Plot, Random Forests Features Contribution for a Given Prediction, Random Forests Features, Decision Threshold Distribution (Airbnb), LRP, LIME, ALE Plot, RETAIN, Grad-CAM, DeepLIFT, SHAP, MACEM. Source: Christoph Molnar, A Guide for Making Black Box Models Explainable.]


Model explainability
The first model explainability tool is often provided by the model itself. Indeed, and this is especially true for tree-based methods, the algorithms can assess the importance of each variable, giving a hierarchy of feature importance. To do so, the algorithm computes the impact on the model evaluation metric of replacing a variable in a tree with another random one. The more the model quality decreases on average, the more important the variable is.

For example, in the graph below, we can observe that the credit amount, the borrower’s age or the maturity (duration) are the most important variables in the dataset. On the contrary, some variables have a limited impact and bring almost no information.

This method could be used for variable selection, before applying a more usual kind of model such as a logistic regression.
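One common way to implement the idea above (measure how much the evaluation metric degrades when a variable is scrambled) is permutation importance. The sketch below is a model-agnostic illustration and not necessarily what tree libraries compute internally; `toy_model`, the accuracy metric and the two-feature layout (credit amount, age) are assumptions made for the example.

```python
import random

def toy_model(row):
    """Hypothetical classifier: predicts default (1) when credit amount > 5000."""
    credit_amount, _age = row
    return int(credit_amount > 5000)

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:]
                      for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses (here, age) gets an importance of exactly zero, while a feature driving the predictions shows a positive average drop; ranking the drops gives the hierarchy of variables.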

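[Figure: LightGBM Features (importance). Horizontal bar chart of feature importance on a scale from 0 to 1200; labels recoverable from the chart include Credit amount, Savings_quite rich and Purpose_domestic appliances.]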


Model-Agnostic methods
In the recent literature, research on model explainability has increased. Some approaches stand out:

• Partial Dependency Plots (PDP) introduce variations in input variables and plot the output of the model along these variations. It is one of the most used techniques, and it gives users a good overview of the global response of the model to a feature.

• Local Models (LIME) focus on using interpretable models to explain the model’s decisions locally.

• Shapley value, the most sophisticated available approach.


Model sensitivity to variables, the Partial Dependency Plot example
Partial Dependency Plots highlight the sensitivity of a model output to variations of a feature. They show the direction of the relationship (positive or negative effect) and quantify the impact of a variable through a response function.

[Figure: partial dependency plot. Vertical axis: Partial dependency (in percentage), from 25 to 40; horizontal axis from 10 to 70.]

In the example above, the blue line summarizes the average effect of the variable Age on the default rate. We can see that the default rate is higher for young adults than for retired people (who have a fixed income).
It underlines the non-linear effect of the Age variable and can be used for segmentation, variable selection (if the average effect is flat, the variable brings little information) or optimization of the binning of numerical variables.
Likewise, it is possible to combine the effect of two variables by using a three-dimensional PDP.
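The computation behind a one-dimensional partial dependency curve can be sketched in a few lines: for each grid value, the feature is forced to that value for every observation and the model outputs are averaged. `toy_pd_model` and its coefficients are hypothetical and only serve to make the example runnable.

```python
def toy_pd_model(row):
    """Hypothetical PD model over [age, credit_amount]."""
    age, credit_amount = row
    pd_estimate = 0.35 if age < 30 else 0.25
    if credit_amount > 5000:
        pd_estimate += 0.05
    return pd_estimate

def partial_dependence(model, X, feature_index, grid):
    """Average model output when one feature is set to each grid value."""
    curve = []
    for value in grid:
        # Replace the feature for every observation, leave the others untouched.
        predictions = [
            model(row[:feature_index] + [value] + row[feature_index + 1:])
            for row in X
        ]
        curve.append(sum(predictions) / len(predictions))
    return curve
```

Plotting `curve` against `grid` gives the blue line of a PDP. Looping over a grid of value pairs for two features in the same way yields the surface of a three-dimensional PDP.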


Local models: LIME
PDPs give both a global and a local view in terms of features. However, it is difficult to plot effects in more than two dimensions (the crossed effect of two variables) or to explain the relationship between variables at both a global and a local (for a given observation) level.

LIME belongs to the family of local models, a set of models used to explain individual predictions of black-box machine learning models. It gives a good approximation of the machine learning output, locally.

To do so, LIME generates, depending on a kernel, a new dataset containing permuted samples and the corresponding predictions of the black-box, non-interpretable model. The principle is then to fit an interpretable model (linear regression, decision tree, …) on these machine learning outputs, for example to explain why a chosen borrower is classified as in default or not.
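The LIME principle described here (perturb the observation, weight the samples with a kernel, fit an interpretable model on the black-box outputs) can be sketched with a weighted linear regression. This is a simplified illustration written from scratch, not the API of the `lime` package; the Gaussian perturbations, the exponential kernel and the `scales` argument are assumptions.

```python
import math
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lime_explain(black_box, x, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a local linear surrogate around observation x.

    Returns [intercept, coef_1, ..., coef_d]; each coefficient is the local
    effect of moving a feature by one unit of its entry in `scales`.
    """
    rng = random.Random(seed)
    d = len(x)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Perturb the observation; offsets are expressed in scale units.
        offsets = [rng.gauss(0.0, 1.0) for _ in range(d)]
        sample = [xi + o * s for xi, o, s in zip(x, offsets, scales)]
        # Kernel: samples close to x weigh more in the local fit.
        distance_sq = sum(o * o for o in offsets)
        weights.append(math.exp(-distance_sq / kernel_width ** 2))
        rows.append([1.0] + offsets)
        targets.append(black_box(sample))
    # Weighted least squares through the normal equations (Z'WZ) beta = Z'Wy.
    k = d + 1
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
         for i in range(k)]
    b = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
         for i in range(k)]
    return solve_linear_system(A, b)
```

The signs and sizes of the returned coefficients indicate which features push this particular borrower's score up or down. Because the perturbed samples are random, two runs with different seeds can give slightly different explanations for a non-linear model.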

[Figure: LIME explanation for a single observation, showing the prediction probabilities and a table of Feature / Value pairs.]

The Shapley value analysis
In the most recent research, the Shapley value approach, inspired by the game theory work of Lloyd Shapley, seems to be the most promising.
Assuming a Probability of Default model as previously, where the objective is to estimate the probability of default of a counterparty, the Shapley value could help to understand:

• How much has each feature contributed to the average prediction?

• How much has each feature contributed to an individual targeted prediction?

Answering these questions is easy with linear models; however, it is much harder with complex algorithms.

The Shapley value answers this specific issue. It aims to find each variable’s marginal contribution, averaged over every possible sequence in which the variable could have been added to the set of explanatory variables.
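Assuming the standard Shapley attribution formula from cooperative game theory, with notation chosen to match the definitions that follow (taking F as the full set of features and |F| as their number), the contribution of feature i can be written as:

```latex
\phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \,\Bigl[\, f_{S \cup \{i\}}\bigl(x_{S \cup \{i\}}\bigr) - f_S\bigl(x_S\bigr) \Bigr]
```

The bracket is the marginal contribution of feature i when it is added to the subset S, and the factorial weight is the share of feature orderings in which exactly the features of S come before i.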

Where S is a subset of features, F the full set of features, f() the output of a model and i the added feature. Contrary to LIME, the Shapley value is unique.

Finally, it gives both a global picture and a downscaled effect on a specific individual. Solutions to the feature importance attribution problem are proven unique thanks to their properties of local accuracy, missingness, and


In the example above, an individual borrower is studied with a Shapley value analysis. His profile seems to be risky (unskilled and non-resident, he wants to buy a car but does not have a lot of money in his bank account).

Besides, at a global level, the graphs below show that younger people and short-duration loans are riskier, while people taking out loans for their education are more willing to repay than others. In addition, the analysis quantifies the risk through the Shapley value impact on the model output.

However, the main drawback of this method is its computational time, which grows with the number of features, the number of observations and the complexity of the model.
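To make the averaging over feature subsets concrete, the sketch below computes exact Shapley values by brute-force enumeration. It is an illustration only: `toy_pd` and its coefficients are hypothetical, and "removing" a feature is approximated by replacing it with a single reference value (`background`), whereas practical tools typically average over a background dataset.

```python
from itertools import combinations
from math import factorial

def toy_pd(features):
    """Hypothetical PD model over three binary flags, with one interaction."""
    unskilled, wants_car, low_balance = features
    return (0.10 + 0.20 * unskilled + 0.10 * wants_car
            + 0.10 * unskilled * wants_car + 0.05 * low_balance)

def shapley_values(model, x, background):
    """Exact Shapley values of model(x) relative to model(background)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the borrower's values,
        # the others keep the reference (background) values.
        return model([x[j] if j in subset else background[j] for j in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Share of orderings in which exactly `size` features precede i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi += weight * (value(members | {i}) - value(members))
        phis.append(phi)
    return phis
```

The contributions sum to the gap between the borrower's prediction and the reference prediction. The double loop visits every subset of the other features, so the cost doubles with each additional feature, which is why exact computation quickly becomes impractical and approximations are used on real models.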

The Hybrid approach
Deloitte’s hybrid approach is a two-stage analysis crossing the outputs of machine learning models with those of usual logistic regressions. It consists in looking at the population that has been misclassified by the logistic regression but correctly classified by an advanced model (e.g. Random Forest, Neural Network, …). Once this population is identified, business rules are extracted in order to override the logistic regression model outputs.

Therefore, by using a simple logistic regression and additional rules, we succeed in reaching an accuracy level well above that of the traditional model and in picturing reality in a more comprehensive way, with the possibility for the modeller to adjust different choices of business rules. Another value of this approach is that business experts can validate the new rules extracted from advanced algorithms and decide whether they make sense or not.
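The two stages described above can be sketched as follows. This is a schematic illustration, not Deloitte's implementation: the function names are hypothetical, models are represented as plain functions returning 0 or 1, and the rule itself is assumed to be written by the modeller after inspecting the identified population (the source does not specify how rules are extracted).

```python
def find_override_population(X, y, logistic_pred, advanced_pred):
    """Stage 1: observations the logistic regression gets wrong
    but the advanced model classifies correctly."""
    return [row for row, label in zip(X, y)
            if logistic_pred(row) != label and advanced_pred(row) == label]

def hybrid_predict(logistic_pred, rules, row):
    """Stage 2: apply business rules first, fall back on the logistic regression.

    rules: list of (condition, override) pairs, where `condition` is a function
    of the row and `override` is the class to return when the condition holds.
    """
    for condition, override in rules:
        if condition(row):
            return override
    return logistic_pred(row)
```

In use, the modeller inspects the output of `find_override_population`, writes a readable rule such as "students with a large loan are classified as non-default", and passes it to `hybrid_predict`. Because the rules are explicit, business experts can review each one before it overrides the scorecard.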


Conclusion
Looking at algorithm explainability and transparency may also be an enabler for quantifying model risk. Indeed, by addressing issues such as inputs and methodology, financial institutions will be able to better quantify the model risk arising from these types of techniques. By analogy, this understanding of models’ sensitivity to variables could be useful for risk management and stress testing purposes.

Moreover, recent progress in academic research and discussions around the governance of Artificial Intelligence point to future changes in the model arena in the years to come.

About the authors

Hervé PHAURE
Partner, Risk Advisory
hphaure@deloitte.fr

Hervé is a Partner in the Risk Advisory department, in charge of Credit Risk Advisory services. He has been involved in risk management for more than 25 years. His expertise covers statistical models in finance, risk management, IFRS 9, credit processes and the valuation of credit portfolios. He coordinates the Deloitte Credit Risk Community at European level.

Erwan ROBIN
Senior Consultant, Risk Advisory
erobin@deloitte.fr

Erwan Robin is a senior consultant and works as a data scientist within Deloitte France. His work consists in applying innovative solutions to credit risk management, notably for credit risk modelling. He is involved in research and development topics regarding machine learning.

