This page is part of the Responsible AI series.

Index

  1. Introduction
  2. Interpretability, Explainability, Transparency
  3. Applications
  4. Further readings

Introduction

Interpretability, Explainability, Transparency

Definition of Transparency

Transparency is a vague term that can be difficult to measure. How might one determine whether an algorithmic system is transparent enough? Would dumping large files of data, source code, and documentation onto the public make a system transparent? Perhaps, but this type of transparency is not meaningful to the general public, as the documentation may not be directly relevant or comprehensible. Data dumps can also lead to information overload, since one might not know where to start in order to understand how they are affected by the system. We should instead argue for the promotion of meaningful transparency. Meaningful transparency is driven by stakeholder information needs: it means delivering the information pertinent to each stakeholder group in the format best suited to their understanding. From a societal viewpoint, there are three levels of meaningful algorithmic transparency:

Sources

Definition of Interpretability

There is no mathematical definition of interpretability. A (non-mathematical) definition by Miller (2017) is: "Interpretability is the degree to which a human can understand the cause of a decision". Another is: "Interpretability is the degree to which a human can consistently predict the model's result". The higher the interpretability of a machine learning model, the easier it is for someone to comprehend why certain decisions or predictions have been made. A model is more interpretable than another model if its decisions are easier for a human to comprehend than the decisions of the other model.

Why interpretability is important

When we don't need interpretability

How interpretability is achieved

Interpretability vs Explainability

A model is interpretable if it can be understood by humans on its own: one can look at the model parameters or a model summary and understand exactly how a prediction was made. Such models are also called intrinsically interpretable models. An example of an interpretable model is a decision tree: to understand how a prediction was made, one simply traverses the nodes of the tree. Interpretable models are also referred to as white box. Explainable models, on the other hand, are black box: functions whose inputs and outputs are too complicated for humans to understand, and which require an additional method or technique in order to understand how the model works. A black box model is a system that does not reveal its internal mechanisms. In machine learning, "black box" describes models that cannot be understood by looking at their parameters (e.g. a neural network). In general, this classification based on human comprehension has no formal criterion for discriminating between the two, and the goal of interpretation can be achieved independently of this classification, based only on the type of model and the questions we seek to answer. Please note that model-agnostic interpretability methods treat machine learning models as black boxes, even when they are not.
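
As an illustration of an intrinsically interpretable model, the following minimal sketch (assuming scikit-learn is available; the dataset and tree depth are arbitrary choices) trains a small decision tree and prints its decision rules, which a human can read directly:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a small, intrinsically interpretable model
data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The whole model can be read as a set of if/else rules:
# understanding a prediction amounts to traversing these nodes.
print(export_text(tree, feature_names=list(data.feature_names)))
```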

Transparency as interpretability

Lipton et al. define "transparency as interpretability" as the model properties that are useful for understanding it and that can be known before training begins. In this case, models are intrinsically interpretable.

Post-hoc interpretability

Lipton et al. define "post-hoc interpretability" as the things we can learn from the model after training has finished.

Deliverables of interpretation methods

Other differences between interpretability methods

Sources

Papers

Applications

Interpretability solutions

Interpretability solutions are covered on the pages of the specific algorithms. Here is the list:

Random Forest

Post-hoc Explainability solutions

Others

Mixed solutions

InterpretML

Transparency for LLMs

Further readings

TO BE REVIEWED

In the context of machine learning and artificial intelligence, explainability and interpretability are often used interchangeably but they have some differences, even if they are very closely related:

Interpretability is about the extent to which a cause and effect can be observed within a system, i.e. the extent to which you are able to predict what is going to happen, given a change in input or algorithmic parameters. It’s being able to look at an algorithm and go yep, I can see what’s happening here.

Explainability, meanwhile, is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms. It’s easy to miss the subtle difference with interpretability, but consider it like this: interpretability is about being able to discern the mechanics without necessarily knowing why. Explainability is being able to quite literally explain what is happening.

Sources

Explainability frameworks for each algorithm

Explainability for logistic regression

Explainability for Random Forest

Built-in function in

Sources

Explainability for XGBoost

XGBoost has built-in explainability methods, based on weight (the frequency with which a feature is used to split the trees), cover (the frequency with which a feature is used to split the data across all trees, weighted by the number of training data points that go through those splits) or gain (the average reduction in training loss obtained when using a feature for splitting).
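
As a minimal sketch (assuming the xgboost Python package; the dataset and hyperparameters are arbitrary choices), the three built-in importance types can be compared on the same fitted model:

```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

# The same fitted model yields a different feature ranking for each importance type
booster = model.get_booster()
for importance_type in ("weight", "cover", "gain"):
    scores = booster.get_score(importance_type=importance_type)
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print(importance_type, top)
```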

Unfortunately, these methods are not always consistent with each other, which makes it difficult to compare feature importances across models. It is therefore advisable to use SHAP values with XGBoost, since they do not suffer from these inconsistency issues.
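
A minimal sketch of this alternative, assuming the shap package and the model and data fitted in the sketch above (TreeExplainer is SHAP's fast path for tree ensembles such as XGBoost):

```python
import shap

# One SHAP value per feature and per observation
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: mean absolute SHAP value per feature,
# a consistent alternative to the weight/cover/gain rankings
shap.summary_plot(shap_values, X, plot_type="bar")
```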

Sources

Explainability for imaging recognition

Explainability for Graph Neural Networks

Sources

Explainability frameworks

SHAP (SHapley Additive exPlanations)

SHAP values are based on Shapley values, a concept from game theory: Shapley values quantify the contribution that each player brings to the game, and SHAP values quantify the contribution that each feature brings to the prediction made by the model. A "game" concerns a single observation; indeed, SHAP is about local interpretability of a predictive model. SHAP values are based on the idea that the outcome of each possible combination of features (= coalition of players) should be considered to determine the importance of a single feature (= player).

Picture 1. Power set with three features.

In math, this is called a “power set” and can be represented as a tree. Each node represents a coalition of features. Each edge represents the inclusion of a feature not present in the previous coalition.

The cardinality of a power set is 2^n, where n is the number of elements in the original set. SHAP therefore requires training a distinct predictive model for each coalition in the power set, i.e. 2^F models, where F is the number of features. These models are identical in their hyperparameters and training data; the only thing that changes is the set of features included in the model.
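
As a minimal sketch (pure Python, with a hypothetical three-feature set matching Picture 1), the coalitions in the power set can be enumerated like this:

```python
from itertools import combinations

features = ["age", "income", "gender"]  # hypothetical feature names

# Enumerate every coalition (subset) of features: 2^3 = 8 in total
coalitions = [
    subset
    for size in range(len(features) + 1)
    for subset in combinations(features, size)
]
print(len(coalitions))  # 8
for coalition in coalitions:
    print(coalition)
```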

As seen above, two nodes connected by an edge differ by just one feature: the lower node has exactly the same features as the upper one, plus one additional feature that the upper one did not have. Therefore, the gap between the predictions of two connected nodes can be attributed to the effect of that additional feature. This is called the "marginal contribution" of a feature.

Therefore, each edge represents the marginal contribution brought by a feature to a model.
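
For reference, the SHAP value of feature i for an observation x is the weighted average of its marginal contributions over all coalitions S that do not contain i, where F denotes the full feature set and f_S the model trained on coalition S; this is the standard Shapley value formula applied to features:

```latex
\phi_i = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}
  \left[ f_{S \cup \{i\}}\!\left(x_{S \cup \{i\}}\right) - f_S\!\left(x_S\right) \right]
```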

Sources


LIME (Local Interpretable Model-Agnostic Explanations)

LIME is a perturbation method: it introduces small perturbations in the input and observes how they are reflected in the predictions. The perturbations are applied to components that make sense to humans (e.g. words, or patches of an image), even if the model internally uses more complicated representations, which makes the explanations easy to interpret.
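
A minimal sketch with the lime package (assuming a fitted scikit-learn classifier; the dataset, model, and number of features shown are arbitrary choices):

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single prediction by perturbing the input around this observation
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # (feature condition, weight) pairs
```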

Sources

Tools