Module 4: Model Interpretation

Module Overview

In this final module of the sprint, you'll learn techniques for interpreting machine learning models and explaining their predictions. Model interpretability is crucial for building stakeholder trust, ensuring ethical decision-making, debugging models, and gaining insights into your data that you can communicate effectively.

Learning Objectives

Objective 01 - Visualize and interpret partial dependence plots

Overview

In this Sprint, we have been focusing on ranking features by their importance to the model. We've looked at lists of features and their contributions to the model's predictions. Sometimes, we want more information about how the model works and how a specific feature affects its predictions. One of the main ways to look inside a "black box" model is to use partial dependence plots.

A partial dependence plot shows how a model's prediction depends, on average, on an input feature of interest. For example, in the data we'll explore below, we can fit a model to predict wine quality. Once the model is fit, we can isolate a particular feature and visualize how the model's prediction changes over that feature's range of values. The plots below show how the prediction depends on alcohol content.
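To make the idea concrete, here is a minimal sketch of how partial dependence can be computed by hand; model, X, and feature are placeholders for any fitted classifier with a predict_proba() method and its feature DataFrame, not objects defined in this lesson.

import numpy as np

def partial_dependence_by_hand(model, X, feature, n_grid=20):
    """Sweep one feature over a grid of values and average the model's predictions."""
    grid = np.linspace(X[feature].min(), X[feature].max(), n_grid)
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value  # force every row to the same value of this feature
        # use the predicted probability of the positive class for a classifier
        averaged.append(model.predict_proba(X_mod)[:, 1].mean())
    return grid, np.array(averaged)

Plotting the averaged predictions against the grid values produces the same kind of curve that the scikit-learn tooling below generates automatically.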

Follow Along

We're back to the wine quality data set. In this example, we're going to collapse the target into two classes (bad/good): 0 for quality less than or equal to 5 and 1 for quality above 5. Reframing the problem as binary classification makes the partial dependence plots easier to read because there is only one predicted class probability to visualize.

After loading in the data, we'll fit a decision tree classifier model. We're not going to split the data into training and testing sets because we're not looking at the accuracy or other evaluation metrics right now. We'll use all of the data available to train the model and create the partial dependence plots.

# Load the dataset
import pandas as pd
wine = pd.read_csv('winequality-red.csv')
wine.head()
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
# Create the feature matrix by dropping the target column
target = 'quality'
X = wine.drop(target, axis=1)

# Create the target array
y = wine['quality']

# Map the target to a binary class at quality = 5
y = y.apply(lambda x: 0 if x <= 5 else 1) 

# Instantiate the classifier
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=42)

# Fit the model on all of the data
wine_model = clf.fit(X, y)

Calculating the Partial Dependence

We always calculate partial dependence after we have a fitted model. Using the PartialDependenceDisplay class from sklearn.inspection and its from_estimator() method, we specify the features we would like to visualize; in this case, alcohol and chlorides. Usually, you would analyze the feature importances first and then select the most important features to plot. In this example, we're using a slightly different red wine data set than the one available from the scikit-learn datasets module, but we know that alcohol is one of the top features in terms of importance.
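If you want to confirm that on this data set, a quick optional check of the fitted tree's impurity-based feature importances (using the wine_model and X defined above) looks like this:

# Rank features by the decision tree's impurity-based importance
importances = pd.Series(wine_model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())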

from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

# Create a single bounding axes; the display draws one panel per feature inside it
fig, ax = plt.subplots(figsize=(12, 4))

# Plot the partial dependence of the prediction on alcohol and chlorides
PartialDependenceDisplay.from_estimator(
    wine_model, X, features=['alcohol', 'chlorides'],
    feature_names=X.columns, grid_resolution=50, ax=ax
)

ax.set_title('Wine: Partial Dependence')
plt.show()

mod4_obj1_pdp_separate.png

In the left plot, we can see that as the percentage of alcohol increases, the model more strongly predicts higher quality. In the right plot, the amount of chlorides in the wine doesn't seem to affect the prediction as much as the alcohol content does.

Now, we're going to use a different library, PDPbox, to visualize the partial dependence for a single feature. This plot calculates the same dependence but also shows the individual predictions (shaded area) along with their average (solid line).
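Before switching libraries, note that a similar view (every observation's curve plus the average) is also available directly in scikit-learn by passing kind='both' to the same from_estimator() call. This is an optional sketch reusing the wine_model and X from above, not part of the original example.

# Individual conditional expectation (ICE) curves plus their average
fig, ax = plt.subplots(figsize=(8, 4))
PartialDependenceDisplay.from_estimator(
    wine_model, X, features=['alcohol'],
    kind='both',  # draw each observation's curve and the average curve
    ax=ax
)
plt.show()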

!pip install PDPbox

from pdpbox.pdp import PDPIsolate

# Calculate the partial dependence of the prediction on a single (isolated) feature
isolated = PDPIsolate(
    model=wine_model,
    df=X,
    model_features=X.columns,
    feature='alcohol',
    feature_name='alcohol'
)

# Plot the individual predictions (shaded) and their average (solid line)
fig, axes = isolated.plot()
fig

mod4_obj1_pdpbox.png

This plot is on a different scale than the first one, but we can still see the same increase after about 11% alcohol content.

Challenge

From the above example, select a different feature (or features) and re-create the plots above. Write out a few sentences to describe the dependence of the model prediction on that feature. Is the model affected by changes in the feature's values?
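As a starting point (using sulphates purely as an example; pick whichever feature interests you), the scikit-learn version would look like this:

# Re-create the partial dependence plot for a different feature
fig, ax = plt.subplots(figsize=(6, 4))
PartialDependenceDisplay.from_estimator(
    wine_model, X, features=['sulphates'],
    grid_resolution=50, ax=ax
)
plt.show()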

Additional Resources

Objective 02 - Explain individual predictions with Shapley value plots

Overview

In this objective, we'll dive a little deeper into how individual features contribute to a prediction. A quantity called a Shapley value, borrowed from game theory, is used to explain the difference between the prediction for an individual observation and the average prediction across all observations (instances).

One way to think about this is that all of the features in the model work together to make a prediction. For example, using the wine data set from the previous objective, we can consider the alcohol content, the sulphates, and the pH. For a single observation (one row of the DataFrame), these features combine to produce that row's prediction, while averaging over every row gives the model's average (baseline) prediction. The Shapley values assign each feature a share of the difference between the single row's prediction and that average prediction.
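As a concrete, optional illustration of the quantity being explained, here is the gap between one row's prediction and the average prediction, reusing the wine_model (the decision tree) fit in the previous objective and an arbitrarily chosen row index:

# The gap that the Shapley values will account for: one prediction vs. the average
single_pred = wine_model.predict_proba(X.iloc[[25]])[0, 1]       # P(good) for row 25
average_pred = wine_model.predict_proba(X)[:, 1].mean()          # average P(good) over all rows
print(f'Gap to explain: {single_pred - average_pred:.3f}')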

Let's work through an example using the SHAP library, which stands for SHapley Additive exPlanations.

Follow Along

# Load the dataset
import pandas as pd
wine = pd.read_csv('winequality-red.csv')
wine.head()

# Create the feature matrix by dropping the target column
target = 'quality'
X = wine.drop(target, axis=1)

# Create the target array
y = wine['quality']

# Map the target to a binary class at quality = 5
y = y.apply(lambda x: 0 if x <= 5 else 1)

To calculate the Shapley values, we'll fit an XGBoost classifier to the data. SHAP's TreeExplainer() is designed to explain the output of tree-based ensemble models like XGBoost. From the explainer, we calculate the Shapley values and then visualize the result for a single observation (instance).

# Import the SHAP library
import shap

# Instantiate and fit the model
from xgboost import XGBClassifier
xgb = XGBClassifier(random_state=42)
wine_model = xgb.fit(X, y)

# Initialize the SHAP explainer for tree-based models
shap_ex = shap.TreeExplainer(wine_model)

# Calculate the Shapley values for every observation
shap_values = shap_ex.shap_values(X)

# Load the JavaScript visualization code for the interactive plots
shap.initjs()

# Explain a single observation (row 25) with a force plot
shap.force_plot(shap_ex.expected_value, shap_values[25, :], X.iloc[25, :])

mod4_obj2_SHAP.png

We can interpret the above plot in the following way: for the sample in the 25th row, the wine's volatile acidity pushed the model's prediction toward the higher-quality class, whereas its alcohol content and residual sugar (among other features) pushed the prediction toward the lower-quality class.
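Behind the force plot is SHAP's additivity property: the explainer's expected value plus the row's Shapley values adds up to the model's raw output for that row. Here is a quick sketch to check this, assuming the shap_ex, shap_values, and wine_model objects from above and that TreeExplainer is returning values in the model's raw (log-odds) space, which is its default behavior for XGBoost:

# Check additivity: base value + sum of the row's SHAP values == raw model output
import numpy as np
reconstructed = shap_ex.expected_value + shap_values[25, :].sum()
raw_score = wine_model.predict(X.iloc[[25]], output_margin=True)[0]  # log-odds for row 25
print(np.isclose(reconstructed, raw_score, atol=1e-3))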

Challenge

We plotted the feature contributions for a single observation. Using the same code as above, try selecting a different observation or row of the data frame. You might consider changing the value of 25 in shap_values[25,:] and X.iloc[25,:]. Write out a few sentences to describe what the SHAP plot indicates about the feature contributions for this observation.

Additional Resources

Guided Project

Open DS_234_guided_project_notes.ipynb in the GitHub repository below to follow along with the guided project:

Guided Project Video - Part One

Guided Project Video - Part Two

Module Assignment

For this final assignment, you'll apply model interpretation techniques to your portfolio project to gain insights and effectively communicate your model's behavior.

Note: There is no video for this assignment as you will be working with your own dataset and defining your own machine learning problem.

Additional Resources

Data Visualization and Communication