
Introduction
Hello everyone, I am Yushiro Yamashita, a data scientist at HACARUS. Today I would like to discuss methods for visualizing AI’s decision-making criteria using several different models.
When it comes to machine learning, it is easy to develop a predictive model as long as you have sufficient data. However, more complex models, such as ensembles or neural networks, raise the issue of model interpretability: even though these models improve performance, it is difficult to understand the logic behind their predictions.
Several solutions have been developed to address this problem, and today I would like to introduce two of them: the Partial Dependence Plot (PDP) and the Individual Conditional Expectation plot (ICE plot). Both have become easier to use since the recent 0.24 release of Scikit-learn. Below, I explain these methods using real-world data.
PDPs & ICE Plots
To put it simply, PDP and ICE plots are diagrams that show how the prediction of machine learning models changes when a certain variable (explanatory variable) changes. This visual representation is useful for interpreting how black-box models behave. For a more detailed explanation, please refer to chapter 5 of “Interpretable Machine Learning,” a translation project using AI.
Going into more detail, the basic idea of an ICE plot is to study how a trained model behaves by feeding it fictitious data in which, for each instance, only the value of one specific feature is varied while all other features are kept fixed. Each instance gives one curve, and averaging these curves over all instances gives the PDP. However, we need to be careful when interpreting these plots: if the features are strongly correlated, the fictitious data produced when creating the PDP and ICE plots will no longer be realistic.
To make this concrete, let’s take a look at an example where PDP and ICE plots are created from breast cancer data. In the plot below, the horizontal axis is the feature of interest, in this case the radius of the tumor cells, and the vertical axis is the predicted probability that the tumor is benign.
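The procedure described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not how Scikit-learn implements it: `ice_curves` and `toy_model` are hypothetical names introduced here, and the toy model stands in for any trained model's prediction function.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Return an array of shape (n_samples, n_grid): one ICE curve per instance."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()              # fictitious data: copy every instance...
        X_mod[:, feature] = value     # ...and overwrite only the feature of interest
        curves[:, k] = predict(X_mod)
    return curves

# Hypothetical stand-in for a trained model: the effect of feature 0
# depends on the sign and size of feature 1 (an interaction).
def toy_model(X):
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
grid = np.linspace(-1, 1, 5)

ice = ice_curves(toy_model, X, feature=0, grid=grid)  # one row per instance
pdp = ice.mean(axis=0)                                # the PDP is the average ICE curve

print("ICE shape:", ice.shape)
print("PDP:", np.round(pdp, 3))
```

In this toy setting the individual curves slope upward or downward depending on feature 1, while their average is much flatter, which is the kind of detail a PDP alone can hide.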
Looking closely at the figure, each of the blue lines represents one instance for the ICE plot. The thick black line in the middle is the average of these instances which forms the PDP. In this specific case, we can notice that the predicted values decrease as the feature value increases.
The short black lines along the horizontal axis mark the deciles of the feature, which show the distribution of the actual input data for that feature. In regions where there is not enough data, the plot should be considered unreliable.
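As a rough sketch, the decile positions for a feature can be computed directly with NumPy. The choice of the `mean radius` column (index 0 of the bundled breast cancer dataset) is an assumption made for illustration, since the exact feature behind the figure is not stated.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
mean_radius = data.data[:, 0]  # column 0 is 'mean radius' (assumed feature of interest)

# 10th, 20th, ..., 90th percentiles: nine values splitting the data into ten equal parts
deciles = np.percentile(mean_radius, np.arange(10, 100, 10))
print(np.round(deciles, 2))
```

Widely spaced deciles indicate a region with few observations, where the curves are supported by little real data.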
Creating PDPs & ICE Plots Using Scikit-learn
Scikit-learn provides a module for PDPs, `sklearn.inspection`. Since the 0.24 release last December, it is also possible to draw ICE plots using this module. In the remainder of this article, I will demonstrate how to use this module to draw a PDP and an ICE plot using real data.
The first thing we need to do is import the required libraries. Scikit-learn should also be updated to version 0.24.1 or higher (if using Google Colab, run `!pip install --upgrade scikit-learn`).
# Import the modules
import matplotlib.pyplot as plt # for plotting
import pandas as pd # for data manipulation
from sklearn.ensemble import RandomForestClassifier # machine learning model (classifier)
from sklearn.inspection import plot_partial_dependence # PDP module
from sklearn.datasets import load_breast_cancer # dataset
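Before going further, it can help to confirm that the installed version supports ICE plots. The following is a small sketch; `version_tuple` is a hypothetical helper written for this check, not part of Scikit-learn.

```python
import re
import sklearn

def version_tuple(version):
    """Extract (major, minor) from a version string such as '0.24.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

# ICE plots (the `kind` parameter) are available from the 0.24 series onward
has_ice_support = version_tuple(sklearn.__version__) >= (0, 24)
print(sklearn.__version__, has_ice_support)
```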
Next, we need to choose the machine learning model and the dataset. For this example, we use a random forest classifier (`RandomForestClassifier`) and the breast cancer dataset. We should also briefly check the contents of the dataset.
dataset = load_breast_cancer() # load the dataset
X = pd.DataFrame(dataset.data, columns=dataset.feature_names) # store the features in a pandas.DataFrame named X
y = dataset.target # store the target variable in y
print(X.shape) # number of samples x number of features
print(X.columns) # feature names
Running the code above produces the following output:
(569, 30)
Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
'mean smoothness', 'mean compactness', 'mean concavity',
'mean concave points', 'mean symmetry', 'mean fractal dimension',
'radius error', 'texture error', 'perimeter error', 'area error',
'smoothness error', 'compactness error', 'concavity error',
'concave points error', 'symmetry error', 'fractal dimension error',
'worst radius', 'worst texture', 'worst perimeter', 'worst area',
'worst smoothness', 'worst compactness', 'worst concavity',
'worst concave points', 'worst symmetry', 'worst fractal dimension'],
dtype='object')
We can see that the dataset contains the mean, standard error, and worst (largest) value for each of the following ten properties of the cells taken from the breast mass:
- Radius
- Texture
- Perimeter
- Area
- Smoothness
- Compactness
- Concavity
- Number of concave points in the contour
- Symmetry
- Fractal dimension
The target variable indicates whether the tumor is benign or malignant, where 1 stands for benign and 0 for malignant.
However, there is one issue with this dataset: many of its features are strongly correlated. As mentioned earlier, this can make the fictitious data behind the plots unrealistic. For example, an instance might end up with a mean of 1 while its worst (maximum) value is 0.5, which cannot happen in real data. For this reason, we need to reduce multicollinearity beforehand using a method such as Lasso.
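One simple way to check this claim is to look at pairwise correlations. The sketch below is an illustrative check only: the 0.9 threshold is an arbitrary assumption, and it does not perform the Lasso-based selection mentioned above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
X = pd.DataFrame(dataset.data, columns=dataset.feature_names)

# Absolute Pearson correlation between every pair of features
corr = X.corr().abs().to_numpy()

# Count each pair once: keep the upper triangle without the diagonal
upper = np.triu(corr, k=1)
n_strong = int((upper > 0.9).sum())  # 0.9 is an arbitrary illustrative threshold

radius_corr = X["mean radius"].corr(X["worst radius"])
print("pairs with |r| > 0.9:", n_strong)
print("mean radius vs worst radius:", round(radius_corr, 3))
```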
Moving on with our example, we will use the features at column indices 7, 20, 21, 24, 27, and 28, which are:
- mean number of concave points in the contour
- worst radius
- worst texture
- worst smoothness
- worst number of concave points in the contour
- worst symmetry
This is done with the following code:
imp_features = [X.columns[7], X.columns[20], X.columns[21], X.columns[24], X.columns[27], X.columns[28]] # select the important features
imp_X = X[imp_features] # create a DataFrame with only the important features
print(imp_features) # display the important features
Running this code produces the following output:
['mean concave points', 'worst radius', 'worst texture', 'worst smoothness', 'worst concave points', 'worst symmetry']
The next step is to train the model. Since our goal is to interpret the contents of the model, we will use all of the data for training.
model = RandomForestClassifier() # set up the machine learning model
model.fit(imp_X, y) # train
Once the model has been trained, we can create the graph with the `plot_partial_dependence` function.
plt.rcParams['font.size'] = 18 # font size for matplotlib.pyplot
fig, ax = plt.subplots(figsize=(12, 6)) # prepare the figure
fig.subplots_adjust(hspace=0.6, wspace=0.1) # adjust the spacing
ax.set_title('Breast Cancer PDP of Random Forest Classifier') # set the title
disp_RFC = plot_partial_dependence(model, imp_X, imp_features, ax=ax) # plot
for i in range(disp_RFC.lines_.shape[0]):
    disp_RFC.axes_[i, 0].set_ylabel('Benign Probability') # set the y-axis label on the first column
The code above includes a few layout adjustments so that the figure does not become cramped. A minimal plot, however, only needs `plot_partial_dependence(trained model, input data, list of features to select)`.
plot_partial_dependence(model, imp_X, imp_features)
By default, only the PDP is drawn, but we can draw only an ICE plot by setting the parameter `kind='individual'`. It is also possible to draw both plots by setting `kind='both'`.
Now, let us take a look at the actual ICE plot. In order to prevent the plot from being too crowded, we can limit the number of lines drawn by setting `subsample=0.2` so that only 20% of the lines are drawn.
fig, ax = plt.subplots(figsize=(12, 6)) # prepare the figure
fig.subplots_adjust(hspace=0.6, wspace=0.1) # adjust the spacing
ax.set_title('Breast Cancer ICE plots of Random Forest Classifier') # set the title
disp_RFC = plot_partial_dependence(model, imp_X, imp_features, ax=ax, kind='individual', subsample=0.2) # plot
for i in range(disp_RFC.lines_.shape[0]):
    disp_RFC.axes_[i, 0].set_ylabel('Benign Probability') # set the y-axis label on the first column
When plotting the PDP and ICE plots at the same time, the default line colors and legends are difficult to read, so it may be helpful to change them manually. The Scikit-learn developers are also working on the colors in their repository, so this should become more user-friendly in the future.
fig, ax = plt.subplots(figsize=(12, 8))
fig.subplots_adjust(hspace=0.7, wspace=0.1)
ax.set_title('Breast cancer PDP & ICE plots of Random Forest Classifier')
disp = plot_partial_dependence(model, imp_X, imp_X.columns, ax=ax, kind='both', subsample=0.2)
for i in range(disp.lines_.shape[0]):
    for j in range(disp.lines_.shape[1]):
        if disp.lines_[i, j, -1] is not None: # the last line of each subplot is the PDP
            disp.lines_[i, j, -1].set_color('0.2') # draw the PDP in dark gray
            disp.axes_[i, j].legend(loc='upper center', bbox_to_anchor=(0.5, -0.4), borderaxespad=0, ncol=2, fontsize=10, labels=['PDP', 'ICE plot'], handles=[disp.lines_[i, j, -1], disp.lines_[i, j, 0]])
        if j == 0:
            disp.axes_[i, j].set_ylabel('Benign Probability') # y-axis label on the first column only
The Importance of ICE Plots
The PDPs show that, for many features, the predicted probability of a benign tumor (class 1) decreases as the feature value increases. However, there are some things that you cannot see without looking at the ICE plot. For example, if you compare the two plots for mean concave points, the PDP shows a sharp drop in the benign probability around 0.05, while the ICE plot contains instances whose predicted probability rises at this value. In other words, a mean concave points value above 0.05 does not always indicate a low benign probability.
Note: Depending on the `random_state` of the random forest model and of `plot_partial_dependence`, your plots may differ from the ones shown in the article.
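The same comparison can be made numerically with `sklearn.inspection.partial_dependence`, which returns the values behind the plots. This is a sketch under a few assumptions: the seed and tree count are arbitrary, and the "rising" share is a crude summary (last grid point versus first) introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

data = load_breast_cancer()
X, y = data.data[:, [7, 20, 21, 24, 27, 28]], data.target
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Column 0 of the reduced matrix is 'mean concave points'
result = partial_dependence(model, X, features=[0], kind="both", grid_resolution=20)
pdp = result["average"][0]      # shape (n_grid,): the PDP
ice = result["individual"][0]   # shape (n_samples, n_grid): one ICE curve per instance

# How much the individual curves spread around the average at each grid point
spread = ice.std(axis=0)
# Share of instances whose prediction is higher at the end of the grid than at the start
rising = float(np.mean(ice[:, -1] > ice[:, 0]))

print("max spread around the PDP:", round(float(spread.max()), 3))
print("share of rising ICE curves:", round(rising, 3))
```

A large spread at some grid point is a hint that the average curve hides differing behavior across instances there.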
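To make results repeatable, the seed can be fixed. The sketch below only checks the model side; the seed values and the `fit_and_predict` helper are hypothetical choices for illustration. The plotting function takes its own `random_state`, which controls which ICE curves are subsampled.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data[:, [7, 20, 21, 24, 27, 28]], data.target

def fit_and_predict(seed):
    """Train a forest with the given seed and return P(benign) for every instance."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
    return model.predict_proba(X)[:, 1]  # column 1 is class 1 (benign)

same_seed = np.allclose(fit_and_predict(0), fit_and_predict(0))
different_seed = np.allclose(fit_and_predict(0), fit_and_predict(1))
print("identical with the same seed:", same_seed)
print("identical with different seeds:", different_seed)
```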
Let’s continue by looking at one more example using ICE plots and PDPs. The figure below shows the plots for a neural network regression model trained on the Boston housing price dataset.
In this plot, the horizontal axis is the distance to employment centers, and the vertical axis is the predicted housing price. Looking at the PDP alone, the price tends to decrease as the distance increases. However, the ICE plot shows many instances where the price either stabilizes or rises as the distance increases. This may suggest that there is cheaper housing available for workers.
To summarize, we can avoid misunderstanding the model’s decision criteria by drawing both PDP and ICE plots.
Conclusion
Today, I briefly introduced PDP and ICE plots as methods to easily visualize how machine learning models arrive at their conclusions. I also showed several examples of how to draw them with Scikit-learn.
In practice, this method is model-independent and can be applied to any model, which makes it very versatile. Examples of models it can be applied to include SVMs, gradient boosting, and neural networks.
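As a sketch of this model independence, the same `partial_dependence` call can be applied to three different model families. The hyperparameters, the scaling pipelines, and the choice of `worst radius` are assumptions made for illustration. With `method="brute"`, each curve is on the scale of that model's own response (predicted probability where available, otherwise the decision function), so the curves should be compared by shape rather than by value.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X, y = data.data[:, [7, 20, 21, 24, 27, 28]], data.target

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "neural network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

curves = {}
for name, model in models.items():
    model.fit(X, y)
    # Column 1 of the reduced matrix is 'worst radius'
    result = partial_dependence(
        model, X, features=[1], kind="average", method="brute", grid_resolution=10
    )
    curves[name] = result["average"][0]
    print(name, np.round(curves[name], 2))
```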
In this article, we have used these plots for binary classification problems, but they can also be used in the same way for regression problems.
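For regression, a minimal sketch looks almost identical. The Boston dataset is no longer bundled with recent Scikit-learn releases, so this example swaps in the bundled diabetes dataset and a gradient boosting regressor; both substitutions are assumptions for illustration and differ from the neural network example shown earlier.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

bmi = 2  # column index of the 'bmi' feature in the diabetes dataset
result = partial_dependence(model, X, features=[bmi], kind="both", grid_resolution=20)
pdp = result["average"][0]      # average predicted target along the grid
ice = result["individual"][0]   # one curve of predicted targets per instance

print(pdp.shape, ice.shape)
```

Here the vertical axis is the predicted target value itself rather than a class probability.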
Thank you for reading my article today. I hope that you have learned something useful. If you are interested in investigating how machine learning models behave, please try using PDP and ICE plots.