Confusion Matrices for Multi-class Classifiers
A confusion matrix is a visual way to inspect the performance of a classification model. Metrics such as accuracy can be misleading when the data has a large class imbalance, a problem common in machine learning applications such as fraud detection. A confusion matrix gives a more representative view of a classifier’s performance, including which specific instances it has trouble classifying.
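To make the accuracy problem concrete, here is a small plain-Python sketch (not Comet's API). The toy labels are invented for illustration: a "classifier" that always predicts the majority class scores 98% accuracy, yet the confusion matrix shows it never catches a single positive.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced data: 98 legitimate (0) and 2 fraudulent (1) transactions,
# scored by a "classifier" that always predicts 0.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100

print(accuracy(y_true, y_pred))
print(confusion_matrix(y_true, y_pred, [0, 1]))
```

The bottom-left cell (actual 1, predicted 0) holds both fraudulent examples, which is exactly the information a single accuracy number hides.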
In this post we are going to illustrate two ways in which Comet’s confusion matrix can help debug classification models.
For our first example, we will run an experiment similar to the one illustrated in this post on imbalanced data. We’re going to train a classifier to detect fraudulent transactions in an imbalanced dataset and use Comet’s confusion matrix to evaluate the model’s performance. In our second example, we will cover classification of unstructured data with a large number of labels, using the CIFAR100 dataset and a simple CNN model.
In this example, we’re going to use the Credit Card Fraud Detection dataset from Kaggle to evaluate our classifier. This dataset is highly imbalanced: only 492 of its 284,807 transactions are fraudulent.
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def load_data():
    raw_df = pd.read_csv(
        "https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv"
    )
    return raw_df


def preprocess(raw_df):
    df = raw_df.copy()
    eps = 0.01
    # Drop the timestamp and log-transform the heavily skewed transaction amount
    df.pop("Time")
    df["Log Amount"] = np.log(df.pop("Amount") + eps)

    # Hold out 20% of the data for validation
    train_df, val_df = train_test_split(df, test_size=0.2)
    train_labels = np.array(train_df.pop("Class"))
    val_labels = np.array(val_df.pop("Class"))
    train_features = np.array(train_df)
    val_features = np.array(val_df)

    # Fit the scaler on the training split only, then clip extreme values
    scaler = StandardScaler()
    train_features = scaler.fit_transform(train_features)
    val_features = scaler.transform(val_features)
    train_features = np.clip(train_features, -5, 5)
    val_features = np.clip(val_features, -5, 5)

    return train_features, val_features, train_labels, val_labels
```

Our model is a single fully connected hidden layer with Dropout enabled. We’re going to train it using the Adam optimizer for 5 epochs, with a batch size of 64, and use 20% of our dataset for validation.
```python
import tensorflow as tf
from tensorflow import keras


def build_model(input_shape, output_bias=None):
    if output_bias is not None:
        output_bias = tf.keras.initializers.Constant(output_bias)
    model = keras.Sequential(
        [
            # One hidden layer of 16 units, regularized with Dropout
            keras.layers.Dense(16, activation="relu", input_shape=(input_shape,)),
            keras.layers.Dropout(0.5),
            # Sigmoid output: probability that a transaction is fraudulent
            keras.layers.Dense(1, activation="sigmoid", bias_initializer=output_bias),
        ]
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss=keras.losses.BinaryCrossentropy(),
        metrics=["accuracy"],
    )
    return model
```

Since we’re using Keras as our modeling framework, Comet will automatically log our hyperparameters and training metrics (accuracy and loss) to the web UI. At the end of every epoch, we’re going to log a confusion matrix of the model’s predictions on the validation dataset, using a custom Keras callback and Comet’s `log_confusion_matrix` function.
```python
# comet_ml should be imported before TensorFlow/Keras so auto-logging can hook in
from comet_ml import Experiment

import numpy as np
from tensorflow import keras
from tensorflow.keras.callbacks import Callback

# Placeholders: replace with your own Comet workspace and project names
WORKSPACE = "your-workspace"
PROJECT_NAME = "your-project"


class ConfusionMatrixCallback(Callback):
    def __init__(self, experiment, inputs, targets, cutoff=0.5):
        super().__init__()
        self.experiment = experiment
        self.inputs = inputs
        self.cutoff = cutoff
        self.targets = targets
        # One-hot encode the true labels once
        self.targets_reshaped = keras.utils.to_categorical(self.targets, num_classes=2)

    def on_epoch_end(self, epoch, logs=None):
        # Turn sigmoid probabilities into hard 0/1 predictions
        predicted = self.model.predict(self.inputs)
        predicted = np.where(predicted < self.cutoff, 0, 1)
        # num_classes=2 keeps two columns even if every prediction is 0
        predicted_reshaped = keras.utils.to_categorical(predicted, num_classes=2)
        self.experiment.log_confusion_matrix(
            self.targets_reshaped,
            predicted_reshaped,
            title="Confusion Matrix, Epoch #%d" % (epoch + 1),
            file_name="confusion-matrix-%03d.json" % (epoch + 1),
        )


def main():
    experiment = Experiment(workspace=WORKSPACE, project_name=PROJECT_NAME)
    df = load_data()
    X_train, X_val, y_train, y_val = preprocess(df)
    confmat = ConfusionMatrixCallback(experiment, X_val, y_val)
    model = build_model(input_shape=X_train.shape[1])
    model.fit(
        X_train,
        y_train,
        validation_data=(X_val, y_val),
        epochs=5,
        batch_size=64,
        callbacks=[confmat],
    )


if __name__ == "__main__":
    main()
```

We see that our model achieves a very high validation accuracy after a single epoch of training. This is misleading: only 0.17% of the dataset has a positive label, so simply predicting a 0 label for every transaction would already give over 99% accuracy.
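The accuracy trap can be checked directly from the dataset counts quoted above. The sketch also shows one common way to choose a value for `build_model`'s `output_bias` argument, as done in TensorFlow's "Classification on imbalanced data" tutorial: set the initial bias to log(pos/neg) so the untrained model starts out predicting the base rate. The post does not say it passes `output_bias`, so treat that part as an assumption.

```python
import math

# Counts quoted in the post for the Kaggle credit card fraud dataset
total = 284_807
fraud = 492
legit = total - fraud

positive_rate = fraud / total
baseline_accuracy = legit / total  # accuracy of always predicting "not fraud"

# Assumed use of build_model's output_bias: b0 = log(pos / neg),
# chosen so that sigmoid(b0) equals the fraction of positive examples.
initial_bias = math.log(fraud / legit)

print(f"positive rate:     {positive_rate:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4%}")
print(f"initial bias:      {initial_bias:.3f}")
```

Under that assumption, the value would be passed as `build_model(input_shape=X_train.shape[1], output_bias=initial_bias)`.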
Let’s take a look at our Confusion Matrix to see what our real performance is like.
For our binary classification task, we see that after a single epoch of training, our model produces 5 false positive and 29 false negative predictions. If our classifier were perfect, both values would be 0. By clicking on the cell with the false negatives, we can see the indices in our validation dataset that were incorrectly classified as not fraudulent.
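Each cell of the matrix is simply the set of validation rows that share an (actual, predicted) pair. As a rough plain-Python illustration of what the false-negative cell lists, the sketch below applies the same 0.5 cutoff as the callback; the labels and scores are made up.

```python
def binarize(probabilities, cutoff=0.5):
    """Same rule as the callback: scores below the cutoff become 0, others 1."""
    return [0 if p < cutoff else 1 for p in probabilities]


def cell_indices(y_true, y_pred, actual, predicted):
    """Positions of examples whose true label is `actual` and whose
    predicted label is `predicted`, i.e. one confusion-matrix cell."""
    return [
        i
        for i, (t, p) in enumerate(zip(y_true, y_pred))
        if t == actual and p == predicted
    ]


# Hypothetical validation labels and sigmoid outputs
y_val = [0, 1, 0, 1, 1, 0]
scores = [0.10, 0.40, 0.70, 0.90, 0.20, 0.05]

y_hat = binarize(scores)
false_negatives = cell_indices(y_val, y_hat, actual=1, predicted=0)
false_positives = cell_indices(y_val, y_hat, actual=0, predicted=1)
print(false_negatives, false_positives)
```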
We can also check whether our predictions improved over time by changing the epoch number in the dropdown selector.
Lastly, to get an estimate of the per-class performance, we can simply hover over the cell corresponding to that class. In Figure 5, we see that our classifier’s accuracy on truly fraudulent transactions (its recall for the fraud class) is closer to 83%, rather than the reported validation accuracy of 99%.
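The per-class figure discussed above is the diagonal cell of a class divided by its row total, which is usually called that class's recall. A minimal sketch, with a made-up 2x2 matrix laid out as rows = actual and columns = predicted, of how it can sit far below overall accuracy:

```python
def per_class_recall(matrix):
    """Fraction of each actual class (row) that was predicted correctly.

    Rows are actual classes and columns are predicted classes, so the
    diagonal holds the correct predictions for each class.
    """
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def overall_accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Hypothetical matrix: [[TN, FP], [FN, TP]]
matrix = [[950, 10], [15, 25]]
print(per_class_recall(matrix))
print(overall_accuracy(matrix))
```

Here overall accuracy is 97.5%, while only 62.5% of the positive class is detected.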
Comet makes it easy to deal with classification problems that depend on unstructured data. We’re going to use the CIFAR100 dataset and a simple CNN to illustrate how the ConfusionMatrix callback is used for these types of data.
First, let’s fetch our dataset and preprocess it using the built-in convenience methods in Keras.
```python
from tensorflow import keras
from tensorflow.keras.datasets import cifar100

# Load CIFAR-100 data
(input_train, target_train), (input_test, target_test) = cifar100.load_data()

# Parse numbers as floats
input_train = input_train.astype("float32")
input_test = input_test.astype("float32")

# Normalize pixel values to [0, 1]
input_train = input_train / 255
input_test = input_test / 255

# One-hot encode the integer class labels (100 classes)
target_train = keras.utils.to_categorical(target_train, 100)
target_test = keras.utils.to_categorical(target_test, 100)
```

Next, we’ll define our CNN architecture for this task as well as our training parameters, such as batch size and number of epochs.
```python
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Model configuration
batch_size = 128
img_width, img_height, img_num_channels = 32, 32, 3
input_shape = (img_width, img_height, img_num_channels)
loss_function = categorical_crossentropy
no_classes = 100
no_epochs = 100
optimizer = Adam()
verbosity = 1
validation_split = 0.2
interval = 10

# Build the model
model = Sequential()
model.add(Conv2D(128, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=(3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(no_classes, activation="softmax"))
```

Finally, we’ll update our Confusion Matrix callback from the previous example so that it uses a single Confusion Matrix object for the entire training process. This updated callback only creates a confusion matrix every Nth epoch, where N is controlled by the `interval` parameter.
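The every-Nth-epoch rule described above amounts to a modulo test on the epoch index. A small sketch, assuming Keras's 0-based epoch numbering in `on_epoch_end` and the configuration values `no_epochs = 100` and `interval = 10`, of which epochs end up with a logged matrix:

```python
def should_log(epoch, interval):
    """Keras passes a 0-based epoch index, so a matrix is logged after
    epochs interval, 2 * interval, ... counted from 1."""
    return (epoch + 1) % interval == 0


no_epochs = 100
interval = 10
logged = [epoch + 1 for epoch in range(no_epochs) if should_log(epoch, interval)]
print(logged)
```

With these settings, ten matrices are logged over the whole run instead of one hundred.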
```python
from comet_ml import ConfusionMatrix
from tensorflow import keras


class ConfusionMatrixCallback(keras.callbacks.Callback):
    def __init__(self, experiment, inputs, targets, interval):
        super().__init__()
        self.experiment = experiment
        self.inputs = inputs
        self.targets = targets
        self.interval = interval
        # A single ConfusionMatrix reused for the whole run, so logged images are shared.
        # LABELS is the list of 100 CIFAR-100 class names, assumed to be defined elsewhere.
        self.confusion_matrix = ConfusionMatrix(
            index_to_example_function=self.index_to_example,
            max_examples_per_cell=5,
            labels=LABELS,
        )

    def index_to_example(self, index):
        image_array = self.inputs[index]
        image_name = "confusion-matrix-%05d.png" % index
        results = self.experiment.log_image(image_array, name=image_name)
        # Return sample, assetId (index is added automatically)
        return {"sample": image_name, "assetId": results["imageId"]}

    def on_epoch_end(self, epoch, logs=None):
        # Only compute and log a matrix every `interval` epochs
        if (epoch + 1) % self.interval != 0:
            return
        predicted = self.model.predict(self.inputs)
        self.confusion_matrix.compute_matrix(self.targets, predicted)
        self.experiment.log_confusion_matrix(
            matrix=self.confusion_matrix,
            title="Confusion Matrix, Epoch #%d" % (epoch + 1),
            file_name="confusion-matrix-%03d.json" % (epoch + 1),
        )
```

Using a single instance of the Confusion Matrix ensures that as few images as possible are logged to Comet, because it reuses them across all training epochs wherever possible. These uploaded examples are available inside our Confusion Matrix in the UI and let us easily view the specific instances that our model has difficulty classifying.
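The reuse happens inside the Comet SDK; the class below is only a hypothetical plain-Python model of the idea, not Comet's implementation. Each index is "uploaded" at most once, and later epochs that misclassify the same example get the stored result back. `ExampleCache` and `fake_upload` are illustrative names.

```python
class ExampleCache:
    """Hypothetical model of example reuse: each index is uploaded at most
    once, and the stored result is returned on later calls."""

    def __init__(self, upload):
        self.upload = upload  # e.g. a function that logs an image and returns an ID
        self.cache = {}

    def index_to_example(self, index):
        if index not in self.cache:
            self.cache[index] = self.upload(index)
        return self.cache[index]


uploads = []


def fake_upload(index):
    """Stand-in for experiment.log_image: records which indices were uploaded."""
    uploads.append(index)
    return {"sample": "confusion-matrix-%05d.png" % index, "assetId": "asset-%d" % index}


cache = ExampleCache(fake_upload)
# Misclassified examples seen on two different epochs, with some overlap
for index in [3, 7, 11]:
    cache.index_to_example(index)
for index in [7, 11, 42]:
    cache.index_to_example(index)
print(uploads)
```

Six lookups across the two epochs trigger only four uploads.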
In Figure 6, we see that the Comet confusion matrix has trimmed the total number of classes down to the ones the model is most confused about, i.e., the labels with the most misclassifications. By clicking on a cell, we can see examples of instances that have been misclassified. We can also change the values displayed in each cell of the matrix, for example to show the percentage of correct predictions by row or by column.
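Comet's exact rule for choosing which classes to keep is not described here. The sketch below is one plausible way to do it in plain Python: rank classes by the off-diagonal counts in their row plus their column. It also shows how row- and column-normalized percentages are computed. The class names are CIFAR-100 labels, but the counts are invented.

```python
def most_confused_classes(matrix, labels, k):
    """Keep the k classes involved in the most misclassifications
    (off-diagonal counts in their row plus their column)."""
    n = len(matrix)
    errors = []
    for c in range(n):
        row_errors = sum(matrix[c][j] for j in range(n) if j != c)
        col_errors = sum(matrix[i][c] for i in range(n) if i != c)
        errors.append(row_errors + col_errors)
    keep = sorted(range(n), key=lambda c: errors[c], reverse=True)[:k]
    keep.sort()  # preserve the original class order for display
    return [labels[c] for c in keep], [[matrix[i][j] for j in keep] for i in keep]


def normalize(matrix, by="row"):
    """Each cell as a percentage of its row (actual) or column (predicted) total."""
    n = len(matrix)
    if by == "row":
        totals = [sum(row) for row in matrix]
        return [[100 * matrix[i][j] / totals[i] if totals[i] else 0.0
                 for j in range(n)] for i in range(n)]
    totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[100 * matrix[i][j] / totals[j] if totals[j] else 0.0
             for j in range(n)] for i in range(n)]


labels = ["apple", "bear", "beaver", "otter"]
matrix = [
    [50, 0, 0, 0],
    [0, 40, 5, 3],
    [0, 6, 30, 8],
    [0, 2, 9, 35],
]
names, sub = most_confused_classes(matrix, labels, k=3)
print(names)
print(normalize(sub, by="row"))
```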
In this example, Comet’s Confusion Matrix uploads the image examples as assets to the experiment. Alternatively, if your images are hosted somewhere else, or you are working with assets such as audio, Comet’s confusion matrix can map the index of each classified example to the URL of the corresponding asset.
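In the callback above, that mapping is the job of the function passed as `index_to_example_function`. As we understand Comet's documentation, the function may return a URL string instead of a `sample`/`assetId` dictionary. A minimal sketch of such a mapping, where the base URL and file naming scheme are hypothetical:

```python
# Hypothetical location of externally hosted audio clips
BASE_URL = "https://example.com/clips"


def index_to_example(index):
    """Map a validation-set index to the URL of the corresponding asset."""
    return "%s/clip-%05d.wav" % (BASE_URL, index)


print(index_to_example(42))
```

A function like this would be passed as `ConfusionMatrix(index_to_example_function=index_to_example, ...)` in place of the image-uploading method.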
The Confusion Matrix API also allows the user to control the number of assets uploaded to Comet. We can either specify the maximum number of examples to upload per cell (`max_examples_per_cell`), or pass the list of classes we are interested in comparing as the `selected` argument of the Confusion Matrix constructor.
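To see how these two limits cap the number of uploads, here is a plain-Python model of their effect. It is not Comet's implementation, and `collect_examples` is a hypothetical helper. Examples are gathered per (actual, predicted) cell, and any cell outside the selected classes is skipped.

```python
from collections import defaultdict


def collect_examples(y_true, y_pred, max_examples_per_cell, selected=None):
    """Gather example indices per (actual, predicted) cell, keeping at most
    max_examples_per_cell per cell and, optionally, only selected classes."""
    cells = defaultdict(list)
    for index, (actual, predicted) in enumerate(zip(y_true, y_pred)):
        if selected is not None and (actual not in selected or predicted not in selected):
            continue
        if len(cells[(actual, predicted)]) < max_examples_per_cell:
            cells[(actual, predicted)].append(index)
    return dict(cells)


y_true = [0, 0, 1, 1, 1, 2, 2, 1, 1]
y_pred = [0, 1, 0, 0, 0, 2, 1, 0, 1]
print(collect_examples(y_true, y_pred, max_examples_per_cell=2, selected={0, 1}))
```

Four examples fall in the (1, 0) cell, but only the first two are kept, and everything involving class 2 is dropped.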
In this post, we’ve gone through an example of a classification task where our target data is highly imbalanced. We’ve shown how a metric like accuracy can fail to capture the true performance of our model, and why visual tools like the confusion matrix can give us a more granular understanding of model performance across different classes.
You can also explore the experiment on imbalanced data in more detail:
Comet Experiment for Imbalanced Data: the code used to generate these results is available here.
If you would like to run the code yourself, you can try it in a Colab notebook (Colab Notebook for Imbalanced Data: here). Keep in mind that you will need a Comet account and a Comet API key to run the notebook.
We also demonstrated how Comet’s confusion matrix can be configured to work with unstructured data, and how it can provide examples of misclassified labels for easy model debugging.
Comet Experiment for CIFAR100: All the code necessary to reproduce these results can be found here.
Colab Notebook for CIFAR100: here. We recommend enabling a GPU in the notebook before running it. To do this, go to Edit → Notebook settings and select GPU from the Hardware accelerator drop-down.
*Note:* Comet’s Confusion Matrix also supports R. Check out the example here.
Originally published at https://www.comet.ml on September 1, 2020.
Translated from: https://medium.com/swlh/debugging-classifiers-with-confusion-matrices-6210c77c5d65