    ROC AUC (multi-class classification)


    In machine learning, one essential learning objective is to classify data into groups. Although classification can include unsupervised learning (e.g., clustering), in most cases our tasks involve known labels, so we're conducting supervised learning to classify our data. The model generates predicted labels, which lets us check whether each prediction is accurate. When the predicted label and the true label match, we say the prediction is correct; when they don't, we say the prediction is wrong.


    To put our discussion into context, suppose that we have clinical data for some subjects whose diagnoses on diabetes are known. Just a quick disclaimer before we proceed — these data are not real data and they don’t constitute any medical advice.


    Based on these data, we build a logistic regression model to predict whether people have diabetes or not based on their fasting glucose level. In the table below, the fasting glucose level is expressed in mg/dL. The diabetic_clinical column shows the clinical diagnosis, while the diabetic_predicted column shows the prediction from the logistic regression model. Based on the clinical and predicted results, we can readily tell whether a prediction is correct or not, as indicated by the last column.

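    To make this concrete, here is a minimal sketch of such a model in scikit-learn. The glucose values and labels below are hypothetical placeholders rather than the data from the table, so treat the snippet as an illustration of the workflow, not a reproduction of the example.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical fasting glucose levels (mg/dL) and clinical diagnoses (1 = diabetic).
    glucose = np.array([[85], [92], [105], [110], [118], [126], [131], [140], [155], [170]])
    diabetic_clinical = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])

    # Fit a logistic regression on the single glucose feature.
    model = LogisticRegression()
    model.fit(glucose, diabetic_clinical)

    diabetic_predicted = model.predict(glucose)          # 0/1 labels at the default 0.5 threshold
    probabilities = model.predict_proba(glucose)[:, 1]   # predicted probability of diabetes
    correct = diabetic_predicted == diabetic_clinical    # the "correct vs. wrong" column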

    Confusion Matrix

    Based on this binary evaluation outcome (correct vs. wrong) in relation to the true labels, we can build the 2 x 2 confusion matrix, as shown below.


    Confusion Matrix (by Author)

    Correct Predictions:


    True Positive (TP): both predicted and true labels are positive. In the example shown above, those people have diabetes and they’re predicted to be diabetic using their fasting glucose level.


    True Negative (TN): both predicted and true labels are negative. Those are non-diabetic subjects, who are predicted to be non-diabetic.


    Wrong Predictions:


    False Positive (FP): the predicted label is positive, while the true label is negative. Those are predicted to be diabetic, but they’re actually not.


    False Negative (FN): the predicted label is negative, while the true label is positive. Those are diabetic subjects who are predicted to be non-diabetic.

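    As a minimal sketch, the same matrix can be computed with scikit-learn. The label vectors below are reconstructed from the counts used in this example (4 TP, 1 FN, 2 FP, 3 TN), not from the original table.

    from sklearn.metrics import confusion_matrix

    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # clinical diagnosis (1 = diabetic)
    y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]  # predicted diagnosis

    # confusion_matrix orders the cells as [[TN, FP], [FN, TP]] for labels {0, 1}.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(tp, tn, fp, fn)  # 4 3 2 1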

    We can derive many metrics essential for classification model evaluation from the confusion matrix. Some commonly used ones are listed below (with a short computation sketch after the list), and you can find a fuller list of derived metrics on the Wikipedia page.


    Accuracy: the number of correct predictions divided by the total number of predictions: (TP + TN) / (TP + TN + FP + FN). In the example, the accuracy of our model is 0.7 (i.e., 7 / 10).


    True Positive Rate (TPR, Sensitivity or Recall): the number of true positive labels divided by the number of positive labels: TP / (TP + FN). The TPR of our model is 0.8 (i.e., 4 / 5).


    False Positive Rate (FPR, Fall-out): the number of false positive labels divided by the number of negative labels: FP / (FP + TN). The FPR of our model is 0.4 (i.e., 2 / 5).


    True Negative Rate (Specificity): the number of true negative labels divided by the number of negative labels: TN / (FP + TN). You can notice that specificity = 1 - FPR. The specificity of our model is 0.6 (i.e., 3 / 5).

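    Continuing the sketch above, these metrics follow directly from the four cell counts:

    tp, tn, fp, fn = 4, 3, 2, 1

    accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.7
    tpr = tp / (tp + fn)                         # 0.8, sensitivity / recall
    fpr = fp / (fp + tn)                         # 0.4, fall-out
    specificity = tn / (fp + tn)                 # 0.6, equal to 1 - fpr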

    Receiver Operating Characteristic (ROC)

    The Receiver Operating Characteristic curve is a graph showing you how your classification model performs at all thresholds. The following graph is a hypothetical ROC curve.


    ROC Example Graph (by Author)

    In an ROC curve graph, the x-axis is the FPR, while the y-axis is the TPR. For a perfect model, its FPR is 0 and its TPR is 1. By contrast, for the worst model, its FPR is 1 and its TPR is 0. The ROC curves for these two extreme scenarios are shown in the graph.

    For a typical model, we should see an actual curve between these two extremes. Specifically, by varying the threshold, our model will produce different TPR and FPR values, and these points can be plotted on the graph. Connecting these points, we can generate an ROC curve.


    To put the ROC curve into the context of the diabetes diagnosis example, let's suppose that we can use different thresholds for predicting the diabetes diagnosis. As you would expect, if we use an extremely low threshold, we will classify all subjects as diabetic. Although we then get a TPR of 1, the FPR becomes 1 too. With more moderate thresholds, we should be able to find many different combinations of TPR and FPR.


    Because the data shown here are small in size, varying the threshold only produces a handful of distinct points; with many more data points, you would get many more (FPR, TPR) combinations. If we connect all of these points and smooth the curve, we get the ROC curve for the particular model that we're building.

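    In practice, this threshold sweep is usually done for you. Here is a minimal sketch with scikit-learn's roc_curve, assuming y_score holds the predicted probabilities from a model such as the logistic regression above (the values here are hypothetical):

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve

    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_score = [0.95, 0.85, 0.78, 0.62, 0.45, 0.70, 0.55, 0.30, 0.20, 0.10]

    # One (FPR, TPR) point per candidate threshold.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)

    plt.plot(fpr, tpr, marker="o")              # the ROC curve
    plt.plot([0, 1], [0, 1], linestyle="--")    # the random-classifier diagonal
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.show()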

    Area Under the ROC Curve (AUC)

    How can we quantify the performance of our model? As discussed above, the ROC curve shows how our model performs, but how exactly can we evaluate that performance from the ROC curve?


    If you compare the two extreme scenarios (perfect vs. worst), you'll probably notice that the area under the curve seems to mean something. Your guess is exactly right. If you consider a typical model, in most cases its TPR will be larger than its FPR at various thresholds. In this case, you'll see the ROC curve above the diagonal line of the graph.


    More importantly, if your model is better, you should see greater differences between the TPR and the FPR, which drives the curve towards the perfect model situation. The following graph shows you some possible scenarios that you may encounter with realistic models.


    AUC for ROC curves (by Author)

    The gray area depicts the area under the ROC curve.

    The AUC of Model 2 is greater than the AUC of Model 1, and we typically say that Model 2 outperforms Model 1. In other words, we can roughly equate the performance of classification models with their respective AUCs.

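    As a sketch, the AUC itself is usually computed with roc_auc_score, which makes comparing two models on the same test labels straightforward (the score vectors below are hypothetical):

    from sklearn.metrics import roc_auc_score

    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    scores_model_1 = [0.80, 0.70, 0.60, 0.55, 0.40, 0.65, 0.50, 0.45, 0.30, 0.20]
    scores_model_2 = [0.95, 0.90, 0.85, 0.70, 0.50, 0.60, 0.40, 0.30, 0.20, 0.10]

    auc_1 = roc_auc_score(y_true, scores_model_1)  # 0.80 with these numbers
    auc_2 = roc_auc_score(y_true, scores_model_2)  # 0.96 with these numbers
    print(auc_1, auc_2)  # the model with the larger AUC is typically preferred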

    The diagonal line depicts the random classifier for a binary classification task. For such random guessing, the TPR equals the FPR at every threshold, which is why its AUC is 0.5.

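    A quick way to convince yourself of this is to score random labels with random numbers; the resulting AUC hovers around 0.5:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=10_000)   # random binary labels
    y_score = rng.random(size=10_000)          # random "predictions"

    print(roc_auc_score(y_true, y_score))      # close to 0.5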

    Before You Go

    Although the ROC and AUC are often used to evaluate classification performance, they're not always the preferred choice. The major reason is that they don't consider the actual question under examination. For instance, different models can have similar AUCs, but their ROC curves can have different shapes, which means the models achieve different combinations of TPR and FPR.


    Thus, we should consider other factors to pick the desired model. Questions to ask yourself include: do you care more about correctly identifying positive cases, or about correctly identifying negative cases?


    For your reference, here’s a brief discussion on the ROC and AUC, which I find very useful for beginners.


    Translated from: https://towardsdatascience.com/an-introduction-to-the-roc-auc-in-classification-tasks-94c2a147dd04
