Kaggle classification competition

Technology 2025-03-20

Using deep learning to identify melanomas from skin images and patient meta-data

Kaggle, SIIM, and ISIC hosted the SIIM-ISIC Melanoma Classification competition on May 27, 2020. The goal was to use image data from skin lesions, together with the patients' meta-data, to predict whether a skin image showed a melanoma. Here is a short introduction to the task from the hosts:

Skin cancer is the most prevalent type of cancer. Melanoma, specifically, is responsible for 75% of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates over 100,000 new melanoma cases will be diagnosed in 2020. It’s also expected that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection — potentially aided by data science — can make treatment more effective.

Currently, dermatologists evaluate every one of a patient’s moles to identify outlier lesions or “ugly ducklings” that are most likely to be melanoma. Existing AI approaches have not adequately considered this clinical frame of reference. Dermatologists could enhance their diagnostic accuracy if detection algorithms take into account “contextual” images within the same patient to determine which images represent a melanoma. If successful, classifiers would be more accurate and could better support dermatological clinic work.

I took part in the competition and, after about 2 months and about 200 experiments, earned a bronze medal, finishing 241st among 3314 teams (top 8%). During the competition I also published two kernels: one about visualizing data augmentations and another about using SHAP to explain model predictions.

About the data

Between images, TFRecords, and CSV files, the complete data was about 108GB (33126 samples in the training set and 10982 in the test set). Most of the images had high resolution, so handling all of this was a challenge in itself. On the image side, we had 584 images that were melanomas and 32542 that were not. Here is an example:

Left are images without melanoma and right are images with melanomas.
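To put the class counts quoted above in perspective, here is a small arithmetic sketch. It uses only the two counts from the text; the "always benign" baseline is a hypothetical classifier added purely for illustration.

```python
# Class counts quoted in the text above.
n_malignant = 584
n_benign = 32542
n_total = n_malignant + n_benign

positive_rate = n_malignant / n_total
benign_per_malignant = n_benign / n_malignant

# Accuracy of a hypothetical classifier that always answers "benign".
always_benign_accuracy = n_benign / n_total

print(f"training samples: {n_total}")
print(f"melanoma rate: {positive_rate:.2%}")
print(f"benign images per melanoma: {benign_per_malignant:.1f}")
print(f"always-benign accuracy: {always_benign_accuracy:.2%}")
```

With under 2% positives, plain accuracy says very little about a model, which is consistent with the leaderboard scores later in this post being reported as AUC.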

As you can see, it might be pretty tricky to classify those images correctly.

We also had the patients' meta-data, which were basically some characteristics related to the patient:

sex - the sex of the patient (when unknown, will be blank).

age_approx - approximate patient age at the time of imaging.

anatom_site_general_challenge - location of the imaged site.

diagnosis - detailed diagnosis information.

benign_malignant - indicator of malignancy of the imaged lesion.

So, this all seemed very interesting, which is basically why I joined the competition. It was also an opportunity to do some more experimentation with TensorFlow, TPUs, and computer vision.

How I approached the challenge

My approach can be summarized by these topics:

Pre-process

Modeling

Ensembling

Pre-process

The pre-processing step was very straightforward. The image data already had a very good resolution (1024x1024), so in order to use TPUs with a good number of images per batch (64 to 512) and big models like EfficientNets (B0 to B7), all I had to do was create auxiliary datasets with the same images at different resolutions (ranging from 128x128 to 768x768). Fortunately, those datasets were kindly provided by one of the participants.

For the tabular data, no pre-processing was done, since the data was already very simple. I did some experiments using features extracted from the images, but they did not work very well.

Modeling

Let’s move to the most interesting part. I will describe the aspects of my best single model and then talk about the decisions behind some of them.

The model architecture was an EfficientNetB5 using only image data, with images at 512x512 resolution. I used a cosine annealing learning rate schedule with hard restarts and warmup, together with early stopping. I trained for 100 epochs with a total of 9 cycles, each cycle going from 1e-3 down to 1e-6, and a batch size of 128. With this model, I achieved 0.9470 AUC on the public leaderboard and 0.9396 AUC on the private leaderboard.

For data augmentation I used basic functions. My complete stack was a mix of shear, rotation, crop, flips, saturation, contrast, brightness, and cutout; you can check the code here. For inference, I used a lighter version of the same stack, removing shear and cutout. Here are a few samples of augmented images:

Augmented training images.
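As a rough illustration of two of the augmentations named above (flips and cutout), here is a minimal pure-Python sketch on a toy image stored as a list of rows. It is not the code used in the competition: the helper names, the 6x6 toy image, the 2x2 cutout size, and the flip probability are all assumptions made for the example.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def cutout(image, size, rng, fill=0):
    """Blank out one random size x size square, kept fully inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randrange(0, height - size + 1)
    left = rng.randrange(0, width - size + 1)
    out = [list(row) for row in image]  # copy so the input is untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out


def augment(image, rng, flip_prob=0.5, cutout_size=2):
    """Randomly flip, then always apply one cutout square."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return cutout(image, cutout_size, rng)


rng = random.Random(0)
# Toy 6x6 "image" whose pixel values are all non-zero.
image = [[r * 6 + c + 1 for c in range(6)] for r in range(6)]
augmented = augment(image, rng)
for row in augmented:
    print(row)
```

A lighter inference-time stack, as described above, would simply skip the `cutout` step.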

This is what the model looked like (in TensorFlow):

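```python
# Imports assumed from the aliases used below (efn, L, Model).
import efficientnet.tfkeras as efn
from tensorflow.keras import layers as L
from tensorflow.keras.models import Model

def model_fn(input_shape=(256, 256, 3)):
    input_image = L.Input(shape=input_shape, name='input_image')
    # EfficientNetB5 backbone without its classification head.
    base_model = efn.EfficientNetB5(input_shape=input_shape,
                                    weights='noisy-student',
                                    include_top=False)
    x = base_model(input_image)
    x = L.GlobalAveragePooling2D()(x)
    # Single sigmoid unit: probability that the image shows a melanoma.
    output = L.Dense(1, activation='sigmoid', name='output')(x)
    model = Model(inputs=input_image, outputs=output)
    return model
```

Learning rate schedule (Y-axis is the LR and X-axis is the number of epochs)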
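The learning rate schedule mentioned above (cosine annealing with hard restarts and warmup) could be sketched roughly as follows. The 100 epochs, 9 cycles, and the 1e-3 to 1e-6 range come from the text; the linear warmup, its length of 10 epochs, and the per-epoch rather than per-step granularity are assumptions, and `lr_at` is a hypothetical helper name.

```python
import math


def lr_at(epoch, total_epochs, n_cycles=9, lr_max=1e-3, lr_min=1e-6,
          warmup_epochs=0):
    """Learning rate for a given epoch (sketch, not the original code)."""
    if epoch < warmup_epochs:
        # Linear warmup up to lr_max (assumed shape).
        return lr_max * (epoch + 1) / warmup_epochs
    cycle_len = (total_epochs - warmup_epochs) / n_cycles
    # Position inside the current cycle, in [0, 1); it jumps back to 0
    # at every hard restart.
    position = ((epoch - warmup_epochs) % cycle_len) / cycle_len
    cosine = 0.5 * (1.0 + math.cos(math.pi * position))
    return lr_min + (lr_max - lr_min) * cosine


schedule = [lr_at(e, 100, warmup_epochs=10) for e in range(100)]
for epoch in (0, 9, 10, 15, 19, 20):
    print(epoch, f"{schedule[epoch]:.2e}")
```

In this discrete version the rate decays towards 1e-6 within each cycle and jumps straight back to 1e-3 at the next restart, which is what distinguishes hard restarts from a single long cosine decay.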

OK, now let’s break down each component.

Why EfficientNet?

As you can see from my model backlog, I experimented with a lot of different models, but after a while I kept only the EfficientNet experiments. To be honest, I was a little surprised by how much better EfficientNets performed here; usually some other architectures, like InceptionResNetV2, SEResNeXt, or some variations of ResNets or DenseNets, would have similar results. Before the competition, I had very high hopes for the recent BiT models from Google, but after many experiments with poor results I gave up on them.

For this specific experiment I got the best results with the B5 version of EfficientNet, but almost all versions from B3 to B6 gave very similar results. The bigger B7 is more difficult to train: it may require images with higher resolution, and with so many parameters it overfits more easily. The smaller versions (B0 to B2) usually perform better with smaller resolutions, which seemed to yield slightly worse results for this task.

Between the classic ImageNet weights and the improved NoisyStudent weights, the latter had better results.

As you can see, my best model was a very basic one, with just an average pooling layer on top of the CNN backbone. Finally, I used binary cross-entropy with label smoothing of 0.05 as the optimization loss.
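To make the loss mentioned above concrete, here is a small pure-Python sketch of binary cross-entropy with label smoothing of 0.05. It assumes the smoothing convention Keras uses for binary labels, where targets are pulled towards 0.5, so 1 becomes 0.975 and 0 becomes 0.025; the function names are hypothetical.

```python
import math


def smooth_labels(y_true, smoothing=0.05):
    """Pull hard 0/1 labels towards 0.5 by the smoothing amount."""
    return [y * (1.0 - smoothing) + 0.5 * smoothing for y in y_true]


def binary_cross_entropy(y_true, y_pred, smoothing=0.05, eps=1e-7):
    """Mean binary cross-entropy against the smoothed targets."""
    targets = smooth_labels(y_true, smoothing)
    total = 0.0
    for t, p in zip(targets, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


print(smooth_labels([0, 1]))
print(binary_cross_entropy([1], [0.975]))
print(binary_cross_entropy([1], [0.9999]))
```

With smoothing, an extremely confident prediction for a positive sample is penalised more than one that sits at the smoothed target, which discourages overconfident outputs.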

Single fold training metrics from this model.

You may think that 100 epochs is a lot, and indeed it would be, but I was sampling each batch from two different datasets: a regular one and another with only malignant images. This made the model converge much faster, so I made each epoch use only a fraction of the total data (about 10%). Roughly, every 10 epochs here would be equivalent to 1 regular epoch.
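The batch sampling idea described above can be sketched in plain Python as follows. Only the two sources (a regular dataset and a malignant-only one) and the roughly 10% epoch length come from the text; the mixing ratio of 25%, the toy dataset, the batch size of 32, and the name `mixed_batches` are assumptions for illustration.

```python
import random


def mixed_batches(regular, malignant_only, batch_size, malignant_fraction,
                  steps, seed=0):
    """Yield batches that mix regular samples with extra malignant ones."""
    rng = random.Random(seed)
    n_extra = int(batch_size * malignant_fraction)
    n_regular = batch_size - n_extra
    for _ in range(steps):
        batch = rng.sample(regular, n_regular)
        # Oversample the rare class (with replacement).
        batch += rng.choices(malignant_only, k=n_extra)
        rng.shuffle(batch)
        yield batch


# Toy data: (sample_id, label) pairs with 2% malignant samples.
regular = [(i, 1 if i % 50 == 0 else 0) for i in range(1000)]
malignant_only = [sample for sample in regular if sample[1] == 1]

batch_size = 32
# One "epoch" covers only about 10% of the data, as described above.
steps_per_epoch = int(0.10 * len(regular) / batch_size)

batches = list(mixed_batches(regular, malignant_only, batch_size,
                             malignant_fraction=0.25, steps=steps_per_epoch))
for batch in batches:
    print(sum(label for _, label in batch), "malignant of", len(batch))
```

In TensorFlow the same idea is usually expressed by sampling from two `tf.data` datasets with fixed weights; the pure-Python version here only shows the logic.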

Evaluating and comparing models

An important part of being effective at Kaggle competitions, or any other machine learning project, is being able to iterate quickly over experiments and compare them to find the best one. This will save you a lot of time and help you focus on the most fruitful ideas. From the early stages of the competition I developed a way to evaluate and compare my experiments. This is what it looked like for a random experiment:

Fig 1: metrics across folds.

Fig 2: metrics of different data slices across folds.

As you can see, with information like this it becomes very simple to compare models between folds and experiments. With the “Fig 2” image I can also evaluate the model’s performance on different aspects of the data. This is very important for identifying possible biases in the model and addressing them early on, for keeping possible improvements in mind, and for seeing which model is better on each portion of the data (this may help with ensembling later).
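A minimal sketch of this kind of report, computing AUC per fold and per data slice, might look like the following. The row layout (`fold`, `sex`, `target`, `pred`), the toy values, and the helper names are assumptions; the AUC here is the simple pairwise (Mann-Whitney) form, which is fine for small examples but slow on large data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when a class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(rows, key):
    """Group out-of-fold predictions by a column and score each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return {name: roc_auc([r["target"] for r in rs], [r["pred"] for r in rs])
            for name, rs in sorted(groups.items())}


# Toy out-of-fold predictions (made-up values).
rows = [
    {"fold": 0, "sex": "male", "target": 0, "pred": 0.10},
    {"fold": 0, "sex": "female", "target": 1, "pred": 0.80},
    {"fold": 0, "sex": "male", "target": 1, "pred": 0.35},
    {"fold": 0, "sex": "female", "target": 0, "pred": 0.40},
    {"fold": 1, "sex": "male", "target": 0, "pred": 0.20},
    {"fold": 1, "sex": "male", "target": 1, "pred": 0.90},
    {"fold": 1, "sex": "female", "target": 0, "pred": 0.30},
    {"fold": 1, "sex": "female", "target": 1, "pred": 0.25},
]

by_fold = auc_by_group(rows, "fold")
by_sex = auc_by_group(rows, "sex")
print("per fold:", by_fold)
print("per sex:", by_sex)
```

In this toy data both folds look identical, while the per-slice view shows the model doing much worse on one group, which is the kind of bias the slice report is meant to surface.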

Ensembling

For ensembling, I developed a script to brute-force many ensembling techniques, among them regular, weighted, power, ranked, and exponential log averages. In the end, the combination that the script pointed to as having the best CV was also the best submission I chose.

I used 1x EfficientNetB4 (384x384), 3x EfficientNetB4 (512x512), 1x EfficientNetB5 (512x512), and 2x XGBM models trained using only meta-data.
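The averaging techniques listed above could be sketched as below. This is not the original script: the function names and toy predictions are hypothetical, and reading "exponential log average" as the geometric mean (the exponential of the mean log prediction) is only one plausible interpretation.

```python
import math


def mean_average(preds):
    """Plain mean across models; preds is a list of per-model lists."""
    return [sum(col) / len(col) for col in zip(*preds)]


def weighted_average(preds, weights):
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, col)) / total
            for col in zip(*preds)]


def power_average(preds, power=2.0):
    """Mean of predictions raised to a power."""
    return [sum(p ** power for p in col) / len(col) for col in zip(*preds)]


def to_ranks(scores):
    """Map scores to ranks in [0, 1]; ties are broken by position.

    Assumes at least two samples.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (len(scores) - 1)
    return ranks


def rank_average(preds):
    return mean_average([to_ranks(p) for p in preds])


def log_average(preds, eps=1e-7):
    """Geometric mean: exp of the mean log prediction (assumed reading)."""
    return [math.exp(sum(math.log(max(p, eps)) for p in col) / len(col))
            for col in zip(*preds)]


model_a = [0.1, 0.4, 0.8]
model_b = [0.2, 0.3, 0.9]
preds = [model_a, model_b]

print("mean:    ", mean_average(preds))
print("weighted:", weighted_average(preds, weights=[3, 1]))
print("power:   ", power_average(preds))
print("ranked:  ", rank_average(preds))
print("log:     ", log_average(preds))
```

A brute-force search in this spirit would loop over subsets of models and these functions, score each blend on out-of-fold predictions, and keep the one with the best CV.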

Summary of what worked

EfficientNet architectures (B3 to B6) with just an average pooling layer.

Medium image resolutions (256x256 to 768x768).

Learning rate schedules with a warmup (regular cosine annealing and also cyclical with warm restarts).

Ensembling image models (CNNs) with meta-data only models (XGBM).

Augmentation helped a lot here, although it was a little tricky to find the best combination.

Cutout helped fight overfitting. I was close to getting MixUp to work, but there was not enough time.

Batch sampling played a very important role with the heavily unbalanced data.

Using TPUs was crucial, and having previous experience with the TensorFlow API and modules helped me a lot.

TTA (test time augmentation) gave a good score boost.
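The last item in the list above, test time augmentation, boils down to averaging a model's predictions over several randomly augmented copies of the same image. Here is a minimal sketch; `toy_model` stands in for a real model's predict call, and the helper names, the flip-only augmentation, and the 8 rounds are assumptions for illustration.

```python
import random


def predict_with_tta(predict_fn, image, augment_fn, n_rounds=8, seed=0):
    """Average predictions over n_rounds augmented copies of one image."""
    rng = random.Random(seed)
    preds = [predict_fn(augment_fn(image, rng)) for _ in range(n_rounds)]
    return sum(preds) / len(preds)


def random_flip(image, rng):
    """Flip left to right with probability 0.5."""
    return [row[::-1] for row in image] if rng.random() < 0.5 else image


def toy_model(image):
    # Stand-in for a real model: the mean of the left-most column, so the
    # output deliberately changes when the image is flipped.
    left = [row[0] for row in image]
    return sum(left) / len(left)


image = [[0.0, 1.0],
         [0.0, 1.0]]

print("single prediction:", toy_model(image))
print("TTA prediction:   ", predict_with_tta(toy_model, image, random_flip))
```

The averaged score is less sensitive to the particular orientation or crop the model happens to see, which is one common explanation for the boost.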

What could have improved?

Comparing my models' performance to the top teams', I could see that I had strong models; maybe going for diversity in my ensembles, instead of only CV score, could have given a boost to the final scores.

Maybe training a few more epochs with pseudo-labels could have improved things a little.

Conclusion

You can view all my experiments on the GitHub repository I created for this competition, along with compilations of the research materials I collected during it. I also wrote a small overview at Kaggle.

There is so much more to be said about the competition, and you might have a few questions as well. In any case, feel free to reach out on my LinkedIn.

Translated from: https://medium.com/analytics-vidhya/melanoma-classification-getting-a-medal-on-a-kaggle-competition-4e4ebf1a16b9
