MXNet (34): Semantic Segmentation Dataset Processing

Technology · 2022-07-15

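In the earlier object detection sections, we used rectangular bounding boxes to label and predict objects in images. Here we turn to semantic segmentation, which partitions an image into regions that belong to different semantic classes. These semantic regions label and predict objects at the pixel level. Compared with object detection, semantic segmentation marks regions with pixel-level boundaries, so it is considerably more precise.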

1. Image Segmentation and Instance Segmentation

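In computer vision, there are two important tasks closely related to semantic segmentation: image segmentation and instance segmentation. Both should be distinguished from semantic segmentation: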

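Image segmentation divides an image into several constituent regions. It usually exploits the correlation between pixels in the image. No pixel labels are needed during training, but at prediction time this approach cannot guarantee that the resulting regions carry the semantics we want. For example, given an image of a dog, image segmentation might split the dog into two parts: one covering the eyes and mouth, whose dominant color is black, and another covering the rest of the dog, whose dominant color is yellow.

Instance segmentation is also called simultaneous detection and segmentation. It tries to identify the pixel-level region of every object instance in an image. Unlike semantic segmentation, instance segmentation distinguishes not only semantics but also individual object instances. If an image contains two dogs, instance segmentation must determine which pixels belong to which dog.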
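To make the difference between semantic and instance labels concrete, here is a toy sketch in plain NumPy (not MXNet). The 4×6 "image" with two dogs is made up purely for illustration. A semantic label gives every dog pixel the same class index (12, the position of 'dog' in the VOC_CLASSES list defined later), while an instance label gives each dog its own ID.

```python
import numpy as np

DOG = 12  # index of 'dog' in the VOC class list

# Hypothetical 4x6 image with two dogs; 0 is background everywhere.
# Semantic label: every dog pixel gets the same class index.
semantic = np.array([
    [0, 12, 12, 0, 0, 0],
    [0, 12, 12, 0, 12, 12],
    [0, 0, 0, 0, 12, 12],
    [0, 0, 0, 0, 0, 0],
])

# Instance label: each dog gets its own ID (1 and 2).
instance = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Both labelings agree on which pixels are "dog"...
assert np.array_equal(semantic == DOG, instance > 0)

# ...but only the instance label can tell the two dogs apart.
print("semantic classes:", np.unique(semantic).tolist())               # [0, 12]
print("instance ids:", np.unique(instance[instance > 0]).tolist())     # [1, 2]
print("pixels per dog:", [int((instance == k).sum()) for k in (1, 2)])  # [4, 4]
```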

2. The Pascal VOC2012 Semantic Segmentation Dataset

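Pascal VOC2012 is one of the most important datasets for semantic segmentation. Before exploring it, we first import the packages and modules the experiments need.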

```python
from d2l import mxnet as d2l
from mxnet import gluon, image, np, npx, autograd, init
from mxnet.gluon import nn
from plotly import graph_objs as go, express as px
from plotly.subplots import make_subplots
from IPython.display import Image
import plotly.io as pio
import os

pio.kaleido.scope.default_format = "svg"
npx.set_np()

def show_imgs(imgs, num_rows=2, num_cols=4, scale=0.8, labels=None, edge_size=1):
    """Plot images in a num_rows x num_cols grid and return the figure as PNG bytes.

    If labels is given, the first box of each image is drawn; its normalized
    coordinates are multiplied by edge_size. No boxes are drawn in this section.
    """
    fig = make_subplots(num_rows, num_cols)
    for i in range(num_rows):
        for j in range(num_cols):
            z = imgs[num_cols*i+j].asnumpy()
            fig.add_trace(go.Image(z=z), i+1, j+1)
            if labels is not None:
                x0, y0, x1, y1 = labels[num_cols*i+j][0][1:5] * edge_size
                fig.add_shape(type="rect", x0=x0, y0=y0, x1=x1, y1=y1,
                              line=dict(color="white"), row=i+1, col=j+1)
            fig.update_xaxes(visible=False, row=i+1, col=j+1)
            fig.update_yaxes(visible=False, row=i+1, col=j+1)
    img_bytes = fig.to_image(format="png", scale=scale, engine="kaleido")
    return img_bytes
```

Download and extract the dataset.

```python
d2l.DATA_HUB['voc2012'] = (d2l.DATA_URL + 'VOCtrainval_11-May-2012.tar',
                           '4e443f8a2eca6b1dac8a6c57641b67dd40621a49')
voc_dir = d2l.download_extract('voc2012', 'VOCdevkit/VOC2012')
```

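After the downloaded archive is extracted, the input images and the labels are stored under separate paths (JPEGImages and SegmentationClass). The labels are also images, with the same size as their corresponding inputs. In a label, pixels of the same color belong to the same semantic class. The function below reads all input images and labels into memory.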

```python
def read_voc_images(voc_dir, is_train=True):
    """Read all VOC input images and their label images into memory."""
    # The split file lists one image name per line
    txt_path = os.path.join(voc_dir, 'ImageSets', 'Segmentation',
                            'train.txt' if is_train else 'val.txt')
    with open(txt_path, 'r') as f:
        images = f.read().split()
    features, labels = [], []
    for name in images:
        features.append(image.imread(os.path.join(voc_dir, 'JPEGImages', f'{name}.jpg')))
        labels.append(image.imread(os.path.join(voc_dir, 'SegmentationClass', f'{name}.png')))
    return features, labels

train_features, train_labels = read_voc_images(voc_dir)
```

Show the first 5 images and their labels.

```python
imgs = train_features[:5] + train_labels[:5]
Image(show_imgs(imgs, 2, 5, 1.4))
```

Next, we list the label classes and the RGB color that marks each class in the label images.

```python
VOC_COLORMAP = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                         [0, 0, 128], [128, 0, 128], [0, 128, 128],
                         [128, 128, 128], [64, 0, 0], [192, 0, 0],
                         [64, 128, 0], [192, 128, 0], [64, 0, 128],
                         [192, 0, 128], [64, 128, 128], [192, 128, 128],
                         [0, 64, 0], [128, 64, 0], [0, 192, 0],
                         [128, 192, 0], [0, 64, 128]]).astype(np.int32)

VOC_CLASSES = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
               'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
               'diningtable', 'dog', 'horse', 'motorbike', 'person',
               'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']

# Weights that encode an RGB triple as one integer: R*256^2 + G*256 + B
BASE_VECTOR = np.array([256*256, 256, 1], dtype=np.int32)

def build_colormap2label():
    """Build a 1-D array mapping each encoded RGB value to its class index."""
    colormap2label = np.zeros(256 ** 3)
    for i, colormap in enumerate(VOC_COLORMAP * BASE_VECTOR):
        colormap2label[int(colormap.sum())] = i
    return colormap2label

def voc_label_indices(colormap, colormap2label):
    """Map an (H, W, 3) RGB label image to an (H, W) array of class indices."""
    colormap = colormap.astype(np.int32)
    idx = (colormap * BASE_VECTOR).sum(axis=2)
    return colormap2label[idx]
```

For example, in the first training image the airplane pixels have class index 1 (aeroplane), while the background is always 0.

```python
y = voc_label_indices(train_labels[0], build_colormap2label())
y[105:115, 130:140], VOC_CLASSES[1]
```
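The lookup relies on encoding each RGB triple as a single base-256 integer, (R·256 + G)·256 + B, so every 24-bit color gets its own slot in a table of 256³ entries. Below is a minimal sketch of the same idea in plain NumPy (rather than mxnet.np), using only three colors from VOC_COLORMAP; the names rgb_to_code and lut are hypothetical. Colors that are not in the colormap fall back to index 0 (background), just as with build_colormap2label, whose array starts out as zeros.

```python
import numpy as np

# A subset of the VOC colormap: background, aeroplane, dog (indices 0, 1, 12)
colormap = np.array([[0, 0, 0], [128, 0, 0], [64, 0, 128]], dtype=np.int64)
class_ids = [0, 1, 12]

def rgb_to_code(rgb):
    """Encode RGB as one integer in base 256: (R*256 + G)*256 + B."""
    rgb = np.asarray(rgb, dtype=np.int64)
    return (rgb[..., 0] * 256 + rgb[..., 1]) * 256 + rgb[..., 2]

# One slot per possible 24-bit color; uint8 is enough for 21 classes
lut = np.zeros(256 ** 3, dtype=np.uint8)
lut[rgb_to_code(colormap)] = class_ids

# A tiny 2x2 RGB label image; the last pixel uses a color outside this colormap
label_rgb = np.array([[[0, 0, 0], [128, 0, 0]],
                      [[64, 0, 128], [224, 224, 192]]])
print(lut[rgb_to_code(label_rgb)].tolist())  # [[0, 1], [12, 0]]
```

The table trades memory (about 16 MB with uint8) for a single vectorized lookup over all pixels; a Python dict would be smaller but would need a per-pixel loop.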

3. Image Preprocessing

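In semantic segmentation, if we rescale the input images, the predicted pixel classes must later be mapped back onto the original-size input image. Doing this precisely is very hard, especially in regions where different semantic classes meet. To avoid the problem, we crop images to a fixed size instead of rescaling them. Specifically, random cropping extracts the same region from both the input image and its label.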

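```python
def voc_rand_crop(feature, label, size=(300, 200)):
    """Randomly crop the same region from a feature and its label; size is (width, height)."""
    feature, rect = image.random_crop(feature, size)
    # Reuse the returned rectangle so the label crop matches the image crop
    label = image.fixed_crop(label, *rect)
    return feature, label

imgs = []
for _ in range(5):
    imgs += voc_rand_crop(train_features[0], train_labels[0])

# First row: the 5 cropped images; second row: the matching label crops
Image(show_imgs(imgs[::2] + imgs[1::2], 2, 5, 1.4))
```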
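The key point of voc_rand_crop is that the crop window is chosen once and then reused for the label, so the label stays aligned with the image pixel for pixel. Here is a minimal sketch of the same idea in plain NumPy, without mxnet.image; paired_random_crop is a hypothetical helper. To check alignment, the fake image and label are both filled with each pixel's flat position, so a misaligned crop would produce different values.

```python
import numpy as np

def paired_random_crop(feature, label, width, height, rng=None):
    """Crop the same random width x height window from an HWC image and its HW label."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = feature.shape[:2]
    assert label.shape[:2] == (h, w), "image and label must have the same size"
    assert w >= width and h >= height, "image is smaller than the crop size"
    # Pick the top-left corner once and apply it to both arrays
    x0 = int(rng.integers(0, w - width + 1))
    y0 = int(rng.integers(0, h - height + 1))
    return (feature[y0:y0 + height, x0:x0 + width],
            label[y0:y0 + height, x0:x0 + width])

h, w = 6, 8
coords = np.arange(h * w).reshape(h, w)
feature = np.stack([coords] * 3, axis=-1)  # fake 3-channel image, shape (6, 8, 3)
label = coords.copy()                      # fake label with the same values
f, l = paired_random_crop(feature, label, width=5, height=4,
                          rng=np.random.default_rng(0))
print(f.shape, l.shape)  # (4, 5, 3) (4, 5)
```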

4. Defining a Custom Semantic Segmentation Dataset Class

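We create a custom semantic segmentation dataset class by inheriting from Gluon's Dataset class. By implementing `__getitem__`, we can fetch the input image at any index in the dataset together with the class index of each of its pixels. Because some images in the dataset may be smaller than the random crop size, a `filter` method removes them. In addition, `normalize_image` standardizes each of the three RGB channels of the input images.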

```python
class VOCSegDataset(gluon.data.Dataset):
    """A customized dataset for loading the VOC semantic segmentation data."""
    def __init__(self, crop_size, voc_dir, is_train):
        self.rgb_mean = np.array([0.485, 0.456, 0.406])
        self.rgb_std = np.array([0.229, 0.224, 0.225])
        self.crop_size = crop_size  # (width, height)
        features, labels = read_voc_images(voc_dir, is_train=is_train)
        self.features = [self.normalize_image(feature)
                         for feature in self.filter(features)]
        self.labels = self.filter(labels)
        self.colormap2label = build_colormap2label()
        print('read ' + str(len(self.features)) + ' examples')

    def normalize_image(self, img):
        # Scale to [0, 1], then standardize each RGB channel
        return (img.astype('float32') / 255 - self.rgb_mean) / self.rgb_std

    def filter(self, imgs):
        # Keep only images at least as large as the crop; img.shape is (H, W, C)
        return [img for img in imgs if (
            img.shape[1] >= self.crop_size[0] and
            img.shape[0] >= self.crop_size[1])]

    def __getitem__(self, idx):
        feature, label = voc_rand_crop(self.features[idx], self.labels[idx],
                                       self.crop_size)
        # HWC -> CHW for the feature; RGB label -> per-pixel class indices
        return (feature.transpose(2, 0, 1),
                voc_label_indices(label, self.colormap2label))

    def __len__(self):
        return len(self.features)
```
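`normalize_image` applies the per-channel standardization $\hat{x}_c = (x_c / 255 - \mu_c) / \sigma_c$, where $\mu$ and $\sigma$ are the `rgb_mean` and `rgb_std` values above (the standard ImageNet channel statistics). The detail in `filter` that is easy to get wrong is the axis order: `crop_size` is (width, height), while an image array has shape (height, width, channels). The plain-NumPy sketch below exercises both points; `keep_large_enough` is a hypothetical stand-in for the `filter` method.

```python
import numpy as np

# Same channel statistics as VOCSegDataset
RGB_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
RGB_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_image(img):
    """Scale uint8 HWC pixels to [0, 1], then standardize each RGB channel."""
    return (img.astype(np.float32) / 255 - RGB_MEAN) / RGB_STD

def keep_large_enough(imgs, crop_size):
    """crop_size is (width, height); each image has shape (height, width, channels)."""
    width, height = crop_size
    return [img for img in imgs if img.shape[1] >= width and img.shape[0] >= height]

imgs = [np.full((320, 480, 3), 255, dtype=np.uint8),  # exactly 480 x 320: kept
        np.zeros((300, 500, 3), dtype=np.uint8)]      # only 300 pixels tall: dropped
kept = keep_large_enough(imgs, crop_size=(480, 320))
print(len(kept))  # 1
```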

5. Reading the Dataset

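Using the custom VOCSegDataset class, we create instances of the training set and the test set. We set the output shape of the random crop to $480 \times 320$ (width × height). The number of examples retained in each set is printed below.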

```python
crop_size = (480, 320)
voc_train = VOCSegDataset(crop_size, voc_dir, is_train=True)
voc_test = VOCSegDataset(crop_size, voc_dir, is_train=False)
```

6. Data Iterators

Finally, we define a function `load_data_voc` that downloads and reads the dataset and returns data iterators for the training and test sets.

```python
def load_data_voc(batch_size, crop_size):
    """Download and load the VOC2012 semantic segmentation dataset."""
    voc_dir = d2l.download_extract('voc2012', os.path.join('VOCdevkit', 'VOC2012'))
    num_workers = d2l.get_dataloader_workers()
    train_iter = gluon.data.DataLoader(
        VOCSegDataset(crop_size, voc_dir, is_train=True), batch_size,
        shuffle=True, last_batch='discard', num_workers=num_workers)
    test_iter = gluon.data.DataLoader(
        VOCSegDataset(crop_size, voc_dir, is_train=False), batch_size,
        last_batch='discard', num_workers=num_workers)
    return train_iter, test_iter

train_iter, test_iter = load_data_voc(64, crop_size)
```

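We set the batch size to 64 and define the iterators for the training and test sets. Let's print the shapes of the first minibatch. Unlike in image classification and object detection, the labels here are three-dimensional arrays: one class index per pixel of each image in the batch.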

```python
for X, Y in train_iter:
    print(X.shape)
    print(Y.shape)
    break
```
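Based on the code above, with `crop_size = (480, 320)` each feature is transposed from (320, 480, 3) to (3, 320, 480) and each label becomes a (320, 480) array of class indices, so a batch of 64 should give `X.shape == (64, 3, 320, 480)` and `Y.shape == (64, 320, 480)`. Because of `last_batch='discard'`, examples that cannot fill a complete final batch are dropped. The sketch below mimics this behavior in plain NumPy on a small fake dataset; it only illustrates the semantics and is not Gluon's DataLoader implementation, and `batches` is a hypothetical helper.

```python
import numpy as np

def batches(features, labels, batch_size, rng=None):
    """Yield shuffled minibatches and drop the last incomplete one
    (mimicking shuffle=True, last_batch='discard')."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(features))
    for start in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        # Features are HWC; transpose to CHW as VOCSegDataset.__getitem__ does
        X = np.stack([features[i].transpose(2, 0, 1) for i in idx])
        Y = np.stack([labels[i] for i in idx])
        yield X, Y

# 10 fake 320x480 examples with batch size 4: 2 full batches, the last 2 examples are dropped
feats = [np.zeros((320, 480, 3), dtype=np.float32) for _ in range(10)]
labs = [np.zeros((320, 480), dtype=np.float32) for _ in range(10)]
out = list(batches(feats, labs, batch_size=4, rng=np.random.default_rng(0)))
print(len(out), out[0][0].shape, out[0][1].shape)  # 2 (4, 3, 320, 480) (4, 320, 480)
```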

7. References

https://d2l.ai/chapter_computer-vision/semantic-segmentation-and-dataset.html
