pix2pix gan

Tech · 2025-03-23


There are times when we want to transform an image into another style. Let's say we have a fine collection of sketches, and our daily work is to colour these black-and-white images.


It might be interesting if the number of tasks is small, but when it comes to hundreds of sketches a day, hmmm… maybe we need some help. This is where GAN comes to the rescue. A Generative Adversarial Network, or GAN, is a machine learning framework that aims to generate new data with the same distribution as the training dataset. In this article, we will build a pix2pix GAN that takes an image as input and outputs a transformed image.


Amazing results shown in this paper

To break things down, we will go through these steps:

1. Prepare our data
2. Build the network
3. Train the network
4. Test and see the results

Prepare our data

In image transformation, we need an original image and its expected transformed result. It is recommended to have at least a few thousand of these before-and-after pairs. (Yes, GAN needs a lot of images 😅) In this post, we will use data from this kaggle dataset.


Interesting fact: there are so many datasets about anime on kaggle 😆

The image pairs can be saved as merged images like those in our dataset. They can also be kept separately in two folders; just make sure the order matches later when we process them 😉


Since each image pair is merged into a single file, we first need to split it into a sketch image and a colored picture:

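The original splitting gist did not survive the mirroring, so here is a minimal sketch of that step. The function name `split_image` and the assumption that the colored picture sits on the left half and the sketch on the right are mine; swap the slices if your files are laid out the other way round.

```python
import numpy as np
from PIL import Image

def split_image(path, size=(256, 256)):
    """Split one merged before/after file into (sketch, colored) arrays.

    Assumes the colored picture occupies the left half and the sketch
    the right half of the merged image; swap the slices below if your
    dataset is laid out the other way round.
    """
    merged = np.asarray(Image.open(path).convert("RGB"))
    half = merged.shape[1] // 2
    colored = np.asarray(Image.fromarray(merged[:, :half]).resize(size))
    sketch = np.asarray(Image.fromarray(merged[:, half:]).resize(size))
    return sketch, colored
```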

    Having our splitting function, we can process the training dataset with the following code:

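A sketch of that processing step, with the splitting inlined so it is self-contained. The folder path, the function name `load_merged_pairs`, and the left-colored/right-sketch layout are my assumptions, not the article's original code:

```python
import os
import numpy as np
from PIL import Image

def load_merged_pairs(folder, size=(256, 256), out_file="gan_img_train.npz"):
    """Split every merged image in `folder` into a (sketch, colored) pair
    and save both stacks as a compressed .npz archive.

    Assumes the colored half is on the left and the sketch on the right
    of each merged file, as in the dataset used in this post.
    """
    sketches, colored = [], []
    for name in sorted(os.listdir(folder)):
        arr = np.asarray(Image.open(os.path.join(folder, name)).convert("RGB"))
        half = arr.shape[1] // 2
        colored.append(np.asarray(Image.fromarray(arr[:, :half]).resize(size)))
        sketches.append(np.asarray(Image.fromarray(arr[:, half:]).resize(size)))
    src, tgt = np.stack(sketches), np.stack(colored)
    print("Loaded:", src.shape, tgt.shape)
    np.savez_compressed(out_file, src, tgt)
    print("Saved dataset:", out_file)
    return src, tgt
```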

    As my machine is not powerful enough, I only kept 1508 images in the training set 🤫 Here is the output:


Loaded: (1508, 256, 256, 3) (1508, 256, 256, 3)
Saved dataset: gan_img_train.npz

    There should be a new file named “gan_img_train.npz” in your current directory. We can check our data before moving on 🔍

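A quick way to check the data is to load the archive back and plot a few pairs. The helper name `show_samples` and the row layout are mine; `arr_0`/`arr_1` are the default keys numpy assigns when arrays are saved positionally:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_samples(path="gan_img_train.npz", n=3):
    """Display the first n source/target pairs from the saved archive."""
    data = np.load(path)
    src_images, tgt_images = data["arr_0"], data["arr_1"]
    for i in range(n):
        plt.subplot(2, n, 1 + i)
        plt.axis("off")
        plt.imshow(src_images[i])      # row 1: sketches (source)
        plt.subplot(2, n, 1 + n + i)
        plt.axis("off")
        plt.imshow(tgt_images[i])      # row 2: colored pictures (target)
    plt.show()
    return src_images, tgt_images
```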

    Nice! The source is our sketches, and the target is those colored ones. Ready to rock ’n’ roll 👊


Build the network

A GAN network is composed of a generator and a discriminator. The generator is like a student trying to mimic a masterpiece as his homework. The discriminator then serves as a teacher, giving feedback like Good ✅ ('It looks real!') or Bad ❌ ('Nah, it's so fake.') on the student's work. The student does his homework again and again, while the teacher tells him each time whether he is doing well. Once the teacher cannot distinguish the student's work from the actual masterpiece, we consider the student able to create images that are good enough (Poor student 😿)


Q. If the discriminator knows so well how good or bad an image is, why doesn't it generate the image on its own?
A. To be honest, yes, it can generate images. However, since the discriminator excels at seeing the big picture, creating one from pixels becomes an arduous task for it. To generate an image using a discriminator, we would have to solve the function argmax D(x), which aims to maximize the score of classifying real and generated images. This function turns out to be too complicated to solve unless we impose some limitation. Since such a limitation would also restrict the model's capacity, people find it easier to replace solving the argmax D(x) function with a separate generator. 👨🏼‍🏫 Here is a lecture about it.

1. Define the discriminator
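The discriminator gist was lost in the mirroring, so the block below is a sketch of a PatchGAN-style discriminator in the spirit of the pix2pix paper, not the article's exact code; the function name `define_discriminator` and the layer stack (which omits the usual BatchNorm layers for brevity) are my reconstruction. It concatenates the sketch with a real or generated colored image and classifies each patch:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.optimizers import Adam

def define_discriminator(image_shape=(256, 256, 3)):
    """A PatchGAN-style discriminator: it sees the source sketch and a
    (real or generated) colored image together, and scores each patch
    of the pair as real or fake."""
    init = RandomNormal(stddev=0.02)
    in_src = layers.Input(shape=image_shape)   # source sketch
    in_tgt = layers.Input(shape=image_shape)   # real or generated target
    d = layers.Concatenate()([in_src, in_tgt])
    for filters in (64, 128, 256, 512):        # 256 -> 128 -> 64 -> 32 -> 16
        d = layers.Conv2D(filters, (4, 4), strides=(2, 2), padding="same",
                          kernel_initializer=init)(d)
        d = layers.LeakyReLU(0.2)(d)
    d = layers.Conv2D(512, (4, 4), padding="same", kernel_initializer=init)(d)
    d = layers.LeakyReLU(0.2)(d)
    patch_out = layers.Conv2D(1, (4, 4), padding="same",
                              kernel_initializer=init)(d)
    patch_out = layers.Activation("sigmoid")(patch_out)   # per-patch real/fake
    model = models.Model([in_src, in_tgt], patch_out)
    model.compile(loss="binary_crossentropy",
                  optimizer=Adam(learning_rate=0.0002, beta_1=0.5),
                  loss_weights=[0.5])
    return model
```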

If you are interested in the difference between ReLU and Leaky ReLU, here is a well-explained article 🌈 In short, Leaky ReLU changes the slope of ReLU where x < 0, letting a small negative signal 'leak' through and extending the output range of ReLU.
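A tiny numpy illustration of that difference (toy code, not part of the network):

```python
import numpy as np

def relu(x):
    # ReLU clamps every negative value to 0
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.2):
    # Leaky ReLU scales negative values by alpha instead of zeroing them
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negatives become 0
print(leaky_relu(x))  # negatives shrink to 20% but keep their sign
```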

From the code above, it's not hard to tell that a discriminator is basically a classifier: it takes an input image and categorizes it as "Real" or "Fake".


2. Define the generator

As for the generator, its structure is more complicated 😣 Its implementation combines an encoder and a decoder (it's a U-Net structure here). The encoder breaks the input image down into smaller and smaller pieces; from these pieces, the decoder then scales back up and generates a new image in the end.

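The generator gist is also missing, so here is a hedged reconstruction of a U-Net generator in the usual pix2pix shape; the helper names `encoder_block`/`decoder_block`/`define_generator` and the exact filter counts are my assumptions:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.initializers import RandomNormal

def encoder_block(x, filters, batchnorm=True):
    """Downsampling step: strided conv, optional BatchNorm, Leaky ReLU."""
    init = RandomNormal(stddev=0.02)
    g = layers.Conv2D(filters, (4, 4), strides=(2, 2), padding="same",
                      kernel_initializer=init)(x)
    if batchnorm:
        g = layers.BatchNormalization()(g)
    return layers.LeakyReLU(0.2)(g)

def decoder_block(x, skip, filters, dropout=True):
    """Upsampling step with a U-Net skip connection from the encoder."""
    init = RandomNormal(stddev=0.02)
    g = layers.Conv2DTranspose(filters, (4, 4), strides=(2, 2),
                               padding="same", kernel_initializer=init)(x)
    g = layers.BatchNormalization()(g)
    if dropout:
        g = layers.Dropout(0.5)(g)
    g = layers.Concatenate()([g, skip])   # carry fine detail across
    return layers.Activation("relu")(g)

def define_generator(image_shape=(256, 256, 3)):
    """U-Net generator: encode the sketch down to a 1x1 bottleneck,
    then decode it back up into a full colored image."""
    init = RandomNormal(stddev=0.02)
    in_image = layers.Input(shape=image_shape)
    e1 = encoder_block(in_image, 64, batchnorm=False)   # 128x128
    e2 = encoder_block(e1, 128)                         # 64x64
    e3 = encoder_block(e2, 256)                         # 32x32
    e4 = encoder_block(e3, 512)                         # 16x16
    e5 = encoder_block(e4, 512)                         # 8x8
    e6 = encoder_block(e5, 512)                         # 4x4
    e7 = encoder_block(e6, 512)                         # 2x2
    b = layers.Conv2D(512, (4, 4), strides=(2, 2), padding="same",
                      kernel_initializer=init)(e7)      # 1x1 bottleneck
    b = layers.Activation("relu")(b)
    d1 = decoder_block(b, e7, 512)
    d2 = decoder_block(d1, e6, 512)
    d3 = decoder_block(d2, e5, 512)
    d4 = decoder_block(d3, e4, 512, dropout=False)
    d5 = decoder_block(d4, e3, 256, dropout=False)
    d6 = decoder_block(d5, e2, 128, dropout=False)
    d7 = decoder_block(d6, e1, 64, dropout=False)
    g = layers.Conv2DTranspose(3, (4, 4), strides=(2, 2), padding="same",
                               kernel_initializer=init)(d7)
    out_image = layers.Activation("tanh")(g)            # pixels in [-1, 1]
    return models.Model(in_image, out_image)
```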

You may also see UpSampling2D instead of Conv2DTranspose in other generators. The key difference between the two is that UpSampling2D simply scales the image up using nearest-neighbour or bilinear upsampling, while Conv2DTranspose not only upsamples its input but also learns the best upsampling kernel. (from this Stack Overflow answer)

3. Define the GAN

Putting the discriminator and generator together, we now have our GAN. An interesting note here: we have to set the discriminator as "not trainable". If it were trainable, the generator would adjust the discriminator's weights to make it easier to fool 😱 No, we don't want this 🤪🤪🤪

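A sketch of that composite model. `define_gan` is my name for it; the adversarial-plus-L1 loss with a weight of 100 on the L1 term follows the pix2pix paper:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

def define_gan(g_model, d_model, image_shape=(256, 256, 3)):
    """Stack the generator and discriminator into the composite model
    used to update the generator. The discriminator is frozen here so
    generator updates cannot alter its weights."""
    d_model.trainable = False
    in_src = layers.Input(shape=image_shape)
    gen_out = g_model(in_src)
    dis_out = d_model([in_src, gen_out])
    model = models.Model(in_src, [dis_out, gen_out])
    # Adversarial loss plus a heavily weighted L1 (pixel) loss, as in pix2pix.
    model.compile(loss=["binary_crossentropy", "mae"],
                  optimizer=Adam(learning_rate=0.0002, beta_1=0.5),
                  loss_weights=[1, 100])
    return model
```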

Train the network

    We first need to load and process our training images:

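A minimal sketch of that loading step. The name `load_real_samples` is my assumption; the scaling to [-1, 1] matches the tanh output range of the generator:

```python
import numpy as np

def load_real_samples(filename="gan_img_train.npz"):
    """Load the saved pairs and scale pixels from [0, 255] to [-1, 1],
    matching the generator's tanh output range."""
    data = np.load(filename)
    X1 = (data["arr_0"].astype("float32") - 127.5) / 127.5
    X2 = (data["arr_1"].astype("float32") - 127.5) / 127.5
    return [X1, X2]
```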

To train the discriminator, we need a lot of real and fake images 😬 In the following functions, generate_real_samples gives us random samples together with their expected transformed results. On the other hand, generate_fake_samples uses our generator network to create fake images from its input. We label the expected results as 1, and the images predicted by the generator as 0, to show the discriminator that the latter are fake.

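The two sampling helpers can be sketched like this; the exact signatures are my reconstruction, with `patch_shape` being the side length of the discriminator's patch output:

```python
import numpy as np

def generate_real_samples(dataset, n_samples, patch_shape):
    """Pick random (sketch, colored) pairs and label them 1 ('real')."""
    trainA, trainB = dataset
    ix = np.random.randint(0, trainA.shape[0], n_samples)
    X1, X2 = trainA[ix], trainB[ix]
    y = np.ones((n_samples, patch_shape, patch_shape, 1))
    return [X1, X2], y

def generate_fake_samples(g_model, samples, patch_shape):
    """Let the generator color the given sketches and label the results
    0 ('fake')."""
    X = g_model.predict(samples, verbose=0)
    y = np.zeros((len(X), patch_shape, patch_shape, 1))
    return X, y
```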

Almost there! It's always helpful to know how our network performs during training. So here we write a function that saves the model's weights into an .h5 file. It also creates a plot comparing the real input, the generated image, and the expected output:

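A self-contained sketch of that checkpoint function, with the random sampling inlined; `summarize_performance` and the file-name patterns are my assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def summarize_performance(step, g_model, dataset, n_samples=3):
    """Save the generator to an .h5 file and plot input / generated /
    expected rows so we can watch training progress."""
    trainA, trainB = dataset
    ix = np.random.randint(0, trainA.shape[0], n_samples)
    X_realA, X_realB = trainA[ix], trainB[ix]
    X_fakeB = g_model.predict(X_realA, verbose=0)
    # scale pixels back from [-1, 1] to [0, 1] for plotting
    X_realA = (X_realA + 1) / 2.0
    X_realB = (X_realB + 1) / 2.0
    X_fakeB = (X_fakeB + 1) / 2.0
    for i in range(n_samples):
        plt.subplot(3, n_samples, 1 + i)
        plt.axis("off")
        plt.imshow(X_realA[i])                      # row 1: input
    for i in range(n_samples):
        plt.subplot(3, n_samples, 1 + n_samples + i)
        plt.axis("off")
        plt.imshow(X_fakeB[i])                      # row 2: generated
    for i in range(n_samples):
        plt.subplot(3, n_samples, 1 + 2 * n_samples + i)
        plt.axis("off")
        plt.imshow(X_realB[i])                      # row 3: expected
    plt.savefig("plot_%06d.png" % (step + 1))
    plt.close()
    g_model.save("model_%06d.h5" % (step + 1))
```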

    And yay, let’s train it 🏋️‍♀️


In the end, there will be 10 model files and 10 plots indicating how the training performs. From our last plot, the GAN network seems to function well 🥰


(row 1, row 2, row 3) = (input, generated, expected)

Test and see the results

Now that we have a trained model, let's test it with some images it has never seen before. We'll also use images from this kaggle dataset so the preprocessing functions defined above can be reused. (In this case, the testing images are processed and saved in “gan_img_test.npz”.) The function below creates plots to help us compare the results with the expected output.

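A sketch of that comparison plot; the helper name `plot_test_images` is mine, and it assumes the test arrays are already scaled to [-1, 1] like the training data:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_test_images(g_model, src, tgt, n_samples=3):
    """Plot unseen sketches, the model's colorings, and the expected
    output as three rows: (input, generated, expected)."""
    gen = g_model.predict(src[:n_samples], verbose=0)
    # scale pixels from [-1, 1] back to [0, 1] for display
    images = np.vstack(((src[:n_samples] + 1) / 2.0,
                        (gen[:n_samples] + 1) / 2.0,
                        (tgt[:n_samples] + 1) / 2.0))
    for i in range(len(images)):
        plt.subplot(3, n_samples, 1 + i)
        plt.axis("off")
        plt.imshow(images[i])
    plt.show()
    return images
```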

    Testing gogogo 🤸‍♀️ ⛹️‍♀️


haha, some of them look quite messy 🤪 But since I only used 1508 images here, putting more images into the training dataset would surely produce more promising results.


I mostly followed this post to reproduce the implementations above, so feel free to go back to the original work for a more detailed explanation ☘️


Translated from: https://medium.com/swlh/build-a-pix2pix-gan-with-python-6db841b302c7
