A roundup of image-reading libraries: cv2, PIL, skimage with numpy, and PyTorch (ToPILImage)

Technology, 2022-08-08

1 Reading images and their attributes

1.1 Converting between PIL and numpy

    import numpy as np
    from PIL import Image

    # read an image with 3 channels, 500x889 pixels (width x height)
    img_pil = Image.open('./test.png')
    # show the image
    img_pil.show()
    # get image info
    print(img_pil)
    # get the pixel value in PIL format; getpixel takes (x, y)
    print(img_pil.getpixel((0, 0)))
    # convert PIL to numpy
    img_np = np.array(img_pil)
    print(img_np.shape)
    # get the pixel value in numpy format; indexing is [row, column]
    print(img_np[0, 0])
    # convert numpy to PIL
    img_pil = Image.fromarray(img_np)
    print(img_pil)
    """
    <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x889 at 0x193331AD240>
    (219, 210, 193)
    (889, 500, 3)
    [219 210 193]
    <PIL.Image.Image image mode=RGB size=500x889 at 0x1933330ADA0>
    """

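Note: PIL reads the three channels in RGB order, and the size it reports, (width, height), matches the original image. Converting between PIL and numpy has one subtlety: numpy.array() returns an array of shape (height, width, channels), so width and height appear swapped relative to PIL's size, and Image.fromarray() restores the original (width, height). To read a pixel, PIL uses img_pil.getpixel((x, y)), where x is the column and y is the row. The numpy array is a matrix indexed by row first, so the same pixel is img_np[y, x].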
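The coordinate difference is easy to check on a small synthetic image. The sketch below builds a 4x2 (width x height) image in memory instead of reading test.png, so the size and the pixel position are illustrative assumptions.

```python
import numpy as np
from PIL import Image

# Synthetic RGB image: width=4, height=2 (hypothetical values, not test.png)
img_pil = Image.new('RGB', (4, 2), color=(0, 0, 0))
# Mark the pixel at x=3 (column), y=1 (row)
img_pil.putpixel((3, 1), (219, 210, 193))

img_np = np.array(img_pil)

pil_size = img_pil.size                          # (width, height)
np_shape = img_np.shape                          # (height, width, channels)
pil_pixel = img_pil.getpixel((3, 1))             # PIL: (x, y)
np_pixel = tuple(int(v) for v in img_np[1, 3])   # numpy: [row, column] = [y, x]

print(pil_size, np_shape)
print(pil_pixel, np_pixel)

# Converting back restores PIL's (width, height)
restored_size = Image.fromarray(img_np).size
print(restored_size)
```

In the original example both libraries are queried at (0, 0), where the two conventions happen to agree, which is why the difference does not show up there.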

1.2 Converting between cv2 and numpy

    import numpy as np
    import cv2

    # read an image with 3 channels, 500x889 pixels (width x height)
    img_cv = cv2.imread('./test.png')
    # show the image
    cv2.imshow('img', img_cv)
    # get image info
    print(img_cv.shape)
    # get the pixel value in cv2 format
    print(img_cv[0, 0])
    # convert cv2 to numpy
    img_np = np.array(img_cv)
    print(img_np.shape)
    # get the pixel value in numpy format
    print(img_np[0, 0])
    # convert numpy to cv2 (not necessary)
    cv2.imshow('img_np', img_np)
    cv2.waitKey(0)
    """
    (889, 500, 3)
    [193 210 219]
    (889, 500, 3)
    [193 210 219]
    """

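Note: cv2 reads the three channels in BGR order rather than RGB, and the array shape is (height, width, channels), so width and height appear swapped relative to PIL's size. cv2.imread already returns a numpy array, so elements are accessed by index exactly as in numpy, and cv2.imshow can display a numpy array (uint8) directly. Without cv2.waitKey() the window closes immediately, so the call is usually added to wait for a key press before exiting.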
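Because of the BGR order, an array read by cv2 usually has its channels reversed before it is passed to PIL, skimage, or matplotlib. A minimal sketch using numpy only, with a small synthetic array standing in for the result of cv2.imread (the shape and values are assumptions for illustration):

```python
import numpy as np

# Stand-in for the array cv2.imread would return: (height, width, 3), BGR order
img_bgr = np.zeros((2, 4, 3), dtype=np.uint8)
img_bgr[0, 0] = [193, 210, 219]   # B, G, R, as in the cv2 output above

# Reverse the last axis to get RGB.
# cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB) is the equivalent OpenCV call.
img_rgb = img_bgr[:, :, ::-1]

first_pixel_rgb = img_rgb[0, 0].tolist()
print(first_pixel_rgb)
```

The slice is a view with a negative stride. If a downstream library requires contiguous memory, np.ascontiguousarray(img_rgb) or img_rgb.copy() produces one.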

1.3 Converting between skimage and numpy

    import numpy as np
    from skimage import io
    import matplotlib.pyplot as plt

    # read an image with 3 channels, 500x889 pixels (width x height)
    img_sk = io.imread('./test.png')
    # get image info
    print(img_sk.shape)
    io.imshow(img_sk)
    # get the pixel value in skimage format
    print(img_sk[0, 0])
    # convert skimage to numpy
    img_np = np.array(img_sk)
    print(img_np.shape)
    # get the pixel value in numpy format
    print(img_np[0, 0])
    # convert numpy to skimage
    io.imshow(img_np)
    plt.show()
    """
    (889, 500, 3)
    [219 210 193]
    (889, 500, 3)
    [219 210 193]
    """

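Note: skimage behaves much like cv2. io.imread also returns a numpy array of shape (height, width, channels), so converting to numpy is trivial. The pixel values show one difference: the channel order is RGB, not BGR. skimage cannot display an image on its own; io.imshow needs matplotlib.pyplot (plt.show()) to open the window. For that reason skimage is usually used together with pyplot to visualize intermediate results, which also makes it easy to draw plots and tables.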
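As a rough illustration of using pyplot for comparison, the sketch below places two arrays side by side. The arrays are synthetic stand-ins for images read with io.imread, and the output file name compare.png is an assumption. The Agg backend is selected only so the sketch runs without a display; for interactive use, drop that line and call plt.show() instead of saving.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')   # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Two synthetic RGB arrays standing in for images read by io.imread
img_a = np.zeros((20, 30, 3), dtype=np.uint8)
img_a[..., 0] = 255                 # pure red
img_b = img_a[:, :, ::-1]           # channels reversed: pure blue

fig, axes = plt.subplots(1, 2, figsize=(6, 3))
titles = ('original', 'channels reversed')
for ax, img, title in zip(axes, (img_a, img_b), titles):
    ax.imshow(img)                  # (H, W, 3) arrays are interpreted as RGB
    ax.set_title(title)
    ax.axis('off')

fig.savefig('compare.png')          # hypothetical output path
plt.close(fig)
n_panels = len(axes)
```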

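In summary, PIL preserves the information of the original input as far as possible and is quick and convenient to use; it is also commonly combined with imageio for image preprocessing. cv2 turns the image into an array, which makes further processing easy. skimage combined with matplotlib makes side-by-side image comparison more convenient.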

2 Reading images in PyTorch

torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=8, drop_last=False)

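Calling PyTorch's DataLoader requires a dataset. Here the dataset is a user-defined class that returns each image with its label and applies data augmentation to the image. During augmentation the image is a PIL object. Stanford Cars is used as the example (code source: sourcecode):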

    import os

    import imageio
    import numpy as np
    from PIL import Image
    from torchvision import transforms


    class STANFORD_CAR():
        def __init__(self, input_size, root, is_train=True, data_len=None):
            self.input_size = input_size
            self.root = root
            self.is_train = is_train
            train_img_path = os.path.join(self.root, 'cars_train')
            test_img_path = os.path.join(self.root, 'cars_test')
            train_label_file = open(os.path.join(self.root, 'train.txt'))
            test_label_file = open(os.path.join(self.root, 'test.txt'))
            train_img_label = []
            test_img_label = []
            # each line is "<file name> <1-based label>"; labels are shifted to 0-based
            for line in train_label_file:
                train_img_label.append([os.path.join(train_img_path, line[:-1].split(' ')[0]),
                                        int(line[:-1].split(' ')[1]) - 1])
            for line in test_label_file:
                test_img_label.append([os.path.join(test_img_path, line[:-1].split(' ')[0]),
                                       int(line[:-1].split(' ')[1]) - 1])
            self.train_img_label = train_img_label[:data_len]
            self.test_img_label = test_img_label[:data_len]

        def __getitem__(self, index):
            if self.is_train:
                img, target = imageio.imread(self.train_img_label[index][0]), self.train_img_label[index][1]
                # grayscale image: repeat the single channel three times
                if len(img.shape) == 2:
                    img = np.stack([img] * 3, 2)
                img = Image.fromarray(img, mode='RGB')

                img = transforms.Resize((self.input_size, self.input_size), Image.BILINEAR)(img)
                # img = transforms.RandomResizedCrop(size=self.input_size,
                #                                    scale=(0.4, 0.75), ratio=(0.5, 1.5))(img)
                # img = transforms.RandomCrop(self.input_size)(img)
                img = transforms.RandomHorizontalFlip()(img)
                img = transforms.ColorJitter(brightness=0.2, contrast=0.2)(img)

                img = transforms.ToTensor()(img)
                img = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])(img)
            else:
                img, target = imageio.imread(self.test_img_label[index][0]), self.test_img_label[index][1]
                if len(img.shape) == 2:
                    img = np.stack([img] * 3, 2)
                img = Image.fromarray(img, mode='RGB')
                img = transforms.Resize((self.input_size, self.input_size), Image.BILINEAR)(img)
                # img = transforms.CenterCrop(self.input_size)(img)
                img = transforms.ToTensor()(img)
                img = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])(img)

            return img, target

        def __len__(self):
            if self.is_train:
                return len(self.train_img_label)
            else:
                return len(self.test_img_label)

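This code uses several libraries together: imageio reads the file, numpy stacks grayscale images into three channels, PIL provides the image object, and the torchvision transforms applied to that object perform the augmentation.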
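The array layout changes several times along this pipeline. The sketch below mimics the relevant steps in plain numpy so the shapes can be inspected without PyTorch. The helper functions are hypothetical stand-ins written for illustration, not torchvision code: to_tensor_layout imitates what ToTensor does to a uint8 image (channels first, values scaled to [0, 1]) and normalize imitates Normalize with the mean and standard deviation used above.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_three_channels(img):
    """Stack a grayscale (H, W) array into (H, W, 3), as the dataset code does."""
    return np.stack([img] * 3, 2) if img.ndim == 2 else img

def to_tensor_layout(img):
    """Imitate ToTensor: (H, W, C) uint8 in [0, 255] -> (C, H, W) float32 in [0, 1]."""
    return img.transpose(2, 0, 1).astype(np.float32) / 255.0

def normalize(chw):
    """Imitate Normalize: per-channel (x - mean) / std."""
    return (chw - MEAN[:, None, None]) / STD[:, None, None]

def denormalize_to_uint8(chw):
    """Undo normalize and the layout change: back to an (H, W, C) uint8 image."""
    x = chw * STD[:, None, None] + MEAN[:, None, None]
    return np.clip(np.rint(x * 255.0), 0, 255).astype(np.uint8).transpose(1, 2, 0)

# Hypothetical grayscale image, height=2, width=3
gray = np.arange(6, dtype=np.uint8).reshape(2, 3) * 40
rgb = to_three_channels(gray)
chw = normalize(to_tensor_layout(rgb))
restored = denormalize_to_uint8(chw)
print(rgb.shape, chw.shape, restored.shape)
```

In torchvision, transforms.ToPILImage converts a tensor back to a PIL image. As in denormalize_to_uint8, the normalization has to be undone first if the result should look like the original image.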
