Contents
1. baseline
2. Improvements from multiple directions
2.1 fusing context information from different layers
2.2 new loss function
2.3 anchor refinement and matching
2.4 architecture redesign
2.5 feature enrichment and alignment
2.6 label assignment
2.7 address the extreme class imbalance problem
2.8 training from scratch
The defining strength of one-stage anchor-based object detectors is their speed, so most improvements aim to raise detection accuracy while keeping that speed.
1. baseline
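YOLOv1 divides the image into a grid of cells, and each cell predicts several bounding boxes, but it suffers from poor localization accuracy and low recall. YOLOv2 addresses these problems with a number of changes. The most important is the introduction of anchors, which puts this one-stage detector in the anchor-based branch. The other changes include batch normalization (BN), a higher input resolution for the classifier, convolutional layers in place of fully connected layers, k-means clustering to choose the anchor shapes, direct prediction of the bounding-box center location, fusion of high- and low-resolution feature maps, multi-scale training, and an improved backbone.
SSD also uses anchors, but its main contribution is multi-scale prediction, which greatly improves detection accuracy. RetinaNet proposes Focal loss to address the imbalance between positive and negative samples and, combined with the FPN structure, greatly improves the accuracy of one-stage anchor-based detectors. Focal loss has also gradually become standard in anchor-free models. YOLOv3 and YOLOv4 further improve the YOLO series.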
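To make the role of Focal loss concrete, here is a minimal sketch of the binary form for a single anchor. The defaults `alpha=0.25` and `gamma=2.0` are the values commonly quoted from the Focal loss paper; they and the function names are assumptions of this sketch, not something this outline specifies.

```python
import math

def cross_entropy(p, y):
    """Plain binary cross-entropy for one anchor.

    p: predicted foreground probability, y: label (1 = object, 0 = background).
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, 1e-12))

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    The (1 - p_t)^gamma factor shrinks the loss of well-classified anchors,
    so the many easy background anchors no longer dominate the total loss.
    alpha and gamma are assumed defaults, not values taken from this document.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))

if __name__ == "__main__":
    # An easy background anchor (p = 0.1) versus a hard one (p = 0.9).
    for p in (0.1, 0.9):
        print(p, cross_entropy(p, 0), focal_loss(p, 0))
```

Relative to cross-entropy, the easy background anchor is down-weighted far more strongly than the hard one, which is the intended effect on the class imbalance.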
2. Improvements from multiple directions
2.1 fusing context information from different layers
RON: Reverse connection with objectness prior networks for object detection. In CVPR, 2017 [RON performs both multi-scale feature fusion and multi-scale prediction, which greatly improves detection accuracy]
DSSD: Deconvolutional single shot detector. In CoRR, 2017 [DSSD introduces additional context into SSD via deconvolution to improve accuracy]
Scale-transferrable object detection. In CVPR, 2018 [STDN proposes a scale-transfer module to generate multi-scale feature maps]
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network. In AAAI, 2019 [M2Det proposes MLFPN to extract multi-scale, multi-level feature maps]
2.2 new loss function
UnitBox: An Advanced Object Detection Network. In ACM MM, 2016 [IoU loss]
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. In CVPR, 2019 [GIoU loss]
Distance-IoU Loss: Faster and better learning for bounding box regression. In AAAI, 2020 [DIoU loss and CIoU loss]
Towards accurate one-stage object detection with AP-loss. In CVPR, 2019
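The IoU-based losses listed above differ mainly in the penalty they add to plain IoU. The sketch below computes the three underlying quantities for two axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the function name `iou_family` are assumptions made for illustration, and both boxes are assumed to have positive area.

```python
def iou_family(a, b):
    """Return (IoU, GIoU, DIoU) for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Smallest box enclosing both a and b.
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    enclose_area = (cx2 - cx1) * (cy2 - cy1)

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (enclose_area - union) / enclose_area

    # DIoU subtracts the squared center distance, normalized by the squared
    # diagonal of the enclosing box.
    center_dist2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
                 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    diag2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    diou = iou - center_dist2 / diag2

    return iou, giou, diou

if __name__ == "__main__":
    print(iou_family((0, 0, 2, 2), (1, 1, 3, 3)))
    print(iou_family((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

The GIoU and DIoU losses are usually written as one minus the corresponding quantity, while UnitBox defines its IoU loss as the negative log of IoU. For disjoint boxes IoU is zero and carries no gradient signal, whereas GIoU and DIoU still vary with the distance between the boxes. CIoU adds an aspect-ratio term on top of DIoU and is omitted here.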
2.3 anchor refinement and matching
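Single-shot refinement neural network for object detection. In CVPR, 2018 [Before the final classification and regression, RefineDet performs a coarse classification and a coarse regression. This filters out a large number of negative anchors, which reduces the number of samples the head has to classify, and adjusts the location and shape of the positive anchors so that the head's regressor can more easily reach higher localization accuracy]
FreeAnchor: Learning to match anchors for visual object detection. In NeurIPS, 2019 [FreeAnchor introduces the concept of an anchor bag and designs a new loss function so that the network learns label assignment automatically]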
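The two-step idea described for RefineDet can be sketched roughly as follows. This is an illustration, not the paper's implementation: the function name `refine_anchors`, the `(cx, cy, w, h)` anchor format, the offset decoding (an SSD-style decode without variance scaling), and the threshold `theta=0.99` are all assumptions of this sketch.

```python
import math

def refine_anchors(anchors, neg_conf, offsets, theta=0.99):
    """Coarse stage: drop confident negatives, then adjust the remaining anchors.

    anchors:  list of (cx, cy, w, h)
    neg_conf: coarse background probability for each anchor
    offsets:  coarse regression output (dx, dy, dw, dh) for each anchor
    theta:    assumed threshold; anchors with neg_conf above it are discarded
    Returns a list of (original_index, refined_anchor) for the second stage.
    """
    kept = []
    for i, (anchor, p_bg, delta) in enumerate(zip(anchors, neg_conf, offsets)):
        if p_bg > theta:
            continue  # confident background: the head never sees this anchor
        cx, cy, w, h = anchor
        dx, dy, dw, dh = delta
        refined = (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))
        kept.append((i, refined))
    return kept

if __name__ == "__main__":
    anchors = [(10.0, 10.0, 4.0, 4.0), (20.0, 20.0, 8.0, 8.0)]
    neg_conf = [0.995, 0.3]
    offsets = [(0.0, 0.0, 0.0, 0.0), (0.5, -0.25, 0.0, math.log(2.0))]
    print(refine_anchors(anchors, neg_conf, offsets))
```

The head then classifies and regresses only the surviving anchors, starting from their refined positions rather than from the original hand-designed ones.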
2.4 architecture redesign
Deep feature pyramid reconfiguration for object detection. In ECCV, 2018
2.5 feature enrichment and alignment
Receptive field block net for accurate and fast object detection. In ECCV, 2018
Single-shot object detection with enriched semantics. In CVPR, 2018
Learning rich features at high-speed for single-shot object detection. In ICCV, 2019
Enriched feature guided refinement network for object detection. In ICCV, 2019
Dynamic anchor feature selection for single-shot object detection. In ICCV, 2019
2.6 label assignment
Learning from Noisy Anchors for One-stage Object Detection. In CVPR, 2020
2.7 address the extreme class imbalance problem
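RON: Reverse connection with objectness prior networks for object detection. In CVPR, 2017 [For each output feature map, RON generates an objectness map of the same spatial size whose number of channels equals the number of anchors per location. Each objectness value is the confidence that the corresponding anchor contains an object. With this objectness prior, a large number of background samples are filtered out during back-propagation, which alleviates the imbalance between positive and negative samples]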
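One way to picture the objectness prior is as a mask over the per-anchor loss. The sketch below assumes nested lists of shape `[A][H][W]` (anchors per location, height, width). The threshold value and both function names are hypothetical, and the real RON training procedure may select samples differently.

```python
def objectness_mask(objectness, threshold):
    """Mark anchors whose objectness reaches the (hypothetical) threshold.

    objectness: nested list [A][H][W], one channel per anchor at each location.
    """
    return [[[o >= threshold for o in row] for row in channel]
            for channel in objectness]

def masked_mean_loss(per_anchor_loss, mask):
    """Average the detection loss over the anchors selected by the mask.

    Anchors outside the mask contribute nothing, so no gradient would flow
    back from them; this is how most background anchors get filtered out.
    """
    total, count = 0.0, 0
    for loss_c, mask_c in zip(per_anchor_loss, mask):
        for loss_r, mask_r in zip(loss_c, mask_c):
            for loss, selected in zip(loss_r, mask_r):
                if selected:
                    total += loss
                    count += 1
    return total / count if count else 0.0

if __name__ == "__main__":
    objectness = [[[0.9, 0.01]], [[0.02, 0.6]]]   # A=2, H=1, W=2
    losses = [[[1.0, 5.0]], [[7.0, 3.0]]]
    mask = objectness_mask(objectness, threshold=0.5)
    print(mask, masked_mean_loss(losses, mask))
```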
2.8 training from scratch
DSOD: Learning deeply supervised object detectors from scratch. In ICCV, 2017 [DSOD designs an efficient framework and a set of principles to learn object detectors from scratch, following the network structure of SSD]
ScratchDet: Exploring to train single-shot object detectors from scratch. In CoRR, 2018