VID: Video Object Detection

Technology, 2024-10-21


Problem and Background

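Overview

As the name suggests, video object detection is a subfield of object detection. It differs only by the word "video", and video frames look very much like still images. Even so, video object detection (hereafter VID) gets far less research attention than object detection (hereafter OD), which is extremely popular. Top-conference papers on VID did become more common in 2018-2019, but the total is still only about 30. That makes the field quite friendly to anyone who wants to get started and produce some results.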

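Core problems

The core problem in this field is that a single frame of a video may suffer from motion blur, rare poses, occlusion and similar degradations. These can be caused by camera defocus or by the quality of the video itself, and they are inherent to the task.

On the other hand, the field took off relatively late, so little open-source code exists. By my count, only DFF, FGFA, SELSA and MEGA are practically usable, which leaves very little choice. The first three form one family and are built on MXNet. MEGA was just open-sourced with its CVPR 2020 paper; it is based on PyTorch and integrates four methods (DFF, FGFA, RDN and MEGA). OD and object tracking each have dozens of open-source codebases, so VID is in a rather awkward position by comparison.

Finally, the problem being solved is very narrow, and so is the core idea behind the methods: exploit the spatial-temporal information across frames in a video to strengthen what is learned from a single frame. As a result, the methods have become fairly homogeneous, with many papers competing on small variations of the same idea.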
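The core idea of strengthening a single frame with information from its neighbours can be illustrated with a minimal sketch of similarity-weighted feature aggregation. It follows the spirit of FGFA's adaptive weights (a softmax over cosine similarities to the reference frame) but is not that paper's implementation. The sketch assumes each frame has already been reduced to one feature vector that is spatially aligned with the reference frame. FGFA itself does this alignment per position with optical-flow warping and computes the similarities on a learned embedding; both steps are omitted here. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as unrelated
    return dot / (norm_a * norm_b)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_features(frames: List[Vector], ref_index: int) -> Vector:
    """Enhance the reference frame's feature with its temporal neighbours.

    Hypothetical helper. ASSUMPTION: every vector in `frames` is already
    aligned with the reference frame (no optical-flow warping here), and
    similarity is computed on raw features instead of a learned embedding.
    """
    ref = frames[ref_index]
    # One weight per frame: frames that look like the reference count more.
    weights = softmax([cosine_similarity(f, ref) for f in frames])
    dim = len(ref)
    # Weighted sum over frames, computed independently for each channel.
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


if __name__ == "__main__":
    # Frame 2 stands in for a degraded frame whose feature drifted away.
    clip = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(aggregate_features(clip, ref_index=0))
```

A degraded neighbour, such as a motion-blurred frame whose feature drifts away from the reference, receives a lower weight and so contributes less to the enhanced feature. Real detectors typically compute the similarity on a learned embedding rather than on raw features, so their weights are not limited to the mild spread shown here.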

All in all, the field is easy to get into but hard to stand out in!

Part 1: Paper List

With open-source code

MEGA (2020CVPR): Memory Enhanced Global-Local Aggregation for Video Object Detection paper code

SSVD (2020 IEEE Transactions on Multimedia): Single Shot Video Object Detector paper code

HVRNet (2020ECCV): Mining Inter-Video Proposal Relations for Video Object Detection paper code

SELSA (2019ICCV): Sequence Level Semantics Aggregation for Video Object Detection paper code

STMN (2018ECCV): Video object detection with an aligned spatial-temporal memory paper code

DFF (2017CVPR): Deep Feature Flow for Video Recognition paper code

FGFA (2017ICCV): Flow-guided feature aggregation for video object detection paper code

Without open-source code

(2020TPAMI) Object Detection in Videos by High Quality Object Linking paper

TCENet (2020AAAI): Temporal Context Enhanced Feature Aggregation for Video Object Detection paper

STFA (2020ICASSP): Spatial-Temporal Feature Aggregation Network for Video Object Detection paper

LSTS (2020ECCV): Learning Where to Focus for Efficient Video Object Detection paper

(2020ECCV) Video Object Detection via Object-level Temporal Aggregation paper

(2020ECCV) CenterNet Heatmap Propagation for Real-time Video Object Detection paper

DSFNet (2020ACMMM): Dual Semantic Fusion Network for Video Object Detection paper

LRTR (2019ICCV): Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection paper

OGEMN (2019ICCV): Object guided external memory network for video object detection paper

RDN (2019ICCV): Relation distillation networks for video object detection paper

PSLA (2019ICCV): Progressive Sparse Local Attention for Video Object Detection paper

LWDN (2019AAAI): Video Object Detection with Locally-Weighted Deformable Neighbors paper

THP (2018CVPR): Towards high performance video object detection paper

STSN (2018ECCV): Object Detection in Video with Spatiotemporal Sampling Networks paper

ST-Lattice (2018CVPR): Optimizing video object detection via a scale-time lattice paper

MANet (2018ECCV): Fully Motion-Aware Network for Video Object Detection paper

TCNN (2017arXiv): T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos paper

D&T (2017ICCV): Detect to track and track to detect paper

Seq-NMS (2016arXiv): Seq-NMS for video object detection paper

Impression Network (2017arXiv): Impression Network for Video Object Detection paper
