Decoding State-of-the-Art Object Detection

    Technology  2025-02-17

    EfficientDet is a neural network architecture that achieves state-of-the-art (SOTA) results (~55.1 average precision) on the object detection task (Microsoft COCO dataset) with much lower complexity (~4x-9x) than previous detectors [2]. The key building blocks of EfficientDet are:

    Compound Scaling

    Bi-directional Feature Pyramid Network (BiFPN)

    Compound Scaling

    Figure: model scaling by width (b), depth (c), resolution (d), and compound scaling (e). Image from the EfficientNet paper.

    Scaling up ConvNets in width (b), depth (c), or resolution (d) is widely used to achieve better accuracy, but it requires tedious manual tuning when scaling is done in two or three dimensions and still leads to sub-optimal accuracy and efficiency. Compound scaling (e) uses a single compound coefficient ϕ to scale network width, depth, and resolution in a principled way according to the equations below.

    Equations: image from the EfficientNet paper.
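
    For reference, the two equations can be written out as follows. This is a reconstruction based on the EfficientNet paper [3], so the hat notation for baseline values and the names target_memory and target_flops come from the paper rather than from the text here. The first block is the optimization problem (left equation) and the second is the compound scaling rule (right equation).

```latex
\begin{align*}
\max_{d,\,w,\,r}\quad & \mathrm{Accuracy}\big(\mathcal{N}(d, w, r)\big) \\
\text{s.t.}\quad & \mathcal{N}(d, w, r) = \bigodot_{i=1 \ldots s}
    \hat{\mathcal{F}}_i^{\,d \cdot \hat{L}_i}
    \big( X_{\langle r \cdot \hat{H}_i,\; r \cdot \hat{W}_i,\; w \cdot \hat{C}_i \rangle} \big) \\
& \mathrm{Memory}(\mathcal{N}) \le \text{target\_memory} \\
& \mathrm{FLOPS}(\mathcal{N}) \le \text{target\_flops}
\end{align*}

\begin{align*}
\text{depth: }      & d = \alpha^{\phi} \\
\text{width: }      & w = \beta^{\phi} \\
\text{resolution: } & r = \gamma^{\phi} \\
\text{s.t.}\quad    & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
                      \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{align*}
```

    The FLOPs of a regular convolution grow roughly in proportion to d, w², and r², so under this constraint increasing ϕ by one roughly doubles the total FLOPs, which grow by about 2^ϕ overall.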

    Compound scaling is formulated as an optimization problem: given the baseline network and a target FLOPs budget, find the scaling coefficients for the base network as shown in the equations above. This is done in two steps:

    “1. Fix ϕ = 1, assume 2x more resources available, do a grid search of ⍺, β, 𝛾 based on the above equations.

    2. Fix ⍺, β, 𝛾 as constants and scale-up baseline network with different ϕ ” [3]
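
    The two steps can be sketched in Python as below. This is only an illustration under stated assumptions: the grid step, the upper bound, and the tolerance on the constraint are hypothetical choices, and the part of step 1 that trains and evaluates each candidate network is left out. The values 1.2, 1.1, 1.15 used in the demo are the coefficients reported in the EfficientNet paper for its baseline.

```python
from itertools import product
from typing import List, Tuple


def feasible_coefficients(step: float = 0.05, upper: float = 1.5,
                          budget: float = 2.0, tolerance: float = 0.1
                          ) -> List[Tuple[float, float, float]]:
    """Step 1 (phi = 1): list (alpha, beta, gamma) with alpha * beta^2 * gamma^2 near the budget.

    The grid step, upper bound and tolerance are illustrative assumptions.
    """
    count = int(round((upper - 1.0) / step)) + 1
    grid = [round(1.0 + i * step, 2) for i in range(count)]
    return [
        (a, b, g)
        for a, b, g in product(grid, repeat=3)
        if abs(a * b ** 2 * g ** 2 - budget) <= tolerance
    ]


def scale_multipliers(alpha: float, beta: float, gamma: float,
                      phi: int) -> Tuple[float, float, float]:
    """Step 2: depth, width and resolution multipliers for a given phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


candidates = feasible_coefficients()
print(f"{len(candidates)} candidate triples satisfy the constraint")

# In the real procedure each candidate would be trained and evaluated,
# and the most accurate one kept. Here we simply use the published values.
alpha, beta, gamma = 1.2, 1.1, 1.15
for phi in range(1, 4):
    depth, width, resolution = scale_multipliers(alpha, beta, gamma, phi)
    print(f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, resolution x{resolution:.2f}")
```

    Keeping ⍺, β, 𝛾 fixed after step 1 means the expensive search happens only once on the small baseline, and larger models are obtained by changing ϕ alone.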

    In the left equation above, ℱᵢ (layer operator), Lᵢ (number of layers), Hᵢ (height), Wᵢ (width), and Cᵢ (number of channels) are predefined parameters of the baseline network, as shown in the Efficient Network Details table below, and w, d, r are the coefficients for scaling network width, depth, and resolution. In the right equation above, ϕ controls the resources available for model scaling, while ⍺, β, 𝛾 control depth, width, and resolution scaling respectively. Since compound scaling does not change the layer operators in the baseline network, having a strong baseline network is also critical. “The baseline network is selected by leveraging a multi-objective neural architecture search that optimizes accuracy and FLOPs” [3].

    Bi-directional Feature Pyramid Network (BiFPN)

    Figure: feature network designs, (a) FPN, (b) PANet, (c) NAS-FPN, (d) BiFPN. Image from the EfficientDet paper.

    Multi-scale feature fusion aggregates features at different resolutions and is a widely used practice for learning scale-invariant features. Feature Pyramid Network [FPN] (a) fuses features from levels 3 to 7 top-down and is limited by its one-directional information flow. Path Aggregation Network [PANet] (b) adds a bottom-up pathway on top of FPN. NAS-FPN (c) uses neural architecture search to find a cross-scale feature network topology and requires 1000+ hours of search; the network it finds is also irregular and difficult to interpret [2].

    BiFPN (d) integrates bi-directional cross-scale connections with weighted feature fusion. The equation and block diagram below describe the fused feature for level 6.

    Equation from the EfficientDet paper. Figure: BiFPN feature fusion for level 6.

    P6_in and P7_in (x, y, w) are the input features at levels 6 and 7 respectively, P5_out is the output feature at level 5, and P6_out (x, y, w) is the fused output feature at level 6. P7_in features are upsampled using nearest-neighbor interpolation and P5_out features are downsampled using max pooling for the feature fusion at level 6. The WeightedAddition layer implements fast normalized fusion; positive weights are ensured by applying ReLU to each weight in the layer.
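
    The level-6 fusion can be sketched in plain Python as below. It is a simplified illustration, not the repository's WeightedAddition layer: feature maps are single-channel nested lists, pooling uses a 2x2 window, the convolution applied after each fusion is omitted, and the weights and feature values in the demo are made up. The intermediate top-down feature (called p6_td here) follows the two-stage form of the equation in the EfficientDet paper.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one single-channel feature map

EPSILON = 1e-4  # small constant that keeps the division numerically stable


def fast_normalized_fusion(features: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Weighted sum of same-shaped maps, with ReLU-clamped weights divided by their sum."""
    positive = [max(0.0, w) for w in weights]  # ReLU keeps every weight >= 0
    total = sum(positive) + EPSILON
    rows, cols = len(features[0]), len(features[0][0])
    return [
        [sum(w * f[r][c] for w, f in zip(positive, features)) / total
         for c in range(cols)]
        for r in range(rows)
    ]


def upsample_nearest_2x(feature: Grid) -> Grid:
    """Nearest-neighbor upsampling: every value is repeated in a 2x2 block."""
    out: Grid = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def max_pool_2x(feature: Grid) -> Grid:
    """2x2 max pooling with stride 2 (a simplification chosen for this sketch)."""
    return [
        [max(feature[r][c], feature[r][c + 1],
             feature[r + 1][c], feature[r + 1][c + 1])
         for c in range(0, len(feature[0]), 2)]
        for r in range(0, len(feature), 2)
    ]


# Made-up inputs: level 7 is 1x1, level 6 is 2x2, level 5 is 4x4.
p7_in = [[4.0]]
p6_in = [[1.0, 2.0], [3.0, 4.0]]
p5_out = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# Top-down step, then the final fused output at level 6 (learned weights assumed).
p6_td = fast_normalized_fusion([p6_in, upsample_nearest_2x(p7_in)], [1.0, 0.5])
p6_out = fast_normalized_fusion([p6_in, p6_td, max_pool_2x(p5_out)], [1.0, 1.0, 0.5])
print(p6_out)
```

    Dividing by the sum of the clamped weights keeps each normalized weight between 0 and 1 without the cost of a softmax, which is the motivation the paper gives for the "fast" variant.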

    EfficientDet Architecture

    The EfficientDet architecture uses four major networks: an EfficientNet backbone, a BiFPN, a bounding-box prediction network, and a class prediction network.

    Figure: EfficientDet architecture. Image from the EfficientDet paper.

    Input image resolution: Since BiFPN uses features from levels 3–7, the input resolution must be divisible by pow(2, 7) = 128, and the input image is scaled according to the equation below.

    Input scaling equation
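
    As a small illustration, the input scaling rule from the EfficientDet paper, R_input = 512 + ϕ · 128, and the divisibility requirement can be checked in a few lines of Python. The function names are hypothetical, and the released model configurations may override the formula for some models, so the configuration table remains the authoritative source.

```python
from typing import Dict

STRIDE = 2 ** 7  # level 7 downsamples the input by 128


def input_resolution(phi: int) -> int:
    """Input scaling rule from the EfficientDet paper: R_input = 512 + phi * 128."""
    return 512 + phi * STRIDE


def is_valid_input_size(width: int, height: int) -> bool:
    """Both sides must be divisible by 128 so that levels 3-7 have integer sizes."""
    return width % STRIDE == 0 and height % STRIDE == 0


def feature_map_sizes(resolution: int) -> Dict[int, int]:
    """Side length of the feature map at each BiFPN level for a square input."""
    return {level: resolution // 2 ** level for level in range(3, 8)}


for phi in range(0, 4):
    size = input_resolution(phi)
    print(phi, size, feature_map_sizes(size))

# A non-square size such as 1920x1280 is also acceptable under this rule.
print(is_valid_input_size(1920, 1280))
```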

    BiFPN network: Depth is scaled linearly because it needs to be rounded to small integers. Width (number of channels) grows exponentially; a grid search was performed over candidate scaling factors and the best value, 1.35, was chosen.

    Backbone network: EfficientDet uses the EfficientNet network without any changes in order to reuse the pre-trained weights of the ImageNet model. The table below shows the EfficientNet B0 and B7 network details.

    Table: Efficient Network Details. Figure: MBConv block diagram.

    EfficientNet uses the mobile inverted bottleneck MBConv as its building block and relies on depthwise separable convolutions and residual connections, which enable faster training and better accuracy. These MBConv blocks, shown in the image above, are scaled according to the compound scaling coefficient ϕ (num_repeat) in the backbone network.
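
    To give a feel for why depthwise separable convolutions are cheaper, the sketch below compares weight counts for a standard convolution and a depthwise separable one. It ignores biases, batch normalization, and the expansion and squeeze-and-excitation parts of a full MBConv block, and the layer sizes are arbitrary examples.

```python
def standard_conv_weights(kernel: int, in_channels: int, out_channels: int) -> int:
    """A standard convolution has one k x k filter per (input, output) channel pair."""
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_weights(kernel: int, in_channels: int, out_channels: int) -> int:
    """One k x k filter per input channel, followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Arbitrary example: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_weights(3, 32, 64)
separable = depthwise_separable_weights(3, 32, 64)
print(f"standard: {standard}, separable: {separable}, ratio: {standard / separable:.1f}x")
```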

    Box/class prediction network: The width of the box and class prediction networks follows the BiFPN network width, and their depth scales linearly according to the equation below.

    Equation from the EfficientDet paper.
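
    Written out, the BiFPN and box/class scaling rules from the EfficientDet paper [2] take the following form. This is a reconstruction from the paper; the subscript names are its notation, and the base values 64 and 3 are the paper's starting width and depth. The published configuration table rounds the widths, so it will not match the first formula digit for digit.

```latex
\begin{align*}
W_{\text{bifpn}} &= 64 \cdot \left(1.35^{\phi}\right), &
D_{\text{bifpn}} &= 3 + \phi \\
D_{\text{box}} &= D_{\text{class}} = 3 + \lfloor \phi / 3 \rfloor
\end{align*}
```

    For example, ϕ = 3 gives a BiFPN depth of 6 and a box/class depth of 4.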

    The table below shows the scaling configuration of the EfficientDet detectors according to the compound coefficient ϕ.

    Table: scaling configuration of the EfficientDet network. Image from the EfficientDet paper.

    Inferencing with Pre-trained COCO Models

    # Make sure python3 and pip3 are installed and up to date
    sudo apt-get update
    sudo apt install python3-pip
    sudo -H pip3 install -U pip   # upgrade pip to the latest version

    # Clone the EfficientDet git repo
    git clone https://github.com/google/automl.git
    cd automl/efficientdet

    # Install all the EfficientDet requirements
    pip3 install -r requirements.txt

    # Download the pretrained weights. d0 is the model version and can be in the range d0-d7.
    wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-d0.tar.gz
    tar zxf efficientdet-d0.tar.gz
    mkdir -p ./savedmodels/efficientdet-d0

    # Export the saved model
    python3 model_inspect.py --runmode=saved_model --model_name=efficientdet-d0 --ckpt_path=./efficientdet-d0 --hparams="image_size=1920x1280" --saved_model_dir=./savedmodels/efficientdet-d0

    # Make the output dir and run inference with the saved model
    mkdir -p ./output
    python3 model_inspect.py --runmode=saved_model_infer --model_name=efficientdet-d0 --saved_model_dir=./savedmodels/efficientdet-d0 --input_image=path_to_input_image --output_image_dir=./output --min_score_thresh=0.6

    Figures: input image (left), EfficientDet-D0 output (right). Cloud vision APIs used for comparison: https://cloud.google.com/vision and https://azure.microsoft.com/en-in/services/cognitive-services/computer-vision/#features

    On the tested images, EfficientDet output looks better than that of the cloud vision APIs. A detailed comparative analysis is required before drawing conclusions about EfficientDet quality relative to production cloud vision APIs. EfficientDet, with its novel BiFPN and compound scaling, is likely to serve as a foundation for future object detection research and to make object detection models practical for many more real-world applications.

    Thanks for reading the article, I hope you found this to be helpful. If you did, please share it on your favorite social media so other folks can find it, too. Also, let us know in the comment section if something is not clear or incorrect.

    Translated from: https://towardsdatascience.com/decoding-state-of-the-art-object-detection-99f79d97b75d
