A social distancing detector using a Tensorflow object detection model, Python and OpenCV

    Introduction

    During the quarantine I was spending time on github exploring Tensorflow’s huge number of pre-trained models. While doing this, I stumbled on a repository containing 25 pre-trained object detection models with performance and speed indicators. Having some knowledge in computer vision and given the current context, I thought it could be interesting to use one of these to build a social distancing application.

    More so, last semester I was introduced to OpenCV during my Computer Vision class and realized how powerful it was while doing a number of small projects. One of these included performing a bird eye view transformation of a picture. A bird eye view is basically a top-down representation of a scene. It is a task often performed when building applications for autonomous driving.

    Implementation of bird’s eye view system for camera on vehicle

    This made me realize that applying such a technique to a scene where we want to monitor social distancing could improve the quality of the monitoring. This article presents how I used a deep learning model along with some knowledge in computer vision to build a robust social distancing detector.

    This article is going to be structured as follows:

    Model selection

    People detection

    Bird eye view transformation

    Social distancing measurement

    Results and improvements

    All of the following code along with installation explanations can be found on my github repository.

    1. Model selection

    All the models available in the Tensorflow object detection model zoo have been trained on the COCO dataset (Common Objects in COntext). This dataset contains 120,000 images with a total of 880,000 labeled objects. These models are trained to detect the 90 different types of objects labeled in this dataset. A complete list of these objects is available in the data section of the github repo. This list includes a car, a toothbrush, a banana and, of course, a person.

    Non exhaustive list of the available models

    The models have different performances depending on their speed. I made a few tests in order to determine how to trade off detection quality against prediction speed. Since the goal of this application was not to perform real-time analysis, I ended up choosing faster_rcnn_inception_v2_coco, which has a mAP (detector performance on a validation set) of 28, which is quite strong, and an execution speed of 58 ms.
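
    As an illustrative sketch only, the snippet below shows one way to fetch and extract such a model from the TF1 detection model zoo. The archive name and its date suffix are assumptions and may differ from the release you pick, but each archive contains a frozen_inference_graph.pb file, which is the file used for inference in the rest of this article.

    import tarfile
    import urllib.request

    # Assumed archive name from the TF1 detection model zoo; the date suffix may differ
    MODEL_NAME = "faster_rcnn_inception_v2_coco_2018_01_28"
    MODEL_URL = "http://download.tensorflow.org/models/object_detection/" + MODEL_NAME + ".tar.gz"

    # Download and extract the archive into the current folder
    urllib.request.urlretrieve(MODEL_URL, MODEL_NAME + ".tar.gz")
    with tarfile.open(MODEL_NAME + ".tar.gz") as tar:
        tar.extractall()

    # The frozen graph used for inference lives inside the extracted folder
    model_path = MODEL_NAME + "/frozen_inference_graph.pb"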

    2. People detection

    To use such a model to detect persons, there are a few steps that have to be done:

    Load the file containing the model into a tensorflow graph and define the outputs you want to get from the model.

    For each frame, pass the image through the graph in order to get the desired outputs.

    Filter out the weak predictions and objects that do not need to be detected.

    Load and start the model

    The way tensorflow models have been designed to work is by using graphs. The first step involves loading the model into a tensorflow graph. This graph will contain the different operations that will be done in order to get the desired detections. The next step is creating a session, which is an entity responsible for executing the operations defined in the previous graph. More explanations on graphs and sessions are available here. I have decided to implement a class to keep all the data related to the tensorflow graph together.

    import numpy as np
    import tensorflow as tf


    class Model:
        """
        Class that contains the model and all its functions
        """
        def __init__(self, model_path):
            """
            Initialization function
            @ model_path : path to the model
            """
            # Declare detection graph
            self.detection_graph = tf.Graph()
            # Load the model into the tensorflow graph
            with self.detection_graph.as_default():
                od_graph_def = tf.compat.v1.GraphDef()
                with tf.io.gfile.GFile(model_path, 'rb') as file:
                    serialized_graph = file.read()
                    od_graph_def.ParseFromString(serialized_graph)
                    tf.import_graph_def(od_graph_def, name='')
            # Create a session from the detection graph
            self.sess = tf.compat.v1.Session(graph=self.detection_graph)

        def predict(self, img):
            """
            Get the prediction results on 1 frame
            @ img : our img vector
            """
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            img_exp = np.expand_dims(img, axis=0)
            # Pass the inputs and outputs to the session to get the results
            (boxes, scores, classes) = self.sess.run(
                [self.detection_graph.get_tensor_by_name('detection_boxes:0'),
                 self.detection_graph.get_tensor_by_name('detection_scores:0'),
                 self.detection_graph.get_tensor_by_name('detection_classes:0')],
                feed_dict={self.detection_graph.get_tensor_by_name('image_tensor:0'): img_exp})
            return (boxes, scores, classes)

    Pass every frame through the model

    The session is run for every frame that needs processing. This is done by calling the run() function. Some parameters have to be specified when doing so. These include the type of input that the model requires and which outputs we want to get back from it. In our case, the outputs needed are the following:

    Bounding box coordinates of each object

    The confidence of each prediction (0 to 1)

    Class of the prediction (0 to 90)
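
    As a minimal usage sketch, assuming the Model class defined above and hypothetical file paths, one frame can be read with OpenCV and passed through the model like this:

    import cv2

    model = Model("faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb")  # hypothetical path
    video = cv2.VideoCapture("video.mp4")  # hypothetical input video

    ret, frame = video.read()
    if ret:
        # Each output keeps a leading batch dimension of size 1
        (boxes, scores, classes) = model.predict(frame)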

    Filter out weak predictions and non-relevant objects

    Results of the person detection

    One of the many classes detected by the model is a person. The class associated with a person is 1.

    In order to exclude both weak predictions (threshold: 0.75) and all classes of objects other than person, I used an if statement combining both conditions to exclude any other object from further computation.

    if int(classes[i]) == 1 and scores[i] > 0.75:
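
    For illustration, here is a minimal sketch of how this condition could be used to collect the pixel coordinates of every confident person detection. It assumes the batch dimension of the predict() outputs is squeezed away first and that height and width hold the frame dimensions; these variable names are assumptions, not part of the original code.

    import numpy as np

    boxes, scores, classes = np.squeeze(boxes), np.squeeze(scores), np.squeeze(classes)
    person_boxes = []
    for i in range(len(classes)):
        # Keep only confident detections of the "person" class (id 1)
        if int(classes[i]) == 1 and scores[i] > 0.75:
            # Boxes are returned as normalized [ymin, xmin, ymax, xmax]
            ymin, xmin, ymax, xmax = boxes[i]
            person_boxes.append((int(ymin * height), int(xmin * width),
                                 int(ymax * height), int(xmax * width)))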

    But since these models are already pre-trained, it is not possible for them to only detect this class. Therefore, these models take quite a long time to run, because they try to identify all 90 different types of objects in the scene.

    3. Bird eye view transformation

    As explained in the introduction, performing a bird eye view transformation gives us a top view of a scene. Thankfully, OpenCV has great built-in functions to transform an image taken from a perspective point of view into a top-down view of the same scene. I used a tutorial from the great Adrian Rosebrock to understand how to do this.

    The first step involves selecting 4 points on the original image that are going to be the corner points of the plane which is going to be transformed. These points have to form a rectangle with at least 2 opposite sides being parallel. If this is not done, the proportions will not be preserved when the transformation happens. I have implemented a script, available in my repository, which uses the setMouseCallback() function of OpenCV to get these coordinates. The function that computes the transformation matrix also requires the dimensions of the image, which are obtained using the image.shape property of an image.

    height, width, _ = image.shape

    This returns the height, the width and the number of color channels, which is not relevant here. Let's see how these values are used to compute the transformation matrix:

    def compute_perspective_transform(corner_points, width, height, image):
        """ Compute the transformation matrix
        @ corner_points : 4 corner points selected from the image
        @ height, width : size of the image
        return : transformation matrix and the transformed image
        """
        # Create an array out of the 4 corner points
        corner_points_array = np.float32(corner_points)
        # Create an array with the parameters (the dimensions) required to build the matrix
        img_params = np.float32([[0, 0], [width, 0], [0, height], [width, height]])
        # Compute and return the transformation matrix
        matrix = cv2.getPerspectiveTransform(corner_points_array, img_params)
        img_transformed = cv2.warpPerspective(image, matrix, (width, height))
        return matrix, img_transformed
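
    A hypothetical call could look like the following; the corner points are made-up pixel coordinates (top-left, top-right, bottom-left, bottom-right, matching the order of img_params) that would normally come from the mouse-callback script mentioned earlier:

    corner_points = [(120, 80), (520, 90), (60, 400), (580, 410)]  # made-up coordinates
    matrix, bird_view_img = compute_perspective_transform(corner_points, width, height, frame)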

    Note that I chose to also return the matrix because it will be used in the next step to compute the new coordinates of each person detected. The result of this is the “GPS” coordinates of each person in the frame. Using these is far more accurate than using the original ground points, because in a perspective view distances are not the same when people are in different planes, i.e. not at the same distance from the camera. Compared to using the points in the original frame, this can improve the social distancing measurement a lot.

    For each person detected, the 2 points that are needed to build a bounding box are returned. The points are the top left corner of the box and the bottom right corner. From these, I computed the centroid of the box by taking the middle point between them. Using this result, I calculated the coordinates of the point located at the bottom center of the box. In my opinion, this point, which I refer to as the ground point, is the best representation of the coordinate of a person in an image.
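
    A minimal sketch of this computation, assuming (x1, y1) is the top left corner and (x2, y2) the bottom right corner of a detected box (the function name is only illustrative):

    def get_centroid_and_ground_point(x1, y1, x2, y2):
        # Centroid of the box: the middle point between the two corners
        centroid = ((x1 + x2) / 2, (y1 + y2) / 2)
        # Ground point: same x as the centroid, y on the bottom edge of the box
        ground_point = (centroid[0], y2)
        return centroid, ground_point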

    Then I used the transformation matrix to compute the transformed coordinates for each ground point detected. This is done on each frame, using cv2.perspectiveTransform(), after having detected the people in it. This is how I implemented this task:

    def compute_point_perspective_transformation(matrix, list_downoids):
        """ Apply the perspective transformation to every ground point which has been detected on the main frame.
        @ matrix : the 3x3 matrix
        @ list_downoids : list that contains the points to transform
        return : list containing all the new points
        """
        # Compute the new coordinates of our points
        list_points_to_detect = np.float32(list_downoids).reshape(-1, 1, 2)
        transformed_points = cv2.perspectiveTransform(list_points_to_detect, matrix)
        # Loop over the points and add them to the list that will be returned
        transformed_points_list = list()
        for i in range(0, transformed_points.shape[0]):
            transformed_points_list.append([transformed_points[i][0][0], transformed_points[i][0][1]])
        return transformed_points_list
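
    A hypothetical call, assuming matrix comes from compute_perspective_transform() above and ground_points is a list of the bottom-center points computed for each detected person:

    transformed_downoids = compute_point_perspective_transformation(matrix, ground_points)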

    4. Measuring social distancing

    After calling this function on each frame, a list containing all the new transformed points is returned. From this list I had to compute the distance between each pair of points. I used the combinations() function from the itertools library, which returns every possible combination of elements in a list without duplicates. This is very well explained in this Stack Overflow question. The rest is simple math: the distance between two points is easy to compute in Python using the math.sqrt() function. The threshold chosen was 120 pixels, because it is approximately equal to 2 feet in our scene.

    # Check if 2 or more people have been detected (otherwise no need to detect)
    if len(transformed_downoids) >= 2:
        # Iterate over every possible 2 by 2 combination between the points
        list_indexes = list(itertools.combinations(range(len(transformed_downoids)), 2))
        for i, pair in enumerate(itertools.combinations(transformed_downoids, r=2)):
            # Check if the distance between each combination of points is less than the minimum distance chosen
            if math.sqrt((pair[0][0] - pair[1][0])**2 + (pair[0][1] - pair[1][1])**2) < int(distance_minimum):
                # Change the colors of the points that are too close from each other to red
                change_color_topview(pair)
                # Get the equivalent indexes of these points in the original frame and change the color to red
                index_pt1 = list_indexes[i][0]
                index_pt2 = list_indexes[i][1]
                change_color_originalframe(index_pt1, index_pt2)

    Once 2 points are identified as being too close to one another, the color of the circle marking each point is changed from green to red, and the same is done for the corresponding bounding boxes in the original frame.

    5. Results

    Let me summarize how this project works; a minimal sketch tying these steps together follows the list:

    First get the 4 corner points of the plane, apply the perspective transformation to get a bird view of this plane and save the transformation matrix.

    Get the bounding box for each person detected in the original frame.

    Compute the lowest point of this box. It is the point located between both feet.

    Apply the transformation matrix to each of these points to get the real “GPS” coordinates of each person.

    Use itertools.combinations() to measure the distance from every point to all the other ones in the frame.

    If a social distancing violation is detected, change the color of the bounding box to red.
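
    Below is a minimal end-to-end sketch tying the previous snippets together. It assumes the Model class, compute_perspective_transform() and compute_point_perspective_transformation() defined earlier, corner points already selected with the mouse-callback script, and hypothetical file paths; drawing the red markers is only indicated by a comment.

    import itertools
    import math

    import cv2
    import numpy as np

    model = Model("faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb")  # hypothetical path
    video = cv2.VideoCapture("video.mp4")  # hypothetical input video
    distance_minimum = 120  # pixels, the threshold chosen for this scene

    ret, frame = video.read()
    height, width, _ = frame.shape
    # corner_points: 4 points selected beforehand with the mouse-callback script (assumed available)
    matrix, _ = compute_perspective_transform(corner_points, width, height, frame)

    while ret:
        boxes, scores, classes = model.predict(frame)
        boxes, scores, classes = np.squeeze(boxes), np.squeeze(scores), np.squeeze(classes)

        # Ground point (bottom center) of every confident person detection
        ground_points = []
        for i in range(len(classes)):
            if int(classes[i]) == 1 and scores[i] > 0.75:
                ymin, xmin, ymax, xmax = boxes[i]
                ground_points.append([(xmin + xmax) / 2 * width, ymax * height])

        # Project the ground points to the bird eye view and flag pairs that are too close
        if len(ground_points) >= 2:
            transformed_downoids = compute_point_perspective_transformation(matrix, ground_points)
            for p1, p2 in itertools.combinations(transformed_downoids, 2):
                if math.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) < distance_minimum:
                    pass  # mark the corresponding points / bounding boxes in red here

        ret, frame = video.read()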

    I used a video from the PETS2009 dataset, which consists of multisensor sequences containing different crowd activities. It was originally built for tasks like person counting and density estimation in crowds. I decided to use the video from the first angle because it was the widest one, with the best view of the scene. This video presents the results obtained:

    Demo video

    6. Conclusion and improvements

    Nowadays, social distancing along with other basic sanitary measures are very important to keep the spread of Covid-19 as slow as possible. But this project is only a proof of concept and was not made to be used to monitor social distancing in public or private areas, because of ethical and privacy issues.

    I am well aware that this project is not perfect, so here are a few ideas on how this application could be improved:

    Use a faster model in order to perform real-time social distancing analysis.

    Use a model more robust to occlusions.

    Add automatic calibration, which is a very well known problem in computer vision and could greatly improve the bird eye view transformation on different scenes.

    This article is my first contribution to Towards Data Science and Medium. I have made the code available on my Github. Please do not hesitate to ask if you have a question regarding the code itself or this article. If you have ideas for possible improvements or any kind of feedback, feel free to contact me, it will be greatly appreciated. I hope you find this helpful and feel free to share it if you like it.

    Sources

    While implementing this project, I found a lot of online articles that helped me get through the difficulties I had:

    Translated from: https://towardsdatascience.com/a-social-distancing-detector-using-a-tensorflow-object-detection-model-python-and-opencv-4450a431238
