Google Code

    Tech, 2022-07-12

    Extending PCL for use with Python: Bindings generation using Pybind11

    Project link

    PR link

    Some background

    The Point Cloud Library (PCL) is a large-scale project for 2D/3D image and point cloud processing. It contains algorithms for perception tasks in robotics such as feature estimation, surface reconstruction, and segmentation. It can be used to filter data, combine point clouds, and extract keypoints in order to recognize objects based on their geometric appearance, and to create surfaces from point clouds and visualize them. Do explore their GitHub repo; it is one of the most open open-source repositories out there (new ideas get accepted rapidly).

    Out of the many exciting projects proposed by PCL, I chose “Binding Interfaces” for my proposal as it was a unique project — it aims to extend the C++ based PCL for development in other languages, primarily Python. To view all the projects accepted by PCL, visit this link.

    I had a fantastic time working with PCL during GSoC 2020. I got the most amazing mentors one could hope for, Kunal and Andrea, and I am grateful to them in ways I can’t express. I can’t thank them enough for teaching me the importance of following good development practices, code reviews, and oh, testing. They maintained a jovial working environment and were surprisingly quick to respond to my doubts, despite their busy schedules. Pair programming sessions were fun. Also, a special thanks to Sergio, an extremely cool guy whose opinions and suggestions always helped.

    A quick note: I decided to keep this article short, since a full write-up would get quite extensive. I won’t be covering all the details of my project, only some basics of how the bindings generation process works. If you are interested in the in-depth workings of my project:

    For in-depth code reviews and discussions, see the related issues and PRs on my fork (#6 till #46, for now), or have a look at my development branch’s pull request to PCL.

    I did most of the preliminary research while preparing my proposal; if you are interested in that, read my proposal.

    If you wanna chat about anything, hit me up on Discord (divmadan#4957).

    So, why bind?

    High-performance libraries such as PCL are generally written in system programming languages like C++. While this design choice guarantees performance, it cuts down on the ease of use and flexibility.

    Mainly, the motive to create a binding for a library is to enable code reuse (reimplementing a library in multiple languages is cumbersome and often impossible) and further development in the bound language. Also, some algorithms can be challenging to efficiently implement in highly abstracted languages like Python, so it is better to instead create bindings for them.

    Python and C++

    Python and C++ are in many ways as different as two languages could be: while C++ is usually compiled to machine code, Python is interpreted. Python’s dynamic type system is the foundation of its flexibility, while C++’s static typing is the cornerstone of its efficiency. C++ has an intricate and difficult compile-time meta-language, while in Python, practically everything happens at runtime.

    These very differences mean that Python and C++ complement one another perfectly. Performance bottlenecks in Python programs can be rewritten in C++ for maximal speed, and powerful C++ libraries choose Python as a middleware language for its flexible system integration capabilities.

    Point Cloud Library’s case

    Let’s discuss by taking the case of PCL:

    PCL’s codebase is entirely C++. To enable development in Python, we want to access its functionality from Python, i.e., we want to make a “wrapper” for its native C++ API in Python. This can be done by creating a binding interface that links the two languages together. This requires 3 main steps:

    Expose the native language (C++) types.

    Define the rules for linking the 2 languages.

    Generate glue code for the exposed types, i.e., wrap.

    [1] Exposing the native language types

    Simply put, we need to extract information from our existing codebase. More specifically, we want to know how classes are defined, how functions are implemented, how pointers are moved about, how templates are instantiated, etc. That is, we want to get syntactic and semantic information about our existing codebase.

    Let’s talk in terms of C++. A C++ codebase consists of source (implementation) files and header (declaration) files. A well-designed codebase will attempt to separate its declarations from the implementation as much as possible, which has implicit benefits for us: the greater the separation, the easier the bindings generation process. Why? Because we only need to know how the types are declared, not how they are implemented (we will be calling the C++ API, so we don’t care about the implementation).

    Which files to parse? In an ideal case, parsing the header files and generating bindings for them should do the trick. This is not always possible, because declaration and implementation cannot always be strictly separated, and certain features like templates are instantiated in source files. We need the type references from those files, so we have to parse the source files too.

    So, how to extract this information? This can be done in 2 ways:

    [1.1] Manually create a representation.

    Just kidding. Don’t do that.

    [1.1] String matching

    We can perform regex-like operations on the code, to match a well-defined pattern based on the language’s keywords.

    Consider a simple example: we have a line in our code starting with #. What is it used for? It may be a comment. Or, it may be used for including a header file. Or, maybe defining a macro. It can also be some other preprocessor directive. How do we decide?

    We can look at what follows the #. If the # is followed by include, we know it is an inclusion directive, and the text following it is the header file name. The file name can appear within angle brackets, #include <filename>, or quotes, #include "filename". If the # is followed by define, it is a macro definition. ifdef, ifndef, else, endif, etc. indicate other preprocessor directives. After handling all these special cases, we can treat the last remaining case as a comment line.

    As you can observe, it can get complicated pretty quickly. This was just for the case of the # symbol.
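
    To make the # example concrete, here is a minimal sketch of that decision procedure using Python's re module. The function name classify_hash_line and the category labels are my own, chosen for illustration, and the "comment" fallback simply mirrors the description above rather than any real C++ rule.

```python
import re

# Keywords mentioned in the text, mapped to a coarse category (labels are illustrative).
DIRECTIVES = {
    "include": "include",
    "define": "macro",
    "ifdef": "conditional",
    "ifndef": "conditional",
    "else": "conditional",
    "endif": "conditional",
}

HASH_LINE = re.compile(r"^\s*#\s*(\w+)?\s*(.*)$")
INCLUDE_TARGET = re.compile(r'^(?:<([^>]+)>|"([^"]+)")')


def classify_hash_line(line):
    """Return (category, detail) for a line starting with '#', else None."""
    match = HASH_LINE.match(line)
    if match is None:
        return None
    keyword, rest = match.group(1), match.group(2).strip()
    category = DIRECTIVES.get(keyword)
    if category == "include":
        # The header name sits inside <...> or "...".
        target = INCLUDE_TARGET.match(rest)
        name = (target.group(1) or target.group(2)) if target else None
        return ("include", name)
    if category is not None:
        return (category, rest)
    # Fallback described in the text: whatever is left is treated as a comment.
    return ("comment", line.split("#", 1)[1].strip())


print(classify_hash_line("#include <pcl/point_types.h>"))
print(classify_hash_line("#define PI 3.14"))
```

    Even this toy version needs two regular expressions and a lookup table for a single symbol, and it still ignores line continuations, block comments and every directive not listed in the table.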

    There are countless conditions to be taken care of, so designing a stringent system based on string matching is extremely difficult. The technique must also ensure that all files have standard formatting while also introducing bounds on how new code should be written. Some projects use this approach, but it can be quite tricky to handle.

    Have a look at this repo for some reference to this method.

    [1.2] Using a parser

    The Clang project needs no introduction. It provides a language front-end and tooling infrastructure for languages in the C language family. Apart from the compiler that we are well aware of, clang-tools are a suite of excellent tools such as clang-format, clang-check and clang-tidy, built on top of the LibTooling interface. Clang provides interfaces such as LibClang, LibTooling and Clang Plugins, to develop standalone clang-tools of our own, using static analysis. See some cool examples by Peter Goldsborough.

    I chose LibClang, which provides a stable high-level C interface to Clang. Using LibClang we can extract the “information” from the Abstract Syntax Tree (AST) generated during our C++ file’s static analysis. The AST contains information about the file’s symbols and their relationships with each other in the form of a tree, with the root being the Translation Unit of the file. LibClang also offers Python bindings, which I used in this project.
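
    As a rough illustration of walking that tree, the sketch below turns a LibClang cursor into a plain nested dict. It is not the code from my project: the helper names ast_to_dict and parse_file and the default compiler flags are assumptions made for this example, and parse_file only works if the clang.cindex module and a matching libclang shared library are installed.

```python
def ast_to_dict(cursor):
    """Recursively convert a libclang cursor into a plain nested dict."""
    return {
        "kind": cursor.kind.name,      # e.g. "STRUCT_DECL", "FIELD_DECL"
        "name": cursor.spelling,       # the symbol's name, may be empty
        "children": [ast_to_dict(child) for child in cursor.get_children()],
    }


def parse_file(path, args=("-x", "c++", "-std=c++14")):
    """Parse one C++ file and return its AST as a nested dict.

    The flags in `args` are illustrative defaults, not a recommendation.
    """
    # Imported lazily so that ast_to_dict stays usable without libclang.
    import clang.cindex

    index = clang.cindex.Index.create()
    translation_unit = index.parse(path, args=list(args))
    # The root cursor of the tree is the translation unit itself.
    return ast_to_dict(translation_unit.cursor)
```

    Since the result contains only dicts, lists and strings, it can be written out with json.dumps, which is one simple way to get from an AST to a JSON file.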

    Solutions such as cppheaderparser, GCC_XML and CastXML also serve a similar purpose.

    [2] Defining rules

    Like the previous step, we have 2 options: either define the rules manually or use an existing library to do our C++ <-> Python mapping for us. Multiple approaches can be followed for both options; I’ll not be discussing them in depth.

    For manually defining the rules, our best bet is Python’s foreign function interface (ffi), ctypes. ctypes is extremely powerful in the way it provides C compatible data types and enables calling functions in DLLs or shared libraries. Although some big projects follow this approach, why reinvent the wheel?
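
    To show what "defining the rules manually" looks like, here is a small ctypes sketch that calls strlen from the C standard library. It assumes a typical POSIX system (Linux or macOS); the library lookup would need adjusting on Windows.

```python
import ctypes
import ctypes.util

# Locate the C standard library; if it cannot be found by name, fall back to
# the symbols already loaded into the current process (POSIX behaviour).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# These two lines are the hand-written "rules": the C argument and return types.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"point cloud")
print(length)  # 11
```

    Every function needs its argtypes and restype spelled out like this, and ctypes speaks C rather than C++, so classes and templates would first need a C-style wrapper. That is manageable for a handful of functions and much less so for a library the size of PCL.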

    There exist excellent libraries specifically for this purpose: pybind11, PyBindGen, SWIG and Boost.Python, to name a few, each with its own pros and cons. I decided to go with pybind11 as it was a perfect fit for our application. It is an extremely lightweight (~4k lines) header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code.

    [3] Generate glue code

    The steps involved may vary depending on the choices made earlier in this pipeline, but in essence, it is just combining the two: take the parsed information and generate glue code for it. In my case, we take the parsed information, generate pybind11 code for it, compile it into a Python module, and voila, we get our Python bindings. But of course, it is easier said than done 😆
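
    The "parsed information in, pybind11 code out" step can be sketched as plain string generation. Everything named here is hypothetical: the demo::Point struct, the demo/point.h header, the dict layout and the two generator functions are invented for the example and are not taken from my project. The pybind11 calls that appear in the generated text (py::class_, py::init, def_readwrite, PYBIND11_MODULE) are real.

```python
# Hypothetical parsed description of one struct (for illustration only).
POINT = {
    "name": "Point",
    "qualified_name": "demo::Point",
    "fields": ["x", "y", "z"],
}


def generate_class_binding(struct):
    """Return the pybind11 statement (as text) that binds one parsed struct."""
    cpp_name = struct["qualified_name"]
    lines = [
        f'py::class_<{cpp_name}>(m, "{struct["name"]}")',
        "    .def(py::init<>())",
    ]
    for field in struct["fields"]:
        lines.append(f'    .def_readwrite("{field}", &{cpp_name}::{field})')
    return "\n".join(lines) + ";"


def generate_module(module_name, header, structs):
    """Wrap the per-struct bindings in a complete pybind11 module source file."""
    body = "\n\n".join(generate_class_binding(s) for s in structs)
    indented = "\n".join("  " + line if line else line for line in body.splitlines())
    return (
        "#include <pybind11/pybind11.h>\n"
        f"#include <{header}>\n\n"
        "namespace py = pybind11;\n\n"
        f"PYBIND11_MODULE({module_name}, m) {{\n"
        f"{indented}\n"
        "}\n"
    )


print(generate_module("demo_points", "demo/point.h", [POINT]))
```

    The printed text is a C++ source file that still has to be compiled against pybind11 and the wrapped library before Python can import it.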

    We need to handle the symbols in the AST according to how we want to expose our codebase in the wrapping language. It also depends on what kind of library (or manual method) we are using for the glue code (step 2: pybind11 in my case), how we represent our parsed data (step 1: AST -> JSON in my case), the native type conversions (e.g., std::vector <-> list), if any, offered by the library, and so much more!
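
    One small example of such a decision: pybind11 only performs conversions like std::vector <-> list when the matching header is included, so a generator has to work out which headers the parsed types need. The lookup table and the headers_for helper below are an illustrative sketch of my own; the header names themselves are pybind11's.

```python
# C++ type prefix -> pybind11 header that provides its automatic conversion.
CONVERSION_HEADERS = {
    "std::vector": "pybind11/stl.h",          # <-> list
    "std::map": "pybind11/stl.h",             # <-> dict
    "std::set": "pybind11/stl.h",             # <-> set
    "std::function": "pybind11/functional.h", # <-> callable
    "std::complex": "pybind11/complex.h",     # <-> complex
}


def headers_for(type_names):
    """Return the sorted pybind11 headers needed for the given C++ type names."""
    needed = set()
    for type_name in type_names:
        for prefix, header in CONVERSION_HEADERS.items():
            # Substring match, so qualifiers like "const ... &" are tolerated.
            if prefix in type_name:
                needed.add(header)
    return sorted(needed)


print(headers_for(["const std::vector<float>&", "std::function<void(int)>", "int"]))
```

    A substring match is crude, which is another reason to take type names from a real parser instead of guessing them from text.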

    While manually binding a C++ file via pybind11 is fairly simple, thanks to its simple API, it gets tricky when we want to automate the process.

    Automate? Why?

    Automation is incredible, and the only thing stopping us from automating a process is the question, “Is this overkill?”. Moreover, an automated pipeline is more manageable than its manual counterpart. However, in our case, it was more of a necessity.

    To give you a reference, a .hpp file defining point types was ~2400 lines, and its AST’s generated JSON was ~44K lines! The final pybind11 bound file was ~850 lines.

    That’s just one file. PCL’s codebase is huge, with hundreds of files containing hundreds (or thousands) of lines of code. Manually binding files is borderline impossible, not to mention cumbersome and perhaps very boring. The project needs to be automated, right from parsing through compilation, else it just wouldn’t be possible. Hence, the Python scripts for parsing C++ files and generating pybind11 code for them, the compilation database and the CMake-setuptools combo for compilation were all designed to accommodate this need.
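
    As one example of what the compilation database gives an automated pipeline, the sketch below pulls the include, define and language-standard flags for a single file out of a compile_commands.json-style list, so a parser can be fed the same flags as the real build. The helper name compile_args_for and the sample entry are made up for illustration; the keys ("directory", "file", "command", "arguments") follow the Clang JSON compilation database format.

```python
import json
import shlex


def compile_args_for(database, source_file):
    """Return the -I/-D/-std= flags recorded for source_file, or [] if absent."""
    for entry in database:
        if entry["file"] == source_file:
            # An entry carries either an "arguments" list or a "command" string.
            argv = entry.get("arguments") or shlex.split(entry["command"])
            # Only joined forms such as -I/path are kept; "-I /path" is not handled.
            return [arg for arg in argv if arg.startswith(("-I", "-D", "-std="))]
    return []


# A made-up single-entry database, in place of reading compile_commands.json.
EXAMPLE_DB = json.loads("""
[
  {
    "directory": "/build",
    "file": "/src/demo/point.cpp",
    "command": "c++ -I/src/include -DDEMO_EXPORTS -std=c++14 -o point.o -c /src/demo/point.cpp"
  }
]
""")

print(compile_args_for(EXAMPLE_DB, "/src/demo/point.cpp"))
```

    With CMake, such a file can be produced by enabling CMAKE_EXPORT_COMPILE_COMMANDS.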

    This was just an introduction. I encountered many intricacies during this project, which were a source of both pain and pleasure — maybe some other time.

    Thanks for reading!

    Original article: https://medium.com/@iamdm99/google-summer-of-code-2020-with-point-cloud-library-f5a74d4e61c0
