Machine Learning: Recognizing Human Actions in Images
This article was originally written February 28, 2017.
Let’s say you want to teach a computer to read handwritten digits. You might give it a set of rules to follow. For example, an oval is most likely a 0. Another approach you might try is “machine learning”: give the computer a collection of examples of each digit to study so that it can learn its own rules. This latter method has worked surprisingly well. In fact, many banks use this technology to allow ATMs or mobile phones to read the amount on a check without the need for human interaction.
One limitation of current machine learning techniques, however, is that they require a lot of examples. For example, if you want to teach a computer to recognize cats, you first need to give the computer many pictures of cats so it can learn what cats look like. These examples might not always exist or might be very expensive to obtain. What if you could teach a computer to learn a new concept, such as a “cat,” from just one or two examples? This is exactly what three researchers, Lake, Salakhutdinov, and Tenenbaum (of New York University, the University of Toronto, and MIT, respectively), did.
The researchers focused specifically on character recognition. They asked: can we teach a computer to recognize new characters after seeing just one example? The end result was an algorithm that matched humans at learning what new characters look like: it performed about as well as people on both character recognition and character generation tasks.
How well can you recognize a character that you’ve never seen before? To evaluate the algorithm on this task, the researchers compared its performance to that of humans. They first gathered a group of handwritten characters from various alphabets. The researchers then gave each participant an example of a character they had never seen before and asked the participant to find that character in a set of 20 new characters from the same alphabet. They asked their algorithm to do the same. Surprisingly, the algorithm (3.3% error rate) performed just as well as the people (avg. 4.5% error rate)!
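The shape of this test can be sketched in a few lines of Python. This is only an illustration of the evaluation setup, not the researchers' algorithm: the pixel-counting distance is a naive stand-in chosen for brevity, the 3x3 grids are made-up toy data, and all function names are hypothetical. The real test used 20 candidates per trial; the toy example uses 3.

```python
from typing import Callable, List, Sequence, Tuple

Image = Tuple[Tuple[int, ...], ...]  # binary grid: 1 = ink, 0 = blank
Trial = Tuple[Image, Sequence[Image], int]  # (example, candidates, correct index)


def pixel_distance(a: Image, b: Image) -> int:
    """Count the pixels where two same-sized binary images differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def classify_one_shot(example: Image, candidates: Sequence[Image],
                      distance: Callable[[Image, Image], int] = pixel_distance) -> int:
    """Return the index of the candidate closest to the single example."""
    return min(range(len(candidates)), key=lambda i: distance(example, candidates[i]))


def error_rate(trials: List[Trial]) -> float:
    """Fraction of trials where the chosen candidate is not the correct one."""
    wrong = sum(classify_one_shot(ex, cands) != answer for ex, cands, answer in trials)
    return wrong / len(trials)


# Toy data (assumed for illustration): three tiny "characters".
vertical = ((0, 1, 0), (0, 1, 0), (0, 1, 0))
horizontal = ((0, 0, 0), (1, 1, 1), (0, 0, 0))
diagonal = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# The single example: a vertical bar drawn with one stray pixel.
example = ((0, 1, 0), (0, 1, 0), (0, 1, 1))
candidates = [horizontal, vertical, diagonal]

chosen = classify_one_shot(example, candidates)
print("chosen candidate index:", chosen)
print("error rate:", error_rate([(example, candidates, 1)]))
```

A raw pixel comparison like this breaks down quickly on real handwriting, which is part of why a richer representation of characters matters.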
Can you find the character that matches the one in the red box?
Can you generate examples of how other people would write a character? Using the same group of handwritten characters, they gave each participant an example of a character they had never seen before, and then asked the participant to create a new example of that character. They asked their algorithm to do the same thing. To test how well the algorithm did, they showed a group of computer-generated characters and a group of human-written characters to judges to see whether they could differentiate between the two. The judges could identify the computer-generated characters only 52% of the time, not much better than random chance (50%).
It can be hard to find which examples were written by a machine. In this example, grid 1 on the left and grid 2 on the right were generated by machines.
Given how well the algorithm does with just one example, the natural question is: how did they do it?
The core intuition behind the algorithm is that a character can be seen as a series of strokes put together. The researchers taught the algorithm how to decompose an image of a character into a sequence of strokes that may have been used to write the character. The algorithm could then use this stroke-based representation as a base from which to generate new examples (e.g., by taking into account other ways a stroke might be written) or to see which characters could be mapped to the same stroke pattern.
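To make the stroke idea concrete, here is a minimal sketch in Python. It assumes a character is stored as an ordered list of strokes, each stroke an ordered list of (x, y) points. The Gaussian jitter used to produce a new example and the coarse direction-based "pattern" are simplifications invented for this illustration; the model in the paper is considerably richer, and every name below is hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]        # one pen stroke: an ordered list of points
Character = List[Stroke]    # a character: an ordered list of strokes


def direction(stroke: Stroke) -> str:
    """Coarse direction from a stroke's first point to its last point."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        return "right" if dx >= 0 else "left"
    return "up" if dy > 0 else "down"


def stroke_pattern(character: Character) -> Tuple[str, ...]:
    """Summarize a character as the sequence of its stroke directions."""
    return tuple(direction(stroke) for stroke in character)


def generate_variant(character: Character, jitter: float = 0.02, seed: int = 0) -> Character:
    """Create a new example by nudging every point with small Gaussian noise."""
    rng = random.Random(seed)
    return [
        [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter)) for x, y in stroke]
        for stroke in character
    ]


# A capital "T" written as two strokes: the top bar, then the stem.
letter_t: Character = [
    [(0.0, 1.0), (1.0, 1.0)],
    [(0.5, 1.0), (0.5, 0.0)],
]

variant = generate_variant(letter_t)
print("original pattern:", stroke_pattern(letter_t))
print("variant pattern: ", stroke_pattern(variant))
```

The variant has slightly different coordinates but the same stroke pattern, which is the property that lets a stroke-based representation both generate new examples and match a new drawing to a known character.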
To teach the computer how to map from character to strokes, the researchers used a method called Bayesian program learning. They broke up the task of going from character to strokes into parts and modeled each part as a probability distribution (how likely is it that there are three strokes, given that the character looks like this, etc.). Before running the algorithm, they gave the computer characters from 30 alphabets to teach it what the probability distributions should look like. While it still needed some data to learn the initial probabilities, instead of needing a thousand examples of a new character, it now needs only one!
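One small piece of this reasoning, the question of how many strokes a character has, can be sketched with Bayes' rule. The sketch below assumes a prior over stroke counts estimated from background characters and a likelihood describing how well each stroke count explains the new image. The counts and likelihood values are made up for illustration, and the function names are hypothetical; this is not the paper's full model.

```python
from collections import Counter
from typing import Dict, Iterable


def learn_prior(stroke_counts: Iterable[int]) -> Dict[int, float]:
    """Estimate P(number of strokes) from background characters."""
    counts = Counter(stroke_counts)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def posterior(prior: Dict[int, float], likelihood: Dict[int, float]) -> Dict[int, float]:
    """Bayes' rule: P(k strokes | image) is proportional to P(image | k strokes) * P(k strokes)."""
    unnormalized = {k: prior[k] * likelihood.get(k, 0.0) for k in prior}
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("no stroke count has nonzero probability")
    return {k: v / evidence for k, v in unnormalized.items()}


# Made-up background data: stroke counts of previously seen characters.
background = [1, 2, 2, 3, 2, 1, 3, 2]
prior = learn_prior(background)

# Made-up likelihoods: how well each stroke count explains one new image.
likelihood = {1: 0.1, 2: 0.2, 3: 0.7}

post = posterior(prior, likelihood)
best = max(post, key=post.get)
print("prior:", prior)
print("posterior:", post)
print("most probable stroke count:", best)
```

The prior is learned once from the background alphabets; after that, a single image of a new character is enough to update it into a posterior.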
Despite the impressive advances, there is still much work to be done. People see more than just strokes when they look at a character; they may also notice features such as parallel lines or symmetry. Furthermore, optional features can cause a lot of difficulty. Consider the character “7”. An algorithm might model it as a one-stroke character the first time it sees it. However, once it sees a “7” with a dash in it, it may consider it to be a different character because that requires two strokes, and it’s never seen a “7” with a dash in it. A human, however, might be able to infer that a “7” with a dash is the same as a “7” without a dash, whether through the context or other factors.
Is it a 7?
This algorithm is also tailored specifically to recognizing characters. It would be interesting to see if we could develop similar “one-shot learning” algorithms in other areas. For example, what if a self-driving car could learn to recognize and obey a new sign after watching another car react to it once?
One key insight from this paper makes me think that this may indeed be possible. The researchers intentionally told the algorithm to think of characters as a series of strokes put together rather than as a grid of 0s and 1s. This representation is closer to how humans think about characters, and using it greatly increased how quickly the computer learned. A lot of artificial intelligence techniques have been based on how humans make decisions, but it may prove useful to study more of how humans learn and represent information as well.
While there is still a lot of work to be done, this paper represents a significant step forward in the machine learning world.
Sources
Original paper: http://web.mit.edu/cocosci/papers/science-2015-lake-1332-8.pdf.
For more information on automatic reading of handwritten digits, see http://yann.lecun.com/exdb/mnist/.
The middle three images are from Lake, Salakhutdinov, and Tenenbaum’s paper (cited above).
Translated from: https://medium.com/analytics-vidhya/one-shot-learning-character-recognition-explained-54186327622d
