Microsoft Research Open Sourced TextWorld to Train Reinforcement Learning by Playing Text Games

Technology · 2022-08-02

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing.

Conversational interfaces and natural language processing (NLP) are arguably the most widely adopted segment of modern artificial intelligence (AI). Despite continuous progress in NLP research, most conversational interfaces today still feel rather primitive compared to a human counterpart. Popular conversational AI agents such as Alexa or the Google Assistant handle very short dialogs well but lack cognitive aspects of human dialog such as memory, planning, and even common sense. How can we establish a repeatable and quantifiable mechanism for training AI agents in sophisticated conversational capabilities? A few months ago, researchers from the Microsoft Research Montreal lab released an open source project called TextWorld, which attempts to train reinforcement learning agents using text-based games. The ideas behind TextWorld were captured in a recent research paper published by Microsoft.

Text-Based Games

It might seem unusual to talk about text-based games at a time when AI agents are mastering complex multiplayer games such as Dota 2 or Quake III. Multiplayer graphical environments are great for training agents in spatial and time-based planning. Text-based games can play a similar role in developing advanced conversational skills such as affordance extraction, memory and planning, and exploration.

Conceptually, text-based games are complex, interactive simulations in which text describes the game state and players make progress by entering text commands. After each command, the game usually provides feedback that tells players how the command altered the game environment. A typical text-based game poses a series of puzzles to solve, treasures to collect, and locations to reach. Take, for instance, the legendary Zork, shown in the figure below. The game uses natural language to describe the state of the world, to accept actions from the player, and to report subsequent changes in the environment.

Figure: the Zork game. Source: https://www.microsoft.com/en-us/research/project/textworld/

The richness and complexity of text-based games make them an ideal environment for training reinforcement learning agents. Viewed through a reinforcement learning lens, language serves as both the action space and the observation space. In both cases, the space is combinatorial and compositional, which creates many challenges for reinforcement learning agents.

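The sketch below makes "combinatorial and compositional" concrete. It counts the commands that a tiny template grammar can produce. The vocabulary and both templates are hypothetical illustrations, not TextWorld's actual grammar.

```python
from itertools import product

# Hypothetical toy vocabulary; real text games have far larger ones.
VERBS = ["take", "drop", "open", "close", "examine"]
OBJECTS = ["key", "lamp", "chest", "door", "sword"]
PREPOSITIONS = ["with", "in", "on"]


def one_object_commands(verbs, objects):
    """All commands of the form '<verb> <object>'."""
    return [f"{v} {o}" for v, o in product(verbs, objects)]


def two_object_commands(verbs, objects, preps):
    """All commands of the form '<verb> <object> <prep> <object>' with two distinct objects."""
    return [
        f"{v} {o1} {p} {o2}"
        for v, o1, p, o2 in product(verbs, objects, preps, objects)
        if o1 != o2
    ]


simple = one_object_commands(VERBS, OBJECTS)
compound = two_object_commands(VERBS, OBJECTS, PREPOSITIONS)
print(len(simple), len(compound))  # 25 300
```

Thirteen words already yield 325 candidate commands, and most of them are nonsensical (for example "examine door on sword"). Adding a single sixth object raises the two counts to 30 and 450.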
Reinforcement Learning Challenges in Text-Based Games

The only simple thing about text-based games is their input/output mechanism via a terminal. Beyond that, the nature of text-based games poses numerous challenges for reinforcement learning agents:

· Partial Observability: In reinforcement learning terms, text-based games are partially observable environments. At any given time, players see only a snapshot of the current game environment, and even many local details might not be apparent in the observation.

· Exploration vs. Exploitation: The classic exploration-exploitation tradeoff of reinforcement learning is amplified in text-based games. Throughout the game, players need to balance exploring the environment further against capitalizing on immediate rewards.

· Long-Term Credit Assignment: Sparse rewards are inherent to text-based games. The agent must generate a sequence of actions before it observes a change in the environment state or receives a reward signal. Working out which actions produced a specific reward becomes very challenging for reinforcement learning agents.

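Computing discounted returns shows why sparse rewards make credit assignment hard. The sketch below uses the textbook recursion G_t = r_t + gamma * G_{t+1}; nothing in it is specific to TextWorld. The episode and the discount factor are made-up values.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 5-command episode in which only the last command is rewarded.
episode_rewards = [0, 0, 0, 0, 1]
for t, g in enumerate(discounted_returns(episode_rewards, gamma=0.9)):
    print(t, round(g, 4))
```

The single terminal reward is spread back over all five commands, shrinking by a factor of gamma per step (1.0, 0.9, 0.81, ...). The returns alone cannot say which of the earlier commands actually mattered, and that ambiguity is the credit assignment problem described in the last bullet.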
Entering TextWorld

The idea of TextWorld is not to directly create reinforcement learning agents that can beat a specific text-based game; that has been done before. TextWorld uses text-based games differently: it creates simplified versions of them that can be used to train reinforcement learning agents. Because these simpler games live in a highly controlled game space, they make reinforcement learning algorithms easier to evaluate and interpret.

From a functional standpoint, TextWorld is a Python framework for creating text-based game environments that can be used to train reinforcement learning agents. The framework has two main components: a game generator and a game engine. The game generator converts high-level game specifications, such as the number of rooms, number of objects, game length, and winning conditions, into executable game source code in the Inform 7 language. The game engine is a simple inference machine that ensures each step of the generated game is valid, using simple algorithms such as one-step forward and backward chaining.

Source: https://www.microsoft.com/en-us/research/project/textworld/
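The sketch below is a hypothetical simplification of the kind of validity checking the game engine performs; TextWorld's actual engine is not reproduced here. The game state is a set of facts, and each action is a rule with preconditions and effects. One forward step lists every state reachable through a single valid action. One backward step lists the actions that could produce a goal fact. The Rule class, fact names, and toy world are all illustrative assumptions.

```python
from dataclasses import dataclass

# A fact is a tuple such as ("at", "player", "kitchen"); a game state is a frozenset of facts.


@dataclass(frozen=True)
class Rule:
    """Hypothetical action rule: valid when all preconditions hold; replaces `removes` with `adds`."""
    name: str
    preconditions: frozenset
    removes: frozenset
    adds: frozenset


def forward_step(state, rules):
    """One step of forward chaining: the successor state for every rule valid in `state`."""
    return {
        rule.name: (state - rule.removes) | rule.adds
        for rule in rules
        if rule.preconditions <= state
    }


def backward_step(goal, rules):
    """One step of backward chaining: names of rules that add at least one goal fact."""
    return [rule.name for rule in rules if rule.adds & goal]


STATE = frozenset({("at", "player", "kitchen"), ("at", "key", "kitchen"), ("closed", "chest")})
RULES = [
    Rule("take_key",
         frozenset({("at", "player", "kitchen"), ("at", "key", "kitchen")}),
         frozenset({("at", "key", "kitchen")}),
         frozenset({("in", "key", "inventory")})),
    Rule("open_chest",
         frozenset({("at", "player", "kitchen"), ("closed", "chest"), ("in", "key", "inventory")}),
         frozenset({("closed", "chest")}),
         frozenset({("open", "chest")})),
    Rule("go_north",
         frozenset({("at", "player", "kitchen")}),
         frozenset({("at", "player", "kitchen")}),
         frozenset({("at", "player", "hall")})),
]

print(sorted(forward_step(STATE, RULES)))         # ['go_north', 'take_key']
print(backward_step({("open", "chest")}, RULES))  # ['open_chest']
```

In the start state, open_chest is not valid because the key is not yet in the inventory. Chaining forward once more from the state produced by take_key makes it valid.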

Using TextWorld, it is possible to generate combinatorial sets of text-based games from a specific set of parameters. The game generator takes a high-level game specification as input. It outputs the corresponding executable game, configured by parameters such as the number of rooms, the number of objects, the length of the quest, the winning conditions, and options for text generation.

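The sketch below shows how a small grid of parameter values turns into a combinatorial set of game specifications, each rendered as Inform 7-style source. GameSpec, spec_grid, and to_inform7 are hypothetical stand-ins, not TextWorld's generator or its options API. The renderer ignores quests, winning conditions, and text generation, and its output has not been checked against the Inform compiler.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GameSpec:
    """Hypothetical high-level game spec; field names are illustrative, not TextWorld's."""
    nb_rooms: int
    nb_objects: int
    quest_length: int  # Kept in the spec, but ignored by the toy renderer below.
    seed: int


def spec_grid(rooms, objects, quest_lengths, seeds):
    """Cartesian product of parameter values: one spec per combination."""
    return [GameSpec(r, o, q, s) for r, o, q, s in product(rooms, objects, quest_lengths, seeds)]


def to_inform7(spec):
    """Render Inform 7-style source: a north-south chain of rooms, all objects in the first room."""
    lines = [f'"Generated game {spec.seed}" by Sketch', "", "Room0 is a room."]
    for i in range(1, spec.nb_rooms):
        lines.append(f"Room{i} is north of Room{i - 1}.")
    for j in range(spec.nb_objects):
        lines.append(f"Object{j} is in Room0.")
    return "\n".join(lines)


specs = spec_grid(rooms=[2, 4], objects=[1, 3], quest_lengths=[1, 5], seeds=[0, 1])
print(len(specs))  # 16 games from 2 * 2 * 2 * 2 parameter values
print(to_inform7(GameSpec(nb_rooms=3, nb_objects=2, quest_length=1, seed=7)))
```

Each extra value added to any parameter list multiplies the size of the set. This is what makes it cheap to build families of related games for training and held-out evaluation.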
Reinforcement learning agents interact with TextWorld through a simple API that fits in a few lines of Python:

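```python
import textworld
import textworld.agents

env = textworld.start("zork1.z5")  # Requires a local copy of the Zork I story file.
# Any object exposing act(game_state, reward, done) works; NaiveAgent ships with TextWorld.
agent = textworld.agents.NaiveAgent()

agent.reset(env)  # Let the agent prepare for a new episode.
game_state = env.reset()  # Reset/initialize the game.
reward, done = 0, False
while not done:
    # Ask the agent for a command.
    command = agent.act(game_state, reward, done)
    # Send the command to the game and get the new state.
    game_state, reward, done = env.step(command)
env.close()
```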
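The sketch below runs the same loop without installing TextWorld or obtaining zork1.z5. It swaps in a hypothetical ToyTextEnv whose reset/step follow the (game_state, reward, done) contract used above, plus a hypothetical ScriptedAgent. Neither class is part of TextWorld. Here game_state is a plain string, whereas TextWorld returns a richer game-state object.

```python
class ToyTextEnv:
    """Hypothetical two-step puzzle (take the key, then open the chest); not part of TextWorld."""

    def reset(self):
        self.has_key = False
        return "You are in a kitchen. A key lies next to a locked chest."

    def step(self, command):
        """Return (game_state, reward, done), the same triple the loop above unpacks."""
        if command == "take key" and not self.has_key:
            self.has_key = True
            return "You pick up the key.", 0, False
        if command == "open chest" and self.has_key:
            return "The chest opens. You win!", 1, True
        return "Nothing happens.", 0, False


class ScriptedAgent:
    """Hypothetical agent that replays a fixed list of commands and ignores observations."""

    def __init__(self, commands):
        self.commands = list(commands)

    def act(self, game_state, reward, done):
        return self.commands.pop(0) if self.commands else "look"


def run_episode(env, agent, max_steps=50):
    """Same loop as above, plus a step cap so an agent that never finishes cannot hang."""
    game_state = env.reset()
    reward, done = 0, False
    total, steps = 0, 0
    while not done and steps < max_steps:
        command = agent.act(game_state, reward, done)
        game_state, reward, done = env.step(command)
        total += reward
        steps += 1
    return total, done, steps


print(run_episode(ToyTextEnv(), ScriptedAgent(["open chest", "take key", "open chest"])))
```

The first "open chest" fails because the key has not been taken yet, so the episode ends on the third command with a total reward of 1. The max_steps cap is a design choice for experiments: a learning agent may never reach a terminal state, and an uncapped `while not done` loop would then run forever.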

Using this model, developers can create rich natural-language training environments that help reinforcement learning agents develop skills such as memory, contextual analysis, and long-term planning. TextWorld is available as an open source release on GitHub and is interoperable with popular deep learning frameworks such as TensorFlow and PyTorch.

Original article: https://medium.com/dataseries/microsoft-research-open-sourced-textworld-to-train-reinforcement-learning-by-playing-text-games-5569f96f436f
