FILTER: Understand Foreign Languages Better

    Based on sources from across the internet, somewhere between one in six and one in seven people in the world speak English. Despite this underwhelming minority of the world’s population speaking English, the vast majority of natural language understanding and generation datasets, such as the Stanford Question Answering Dataset (SQuAD) and the GLUE benchmark, as well as the large-scale pretrained models like BERT, RoBERTa and ALBERT that have revolutionized the NLP world, are based solely on the English language.

    However, there has recently been a focus on other languages, with the creation of multi-lingual large-scale pretrained models like XLM and XLM-RoBERTa and the introduction of complex multi-lingual tasks like question answering, document classification, information retrieval and more. Most recently, XTREME and XGLUE have emerged as two collections of multi-lingual datasets that require models to be good at several tasks in order to perform well on their leaderboards.

    Now let’s dive a little deeper into how these multi-lingual models are created. There are two main schools of thought on how to approach this:

    Learning an embedding that is common across all the languages in the world (or at least whatever languages are available to train with). This can be done either by feeding a huge amount of text in a multitude of languages to a large language model like XLM or XLM-RoBERTa to learn an implicit embedding (a word has a different embedding depending on its context), or by using a simple neural network to learn an explicit embedding like Word2Vec (a word always maps to the same embedding); a minimal sketch of both styles appears right after this list.

    Translating the data either at train (Translate-train) or test (Translate-test) time. Translate-train is when the English training data is translated into the foreign language and the translated text is added to the training dataset. Translate-test is when the foreign-language data is converted to English at test time and the model makes its prediction on this English data. The prediction can then be translated back to the foreign language for tasks like question answering or span selection, or is simply a class label that requires no translation.
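
    To make the first of these approaches a bit more concrete, here is a minimal sketch (in Python, using the Hugging Face transformers and gensim libraries) contrasting an implicit, contextual embedding from a pretrained XLM-RoBERTa checkpoint with an explicit, static Word2Vec embedding. The checkpoint name, the toy corpus and the mean-pooling are illustrative choices only, not the recipe of any particular paper.

    ```python
    import torch
    from transformers import AutoTokenizer, AutoModel
    from gensim.models import Word2Vec

    # Implicit / contextual embedding: the same word gets a different vector
    # depending on the sentence it appears in.
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    encoder = AutoModel.from_pretrained("xlm-roberta-base")

    with torch.no_grad():
        out1 = encoder(**tokenizer("She sat on the river bank", return_tensors="pt"))
        out2 = encoder(**tokenizer("She deposited cash at the bank", return_tensors="pt"))
    # Mean-pooled sentence vectors; "bank" is encoded differently in each context.
    vec1 = out1.last_hidden_state.mean(dim=1)
    vec2 = out2.last_hidden_state.mean(dim=1)

    # Explicit / static embedding: one fixed vector per word, regardless of context.
    toy_corpus = [
        ["she", "sat", "on", "the", "river", "bank"],
        ["she", "deposited", "cash", "at", "the", "bank"],
    ]
    w2v = Word2Vec(sentences=toy_corpus, vector_size=32, min_count=1, epochs=10)
    static_bank = w2v.wv["bank"]  # always the same vector for "bank"
    ```

    The contextual encoder represents each occurrence of “bank” (and the sentence as a whole) in a context-dependent way, while the Word2Vec lookup always returns the same vector; that is the practical difference between the implicit and explicit styles described above.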

    Even though the dataset in the translate-train method has been augmented by the translated English sentences, the input data is still only one language. Nowhere in the translate-train architecture do multiple languages ever interact. Thus, translate-train is more of a data augmentation method than one that promotes an understanding of multiple languages.

    Figure: The translate-train architecture. The streams on the left and the right go through the same set of transformers independently.
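
    As a rough, language-agnostic illustration of the two translation-based strategies described above, here is a sketch in Python. The translate and model callables are hypothetical placeholders for whatever translation tool and task model are used; nothing here is specific to FILTER.

    ```python
    from typing import Callable, List, Tuple

    Example = Tuple[str, str]  # (text, label)

    def translate_train(train_en: List[Example],
                        translate: Callable[[str, str], str],
                        target_lang: str) -> List[Example]:
        """Translate-train: augment the English training set with its
        translation into the target language; the labels are reused as-is."""
        augmented = list(train_en)
        for text, label in train_en:
            augmented.append((translate(text, target_lang), label))
        return augmented  # a single model is then trained on this mixed set

    def translate_test(foreign_text: str,
                       translate: Callable[[str, str], str],
                       model: Callable[[str], str]) -> str:
        """Translate-test: convert the foreign input to English at inference
        time and let the English-only model predict on the translation."""
        english_text = translate(foreign_text, "en")
        prediction = model(english_text)
        # For span-selection tasks the prediction would be translated back;
        # for classification the predicted label can be returned directly.
        return prediction
    ```

    Note that in translate_train the two languages never appear in the same training example, which is exactly why the text above calls it a data augmentation method rather than a genuinely cross-lingual one.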

    Inspired by the translate-train method, but with a desire to make the model understand the relationships between two languages, the researchers at Microsoft Dynamics 365 AI Research propose a new way to train a multi-lingual model, called FILTER. The input to FILTER is the same as for any translate-train model: the sentence or paragraph in English (or any other source language) (E) along with its corresponding translated version in the target foreign language (F) that you want to train on.

    FILTER is also a 3-stage architecture like translate-train. But that’s where the similarities stop. Where translate-train uses either (E) or (F) as input, FILTER uses both (E) and (F). These sentences both go through two copies (one for each language) of a “local” transformer with m layers to learn unique embeddings for each of the languages.

    The output from both these “local” transformers is then fed into a cross-lingual “fusion” transformer with k layers. Here FILTER tries to glean information and learn relationships across (E) and (F).

    Finally, there are two “domain” transformers (which are once again copies of each other) that have 24-k-m layers and are both task- and language-specific. The label provided to each “domain” transformer is the corresponding language’s label.

    The numbers m and k are hyperparameters that can be tuned for the task as well.

    Figure: The FILTER architecture. The two “local” transformers (m layers) and the two “domain” transformers (24-k-m layers) share parameters between them, i.e. they are copies of one another.
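
    Putting the pieces together, the sketch below shows one way the local → fusion → domain flow could be wired up in PyTorch. The layer counts, hidden size, pooling and classification head are illustrative assumptions (the actual model is built on a pretrained XLM-R encoder), so treat this as a shape-level sketch of the architecture in the figure, not the authors’ implementation.

    ```python
    import torch
    import torch.nn as nn

    class FilterSketch(nn.Module):
        """Shape-level sketch of FILTER: m shared "local" layers per language,
        k cross-lingual "fusion" layers, and 24-k-m shared "domain" layers."""

        def __init__(self, d_model=768, n_heads=12, m=1, k=1, total=24, n_classes=3):
            super().__init__()
            def layer():
                return nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            # "Local" transformer: the E and F streams go through the SAME weights.
            self.local = nn.TransformerEncoder(layer(), num_layers=m)
            # "Fusion" transformer: the two streams are concatenated so they interact.
            self.fusion = nn.TransformerEncoder(layer(), num_layers=k)
            # "Domain" transformer: again shared between the two streams.
            self.domain = nn.TransformerEncoder(layer(), num_layers=total - k - m)
            self.classifier = nn.Linear(d_model, n_classes)

        def forward(self, emb_src, emb_tgt):
            # emb_src / emb_tgt: (batch, seq_len, d_model) embeddings of the
            # source sentence (E) and its translation (F).
            h_src = self.local(emb_src)
            h_tgt = self.local(emb_tgt)              # same weights, applied twice
            fused = self.fusion(torch.cat([h_src, h_tgt], dim=1))
            n_src = emb_src.size(1)
            h_src, h_tgt = fused[:, :n_src], fused[:, n_src:]
            out_src = self.domain(h_src)             # source-language branch
            out_tgt = self.domain(h_tgt)             # target-language branch
            # e.g. classify from the first token of each branch
            return self.classifier(out_src[:, 0]), self.classifier(out_tgt[:, 0])
    ```

    In this sketch the same module objects are reused for both streams, which is the code-level analogue of the parameter sharing called out in the figure caption above.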

    For simple tasks like classification, the labels remain the same in both languages and it’s easy to train the final language-specific layers. But what about tasks like question answering, entity recognition or part-of-speech tagging? The labels may not be directly applicable to the translated target text because of the way different languages structure their sentences. How do you train the target-language part of the model?

    FILTER has a solution for this too, using knowledge distillation. First, train a teacher model using FILTER with just the source-language (generally English) labels. Once you have this teacher model, train a student model whose target-language labels are the outputs of the teacher model’s task-specific target transformer. This way, the student also learns all the hidden knowledge that the teacher has.
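
    A minimal sketch of that distillation objective, assuming the teacher’s target-branch logits have already been computed: the temperature, the KL-divergence form and the weighting alpha are illustrative choices rather than the paper’s exact recipe.

    ```python
    import torch
    import torch.nn.functional as F

    def filter_distillation_loss(student_src_logits: torch.Tensor,
                                 student_tgt_logits: torch.Tensor,
                                 teacher_tgt_logits: torch.Tensor,
                                 src_labels: torch.Tensor,
                                 temperature: float = 2.0,
                                 alpha: float = 0.5) -> torch.Tensor:
        # Hard loss: gold labels are only available for the source (English) branch.
        hard = F.cross_entropy(student_src_logits, src_labels)
        # Soft loss: the student's target-language branch imitates the teacher's
        # predictions on the translated text (knowledge distillation).
        soft = F.kl_div(
            F.log_softmax(student_tgt_logits / temperature, dim=-1),
            F.softmax(teacher_tgt_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        return alpha * hard + (1.0 - alpha) * soft
    ```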

    Since the final transformer is task specific, FILTER can be easily applied to a variety of tasks by just changing the final transformer. To prove the generalizability of FILTER, the researchers applied it to the multi-task datasets of XTREME and XGLUE. FILTER was able to perform well on a multitude of tasks like multi-language question answering, sentence retrieval, sentence-pair classification, named entity recognition and more to achieve the #1 position on both these complex leaderboards!

    If you want to know more about the FILTER model, see the paper (Fang et al., 2020) in the references below, and see our other publications for more of our work.

    Hu, J.; Ruder, S.; Siddhant, A.; Neubig, G.; Firat, O.; and Johnson, M. 2020. Xtreme: A massively multilingual multi-task benchmark for evaluating cross-lingual generalization. In International Conference on Machine Learning.

    Liang, Y.; Duan, N.; Gong, Y.; et al. 2020. XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation. arXiv preprint arXiv:2004.01401.

    Fang, Y.; Wang, S.; Gan, Z.; Sun, S.; and Liu, J. 2020. FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding. arXiv preprint arXiv:2009.05166.

    Translation tool to convert text from English to a foreign language and vice versa

    Translated from: https://medium.com/swlh/filter-understand-foreign-languages-better-4bfa6d12377f
