论文原文:PDF 论文年份:2019 论文被引:253(2020/10/05)
Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold: first, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Furthermore, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art deep anomaly detection research techniques into different categories based on the underlying assumptions and approach adopted. Within each category, we outline the basic anomaly detection technique, along with its variants, and present the key assumptions used to differentiate between normal and anomalous behavior. In addition, for each category, we present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting deep anomaly detection techniques for real-world problems.
Keywords anomalies, outlier, novelty, deep learning
异常检测是一个重要的问题,已经在不同的研究领域和应用领域得到了很好的研究。本综述的目的有两个方面:首先,我们对基于深度学习的异常检测的研究方法进行了系统而全面的综述。此外,我们回顾了这些方法在各应用领域中用于异常检测的情况,并评估了它们的有效性。我们根据所采用的基本假设和方法,将最先进的深度异常检测研究技术分为不同的类别。在每个类别中,我们概述了基本的异常检测技术及其变体,并提出了用于区分正常和异常行为的关键假设。此外,对于每一个类别,我们还提出了优势和局限性,并讨论了在实际应用领域的技术的计算复杂性。最后,我们概述了研究中的开放问题和在现实问题中采用深度异常检测技术时面临的挑战。
A common need when analyzing real-world data-sets is determining which instances stand out as being dissimilar to all others. Such instances are known as anomalies, and the goal of anomaly detection (also known as outlier detection) is to determine all such instances in a data-driven fashion (Chandola et al. [2007]). Anomalies can be caused by errors in the data but sometimes are indicative of a new, previously unknown, underlying process; Hawkins [1980] defines an outlier as an observation that deviates so significantly from other observations as to arouse suspicion that it was generated by a different mechanism. In the broader field of machine learning, recent years have witnessed a proliferation of deep neural networks, with unprecedented results across various application domains. Deep learning is a subset of machine learning that achieves good performance and flexibility by learning to represent the data as a nested hierarchy of concepts within the layers of a neural network. Deep learning outperforms traditional machine learning as the scale of data increases, as illustrated in Figure 1. In recent years, deep learning-based anomaly detection algorithms have become increasingly popular and have been applied to a diverse set of tasks as illustrated in Figure 2; studies have shown that deep learning completely surpasses traditional methods (Javaid et al. [2016], Peng and Marculescu [2015]). The aim of this survey is two-fold: first, we present a structured and comprehensive review of research methods in deep anomaly detection (DAD). Furthermore, we also discuss the adoption of DAD methods across various application domains and assess their effectiveness.
在分析现实世界的数据集时,一个常见的需求是确定哪些实例与所有其他实例不同。这种实例被称为异常(anomalies),异常检测(anomaly detection)也称为离群值检测(outlier detection),其目标是以数据驱动的方式确定所有这种实例(Chandola等人[2007])。异常可能是由数据中的错误引起的,但有时是一个新的、以前未知的潜在过程的指示;Hawkins[1980] 将异常值定义为与其他观察值偏差如此之大,以至于让人怀疑它是由不同的机制产生的观察值。在更广泛的机器学习领域,近年来出现了深度神经网络,在各种应用领域取得了前所未有的成果。深度学习是机器学习的一个子集,它通过学习将数据表示为神经网络各层中嵌套的概念层次,从而获得良好的性能和灵活性。如图1所示,随着数据规模的增加,深度学习的性能优于传统机器学习。近年来,基于深度学习的异常检测算法变得越来越流行,并已应用于各种任务,如图2所示;研究表明深度学习完全超越了传统方法(Javaid et al. [2016], Peng and Marculescu [2015])。本文的研究目的有两个方面:首先,我们对深度异常检测(DAD)的研究方法进行了系统而全面的综述。此外,我们还讨论了DAD方法在各种应用领域中的采用情况,并评估了它们的有效性。
Anomalies are also referred to as abnormalities, deviants, or outliers in the data mining and statistics literature (Aggarwal [2013]). As illustrated in Figure 3, N1 and N2 are regions consisting of a majority of observations and hence considered as normal data instance regions, whereas the region O3, and data points O1 and O2, are a few data points located farther away from the bulk of the data and hence are considered anomalies. Anomalies arise due to several reasons, such as malicious actions, system failures, or intentional fraud. These anomalies reveal exciting insights and often convey valuable information about the data. Therefore, anomaly detection is considered an essential step in various decision-making systems.
在数据挖掘和统计文献中,异常值(anomalies)也被称为异常(abnormalities)、异常值(deviants)、离群值(outliers)(Aggarwal [2013])。如图3所示,N1 和 N2 是由大部分观测值组成的区域,因此被视为正常数据实例区域,而区域 O3 以及数据点 O1 和 O2 是远离大部分数据点的少数数据点,因此被视为异常。异常出现的原因有多种,如恶意操作、系统故障、故意欺诈。这些异常揭示了关于数据的令人兴奋的见解,并且经常传达关于数据的有价值的信息。因此,异常检测被认为是各种决策系统中必不可少的一步。
Novelty detection is the identification of a novel (new) or unobserved pattern in the data (Miljković [2010]). The novelties detected are not considered as anomalous data points; instead, they are incorporated into the regular data model. A novelty score may be assigned to these previously unseen data points using a decision threshold score (Pimentel et al. [2014]). The points which significantly deviate from this decision threshold may be considered anomalies or outliers. For instance, in Figure 4 the images of white tigers among regular tigers may be considered a novelty, while the images of a horse, panther, lion, and cheetah are considered anomalies. The techniques used for anomaly detection are often used for novelty detection and vice versa.
新颖性检测(Novelty detection)是对数据中新的或未观察到的模式的识别(Miljković [2010])。检测到的新颖性数据不被认为是异常数据点;相反,它们会被并入常规数据模型。可以使用决策阈值分数为这些先前未见的数据点分配新颖性分数(Pimentel等人[2014])。明显偏离此决策阈值的点可被视为异常或异常值。例如,在图4中,普通老虎中的白老虎图像可能被认为是新颖性,而马、豹、狮子和猎豹的图像则被认为是异常。用于异常检测的技术通常也可用于新颖性检测,反之亦然。
Despite the substantial advances made by deep learning methods in many machine learning problems, there is a relative scarcity of deep learning approaches for anomaly detection. Adewumi and Akinyelu [2017] provide a comprehensive survey of deep learning-based methods for fraud detection. A broad review of deep anomaly detection (DAD) techniques for cyber-intrusion detection is presented by Kwon et al. [2017]. An extensive review of using DAD techniques in the medical domain is presented by Litjens et al. [2017]. An overview of DAD techniques for the Internet of Things (IoT) and big-data anomaly detection is introduced by Mohammadi et al. [2017]. Sensor network anomaly detection has been reviewed by Ball et al. [2017]. The state-of-the-art deep learning based methods for video anomaly detection, along with various categories, have been presented in Kiran et al. [2018]. Although there are some reviews of applying DAD techniques, there is a shortage of comparative analysis of the deep learning architectures adopted for outlier detection. For instance, a substantial amount of research on anomaly detection is conducted using deep autoencoders, but there is a lack of a comprehensive survey of the various deep architectures best suited for a given data-set and application domain. We hope that this survey bridges this gap and provides a comprehensive reference for researchers and engineers aspiring to leverage deep learning for anomaly detection. Table 1 shows the set of research methods and application domains covered by our survey.
尽管深度学习方法在许多机器学习问题上取得了实质性的进展,但用于异常检测的深度学习方法相对较少。Adewumi和Akinyelu[2017]对基于深度学习的欺诈检测方法进行了全面综述。Kwon等人[2017]对用于网络入侵检测的深度异常检测(DAD)技术进行了广泛的综述。Litjens等人[2017]对在医学领域使用DAD技术进行了广泛的综述。Mohammadi等人[2017]概述了用于物联网(IoT)和大数据异常检测的DAD技术。Ball等人[2017]回顾了传感器网络异常检测。最先进的基于深度学习的视频异常检测方法以及各种类别已在Kiran等人[2018]中介绍。尽管在DAD技术应用方面有一些综述,但是对于用于离群点检测的深度学习架构缺乏比较分析。例如,大量关于异常检测的研究是使用深度自动编码器进行的,但是缺乏对最适合给定数据集和应用领域的各种深度体系结构的全面综述。我们希望这篇综述能够弥补这一差距,并为希望利用深度学习进行异常检测的研究人员和工程师提供全面的参考。表1显示了本综述涵盖的研究方法和应用领域。
We follow the survey approach of (Chandola et al. [2007]) for deep anomaly detection (DAD). Our survey presents a detailed and structured overview of research and applications of DAD techniques. We summarize our main contributions as follows:
我们遵循 Chandola 等人 [2007] 的综述思路来研究深度异常检测(DAD)。本文对DAD技术的研究和应用进行了详细而结构化的综述。我们的主要贡献如下:
Most of the existing surveys on DAD techniques either focus on a particular application domain or a specific research area of interest (Kiran et al. [2018], Mohammadi et al. [2017], Litjens et al. [2017], Kwon et al. [2017], Adewumi and Akinyelu [2017], Ball et al. [2017]). This review aims to provide a comprehensive outline of state-of-the-art research in DAD techniques, as well as several real-world applications of these techniques.
现有的大多数关于DAD技术的综述要么侧重于特定的应用领域,要么侧重于感兴趣的特定研究领域(Kiran等人[2018年]、Mohammadi等人[2017年]、Litjens等人[2017年]、Kwon等人[2017年]、Adewumi和Akinyelu [2017年]、Ball等人[2017年])。这篇综述旨在提供DAD技术的最新研究的全面概述,以及介绍这些技术的几种实际应用。
In recent years several new deep learning based anomaly detection techniques with greatly reduced computational requirements have been developed. The purpose of this paper is to survey these techniques and classify them into an organized schema for better understanding. Based on the choice of training objective, we introduce two more sub-categories, hybrid models (Erfani et al. [2016a]) and one-class neural network techniques (Chalapathy et al. [2018a]), as illustrated in Figure 5. For each category we discuss both the assumptions and techniques adopted for best performance. Furthermore, within each category, we also present the challenges, advantages, and disadvantages and provide an overview of the computational complexity of DAD methods.
近年来,开发了几种新的基于深度学习的异常检测技术,大大降低了计算要求。本文的目的是研究这些技术,并将其分类到一个有组织的模式中,以便更好地理解。基于训练目标的选择,我们引入了如图5所示的两个子类:混合模型(Erfani等人[2016a])和一类神经网络技术(Chalapathy等人[2018a])。对于每个类别,我们都讨论了为获得最佳性能而采用的假设和技术。此外,在每一个类别中,我们还提出了挑战、优势和劣势,并提供了DAD方法的计算复杂性的概述。
This chapter is organized following the structure described in Figure 5. In Section 8, we identify the various aspects that determine the formulation of the problem and highlight the richness and complexity associated with anomaly detection. We introduce and define two types of models: contextual and collective or group anomalies. In Section 9, we briefly describe the different application domains to which deep learning-based anomaly detection has been applied. In subsequent sections, we provide a categorization of deep learning-based techniques based on the research area to which they belong. Based on the training objectives employed and the availability of labels, deep learning-based anomaly detection techniques can be categorized into supervised (Section 10.1), unsupervised (Section 10.5), hybrid (Section 10.3), and one-class neural network (Section 10.4). For each category of techniques we also discuss their computational complexity for the training and testing phases. In Section 8.4 we discuss point, contextual, and collective (group) deep learning-based anomaly detection techniques. We present some discussion of the limitations and relative performance of various existing techniques in Section 12. Section 13 contains concluding remarks.
本章按照图5中描述的结构组织。在第8节中,我们确定了决定问题形成的各个方面,并强调了与异常检测相关的丰富性和复杂性。我们介绍并定义了两种类型的模型:上下文异常(contextual anomalies)和集合异常或组异常(collective or group anomalies)。在第9节中,我们简要描述了基于深度学习的异常检测的不同应用领域。在随后的章节中,我们根据深度学习技术所属的研究领域对其进行分类。基于所采用的训练目标和标签的可用性,基于深度学习的异常检测技术可以分为有监督的(第10.1节)、无监督的(第10.5节)、混合的(第10.3节)和单类神经网络(第10.4节)。对于每一类技术,我们还讨论了它们在训练和测试阶段的计算复杂性。在第8.4节中,我们讨论了基于点、上下文和集体(组)深度学习的异常检测技术。第12节中讨论了各种现有技术的局限性和相对性能。第13节得出了本文的结论。
This section identifies and discusses the different aspects of deep learning-based anomaly detection.
The choice of a deep neural network architecture in deep anomaly detection methods primarily depends on the nature of the input data. Input data can be broadly classified into sequential data (e.g., voice, text, music, time series, protein sequences) or non-sequential data (e.g., images, other data). Table 2 illustrates the nature of input data and deep model architectures used in anomaly detection. Additionally, depending on the number of features (or attributes), input data can be further classified into either low- or high-dimensional data. DAD techniques have been shown to learn complex hierarchical feature relations within high-dimensional raw input data (LeCun et al. [2015]). The number of layers used in DAD techniques is driven by the input data dimension; deeper networks are shown to produce better performance on high dimensional data. Later on, in Section 10, various models considered for outlier detection are reviewed in depth.
深度异常检测方法中深度神经网络架构的选择主要取决于输入数据的性质。输入数据可大致分为序列数据(如语音、文本、音乐、时间序列、蛋白质序列)或非序列数据(如图像、其他数据)。表2说明了异常检测中使用的输入数据和深度模型架构的性质。另外,取决于特征(或属性)数量的输入数据可以进一步分类为低维或高维数据。DAD技术一直用于学习高维原始输入数据中复杂的层次特征关系(LeCun等人[2015])。在DAD技术中使用的层数是由输入数据维度驱动的,更深的网络显示出在高维数据上产生更好的性能。随后,在第10节中,我们将深入讨论各种用于异常值检测的模型。
Labels indicate whether a chosen data instance is normal or an outlier. Anomalies are rare entities, hence it is challenging to obtain their labels. Furthermore, anomalous behavior may change over time; for instance, at the Maroochy water treatment plant the nature of the anomaly changed so significantly that it remained unnoticed for a long time, resulting in the leakage of 150 million liters of untreated sewage into local waterways (Ramotsoela et al. [2018]). Deep anomaly detection (DAD) models can be broadly classified into three categories based on the extent of availability of labels: (1) supervised deep anomaly detection, (2) semi-supervised deep anomaly detection, and (3) unsupervised deep anomaly detection.
标签指示所选数据实例是正常的还是异常的。异常是罕见的实体,因此很难获得它们的标签。此外,异常可能会随时间的推移而发生变化,例如,异常的性质发生了如此显著的变化,以至于在很长一段时间内,Maroochy水处理厂都没有注意到这一点,导致1.5亿升未经处理的污水泄漏到当地的水道中(Ramotsoela等人[2018])。深度异常检测模型可以根据标签的可用性程度大致分为三类:(1)有监督的深度异常检测。(2)半监督深度异常检测。(3)无监督深度异常检测。
Supervised deep anomaly detection involves training a deep supervised binary or multi-class classifier using labels of both normal and anomalous data instances. For instance, supervised DAD models formulated as multi-class classifiers aid in detecting rare brands, prohibited drug name mentions, and fraudulent health-care transactions (Chalapathy et al. [2016a,b]). Despite the improved performance of supervised DAD methods, these methods are not as popular as semi-supervised or unsupervised methods, owing to the lack of availability of labeled training samples. Moreover, the performance of a deep supervised classifier used as an anomaly detector is sub-optimal due to class imbalance (the total number of positive class instances is far greater than the total number of negative class instances). Therefore we do not consider the review of supervised DAD methods in this survey.
有监督的深度异常检测包括使用正常和异常数据实例的标签来训练深度有监督的二分类或多类分类器。例如,建模为多类分类器的有监督DAD模型有助于检测稀有品牌、违禁药品名称提及和欺诈性医疗保健交易(Chalapathy等人[2016a, b])。尽管有监督的DAD方法的性能有所提高,但由于缺乏标记训练样本,这些方法不如半监督或无监督方法受欢迎。此外,用作异常检测器的深度监督分类器由于类别不平衡(正类实例的总数远远多于负类数据的总数),其性能是次优的。因此,本综述不对有监督的DAD方法进行回顾。
The labels of normal instances are far easier to obtain than those of anomalies; as a result, semi-supervised DAD techniques are more widely adopted. These techniques leverage existing labels of a single class (normally the positive class) to separate outliers. One common way of using deep autoencoders in anomaly detection is to train them in a semi-supervised way on data samples with no anomalies. With sufficient training samples of the normal class, autoencoders would produce low reconstruction errors for normal instances and higher errors for unusual events (Wulsin et al. [2010], Nadeem et al. [2016], Song et al. [2017]). We consider a detailed review of these methods in Section 10.2.
正常实例的标签远比异常的标签更容易获得,因此,半监督DAD技术被更广泛地采用,这些技术利用单个类(通常为正类)的现有标签来分离异常值。在异常检测中,常用深度自动编码器(deep autoencoders)在没有异常的数据样本上以半监督方式训练它们。在正常类训练样本充足的情况下,自动编码器对正常实例产生较低的重建误差(reconstruction error),而对异常事件产生较高的重建误差(Wulsin等人[2010],Nadeem等人[2016],Song等人[2017])。我们将在第10.2节中详细介绍这些方法。
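As a minimal sketch of this semi-supervised setup (a hedged illustration assuming PyTorch and a generic tabular feature matrix; the layer sizes, epochs, and variable names are illustrative assumptions rather than a method from any specific reference above), an autoencoder can be fit on normal-only samples and its reconstruction error used as the anomaly score:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Small fully-connected autoencoder for tabular data (illustrative sizes)."""
    def __init__(self, n_features, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def fit_on_normal(model, x_normal, epochs=50, lr=1e-3):
    """Train only on (assumed) anomaly-free samples, as in semi-supervised DAD."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x_normal), x_normal)
        loss.backward()
        opt.step()
    return model

def reconstruction_error(model, x):
    """Per-sample squared reconstruction error, used as the anomaly score."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

# Usage sketch: x_train (normal only) and x_test are float tensors of shape
# (n_samples, n_features); higher scores flag instances the normal-only model
# cannot reconstruct well.
```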
Unsupervised deep anomaly detection techniques detect outliers solely based on intrinsic properties of the data instances. Unsupervised DAD techniques are used in automatic labeling of unlabelled data samples since labeled data is very hard to obtain (Patterson and Gibson [2017]). Variants of unsupervised DAD models (Tuor et al. [2017]) are shown to outperform traditional methods such as principal component analysis (PCA) (Wold et al. [1987]), support vector machines (SVM) (Cortes and Vapnik [1995]), and Isolation Forest (Liu et al. [2008]) in application domains such as health and cyber-security. Autoencoders are the core of all unsupervised DAD models. These models assume a higher prevalence of normal instances than abnormal data instances; when this assumption fails, they suffer a high false positive rate. Additionally, unsupervised learning algorithms such as the restricted Boltzmann machine (RBM) (Sutskever et al. [2009]), deep Boltzmann machine (DBM), deep belief network (DBN) (Salakhutdinov and Larochelle [2010]), generalized denoising autoencoders (Vincent et al. [2008]), recurrent neural networks (RNN) (Rodriguez et al. [1999]), and long short-term memory networks (Lample et al. [2016]), which are used to detect outliers, are discussed in detail in Section 11.7.
无监督深度异常检测技术仅基于数据实例的内在属性来检测异常值。无监督DAD技术用于未标记数据样本的自动标记,因为标记数据很难获得(Patterson和Gibson [2017])。在健康和网络安全等应用领域,无监督DAD模型的变体(Tuor等人[2017])的表现优于传统方法,例如主成分分析(PCA)(Wold等人[1987])、支持向量机(SVM)(Cortes和Vapnik [1995])和孤立森林(Isolation Forest)(Liu等人[2008])。自动编码器是所有无监督DAD模型的核心。这些模型假设正常实例比异常数据实例更普遍,若该假设不成立则会导致高假阳性率(false positive rate)。此外,第11.7节详细讨论了用于检测异常值的无监督学习算法,例如受限玻尔兹曼机(RBM)(Sutskever等人[2009])、深度玻尔兹曼机(DBM)、深度置信网络(DBN)(Salakhutdinov和Larochelle [2010])、广义降噪自动编码器(Vincent等人[2008])、循环神经网络(RNN)(Rodriguez等人[1999])以及长短期记忆网络(Lample等人[2016])。
In this survey we introduce two new categories of deep anomaly detection (DAD) techniques based on training objectives employed 1) Deep hybrid models (DHM). 2) One class neural networks (OC-NN).
在本综述中,我们基于所采用的训练目标介绍了两种新类别的深度异常检测(DAD)技术:1)深度混合模型(DHM)。 2)一类神经网络(OC-NN)。
Deep hybrid models for anomaly detection use deep neural networks, mainly autoencoders, as feature extractors; the features learned within the hidden representations of autoencoders are input to traditional anomaly detection algorithms such as the one-class SVM (OC-SVM) to detect outliers (Andrews et al. [2016a]). Figure 7 illustrates the deep hybrid model architecture used for anomaly detection. Following the success of transfer learning in obtaining rich representative features from models pre-trained on large data-sets, hybrid models have also employed these pre-trained transfer learning models as feature extractors with great success (Pan et al. [2010]). A variant of the hybrid model was proposed by Ergen et al. [2017], which considers joint training of the feature extractor along with the OC-SVM (or SVDD) objective to maximize the detection performance. A notable shortcoming of these hybrid approaches is the lack of a trainable objective customized for anomaly detection; hence these models fail to extract rich differential features to detect outliers. In order to overcome this limitation, customized objectives for anomaly detection such as Deep one-class classification (Ruff et al. [2018a]) and One-class neural networks (Chalapathy et al. [2018a]) were introduced.
用于异常检测的深度混合模型主要使用深度神经网络(尤其是自动编码器)作为特征提取器,在自动编码器的隐藏表示中学习的特征被输入到传统的异常检测算法,例如一类SVM(OC-SVM)来检测异常值(Andrews等人[2016a])。图7展示了用于异常检测的深度混合模型架构。继迁移学习成功地从在大数据集上预先训练的模型中获得丰富的代表性特征之后,混合模型也成功地将这些预先训练的迁移学习模型用作特征提取器(Pan等人[2010])。Ergen等人[2017]提出了一种混合模型的变型,该模型考虑了特征提取器与OC-SVM(或SVDD)目标的联合训练,以最大化检测性能。这些混合方法的一个显著缺点是缺乏为异常检测定制的可训练目标,因此这些模型不能提取丰富的差异特征来检测异常值。为了克服这一限制,引入了用于异常检测的定制目标,例如深度一类分类(Ruff等人[2018a])和一类神经网络(Chalapathy等人[2018a])。
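A hedged sketch of the two-step hybrid pipeline follows (the `encoder` is assumed to be a pretrained feature extractor returning NumPy arrays, e.g. the encoder of an autoencoder like the one sketched earlier; scikit-learn's OneClassSVM stands in for the traditional detector, and the `nu`/`gamma` values are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def hybrid_ocsvm_scores(encoder, x_train_normal, x_test, nu=0.05, gamma="scale"):
    """Step 1: use the deep encoder purely as a feature extractor.
    Step 2: fit a classical one-class SVM on the learned representations."""
    z_train = np.asarray(encoder(x_train_normal))   # hidden representations of normal data
    z_test = np.asarray(encoder(x_test))
    ocsvm = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(z_train)
    # decision_function: larger = more normal, so negate it to get an anomaly score
    return -ocsvm.decision_function(z_test)
```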
One-class neural network (OC-NN) methods (Chalapathy et al. [2018a]) are inspired by kernel-based one-class classification, combining the ability of deep networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data. The OC-NN approach breaks new ground for the following crucial reason: the data representation in the hidden layer is driven by the OC-NN objective and is thus customized for anomaly detection. This is a departure from other approaches which use a hybrid approach of learning deep features using an autoencoder and then feeding the features into a separate anomaly detection method like the one-class SVM (OC-SVM). The details of training and evaluation of one-class neural networks are discussed in Section 10.4. Another variant of the one-class neural network architecture, Deep Support Vector Data Description (Deep SVDD) (Ruff et al. [2018a]), trains a deep neural network to extract the common factors of variation by closely mapping the normal data instances to the center of a sphere, and is shown to produce performance improvements on the MNIST (LeCun et al. [2010]) and CIFAR-10 (Krizhevsky and Hinton [2009]) datasets.
一类神经网络(OC-NN)方法(Chalapathy等人[2018a])受到基于核的一类分类的启发,将深度网络提取逐渐丰富的数据表示的能力与围绕正常数据创建紧密包络的一类目标相结合。OC-NN方法的新颖之处在于:隐藏层中的数据表示是由OC-NN目标驱动的,因此是针对异常检测定制的。这与其他方法不同,其他方法使用混合方法,即使用自动编码器学习深层特征,然后将这些特征输入到单独的异常检测方法中,如一类SVM(OC-SVM)。关于一类神经网络训练和评估的详细信息,请参见第10.4节。一类神经网络体系结构的另一种变体,深度支持向量数据描述(Deep SVDD)(Ruff等人[2018a]),训练深度神经网络,通过将正常数据实例紧密映射到球体中心来提取变化的公共因子,并在MNIST(LeCun等人[2010])和CIFAR-10(Krizhevsky和Hinton [2009])数据集上表现出性能提升。
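The Deep SVDD idea can be sketched as follows (a simplified one-class version assuming a generic PyTorch encoder network `phi`; fixing the center to the mean of the initial embeddings is a common heuristic, and the weight-decay and bias-related details of Ruff et al. [2018a] are omitted here):

```python
import torch

def svdd_center(phi, x_normal):
    """Fix the hypersphere center c as the mean embedding of the normal training data."""
    with torch.no_grad():
        return phi(x_normal).mean(dim=0)

def svdd_loss(phi, x_batch, c):
    """One-class Deep SVDD objective: pull embeddings of normal data toward the center c."""
    return ((phi(x_batch) - c) ** 2).sum(dim=1).mean()

def svdd_score(phi, x, c):
    """Anomaly score = squared distance from the center; far-away points are anomalous."""
    with torch.no_grad():
        return ((phi(x) - c) ** 2).sum(dim=1)
```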
Anomalies can be broadly classified into three types: point anomalies, contextual anomalies and collective anomalies. Deep anomaly detection (DAD) methods have been shown to detect all three types of anomalies with great success.
异常可以大致分为三种类型:点异常、上下文异常和集体异常。深度异常检测(DAD)方法已被证明能够成功检测所有三种类型的异常。
The majority of work in literature focuses on point anomalies. Point anomalies often represent an irregularity or deviation that happens randomly and may have no particular interpretation. For instance, in Figure 10 a credit card transaction with high expenditure recorded at Monaco restaurant seems a point anomaly since it significantly deviates from the rest of the transactions. Several real world applications, considering point anomaly detection, are reviewed in Section 9.
文献中的大部分工作集中在点异常上。点异常通常表示随机发生的不规则或偏差,可能没有特定的解释。例如,在图10中,摩纳哥餐厅记录的高支出信用卡交易似乎是一个点异常,因为它明显偏离了其他交易。考虑到点异常检测,第9节回顾了几个实际应用。
A contextual anomaly, also known as a conditional anomaly, is a data instance that could be considered anomalous in some specific context (Song et al. [2007]). A contextual anomaly is identified by considering both contextual and behavioural features. The contextual features normally used are time and space, while the behavioral features may be a pattern of spending money, the occurrence of system log events, or any feature used to describe normal behavior. Figure 9a illustrates an example of a contextual anomaly in temperature data, indicated by a drastic drop just before June; this value is not indicative of a normal value found during this time. Figure 9b illustrates using a deep Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber [1997]) based model to identify anomalous system log events (Du et al. [2017]) in a given context (e.g., event 53 is detected as being out of context).
上下文异常(contextual anomaly)也称为条件异常(conditional anomaly),是在特定上下文中可以被视为异常的数据实例(Song等人[2007])。通过同时考虑上下文特征和行为特征来识别上下文异常。通常使用的上下文特征是时间和空间,而行为特征可以是消费模式、系统日志事件的发生,或任何用于描述正常行为的特征。图9a展示了温度数据中上下文异常的例子:六月前温度的急剧下降,该值不是这段时间内正常出现的值。图9b展示了使用基于深度LSTM(Hochreiter和Schmidhuber [1997])的模型来识别给定上下文中的异常系统日志事件(Du等人[2017]),例如,事件53被检测为脱离上下文。
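A rough sketch of how an LSTM can flag an out-of-context log event, in the spirit of Du et al. [2017] (this is not their exact DeepLog implementation; the vocabulary size, embedding/hidden sizes, and the top-k decision rule are assumptions made for illustration):

```python
import torch
import torch.nn as nn

class NextEventLSTM(nn.Module):
    """Predicts the next log-event key from a window of previous keys."""
    def __init__(self, n_keys, emb=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(n_keys, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_keys)

    def forward(self, windows):             # windows: (batch, window_len) int64 key ids
        h, _ = self.lstm(self.emb(windows))
        return self.out(h[:, -1, :])        # logits over the next key

def is_contextual_anomaly(model, window, actual_next_key, top_k=5):
    """Flag the event if the observed key is not among the model's top-k predictions."""
    with torch.no_grad():
        logits = model(window.unsqueeze(0))
        topk = torch.topk(logits, k=top_k, dim=1).indices[0]
    return actual_next_key not in topk.tolist()
```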
Anomalous collections of individual data points are known as collective or group anomalies, wherein each of the individual points in isolation appears as a normal data instance, while observed as a group they exhibit unusual characteristics. For example, consider the illustration of fraudulent credit card transactions in the log data shown in Figure 10: if a single transaction of "MISC" had occurred, it would probably not seem anomalous, but the following group of transactions valued at $75 certainly seems to be a candidate for a collective or group anomaly. Group anomaly detection (GAD) emphasizes irregular group distributions; for example, irregular mixtures of image pixels are detected using a variant of the autoencoder model (Chalapathy et al. [2018b], Bontemps et al. [2016], Araya et al. [2016], Zhuang et al. [2017]).
个体数据点的异常集合被称为集体异常或组异常,其中每个孤立的点表现为正常数据实例,而作为一个组观察时则表现出异常特征。例如,考虑图10所示日志数据中的信用卡欺诈交易:如果只发生了一笔“MISC”交易,它可能看起来并不异常,但随后一组价值75美元的交易显然是集体异常或组异常的候选者。组异常检测(Group anomaly detection,GAD)侧重于不规则的组分布,例如,使用自动编码器模型的变体来检测图像像素的不规则混合(Chalapathy等人[2018b],Bontemps等人[2016],Araya等人[2016],Zhuang等人[2017])。
A critical aspect for anomaly detection methods is the way in which the anomalies are detected. Generally, the outputs produced by anomaly detection methods are either anomaly score or binary labels.
异常检测方法的一个关键方面是检测异常的方式。通常,异常检测方法产生的输出要么是异常分数,要么是二分类标签。
An anomaly score describes the level of outlierness of each data point. The data instances may be ranked according to the anomaly score, and a domain-specific threshold (commonly known as the decision score) will be selected by a subject matter expert to identify the anomalies. In general, decision scores reveal more information than binary labels. For instance, in the Deep SVDD approach the decision score is the distance of a data point from the center of the sphere; data points which are farther away from the center are considered anomalous (Ruff et al. [2018b]).
异常分数描述了每个数据点的异常程度。可以根据异常分数对数据实例进行排序,由人类专家选择特定的阈值(通常称为决策分数)来识别异常。一般来说,决策分数比二分类标签揭示更多的信息。例如,在深度SVDD方法中,决策得分是数据点离球体中心的距离的度量,离中心较远的数据点被认为是异常的(Ruff等人[2018b])。
Instead of assigning scores, some techniques may assign a category label of normal or anomalous to each data instance. Unsupervised anomaly detection techniques using autoencoders measure the magnitude of the residual vector (i.e., the reconstruction error) to obtain anomaly scores; later on, the reconstruction errors are either ranked or thresholded by domain experts to label data instances.
某些技术可能会将类别标签指定为每个数据实例的正常或异常,而不是指定分数。使用自动编码器的无监督异常检测技术测量残差向量(即重建误差)的大小以获得异常分数,随后,由人类专家对重建误差进行排序或根据阈值以标记数据实例。
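For example, turning anomaly scores into binary labels can be as simple as thresholding at a percentile chosen by a domain expert (the 95th percentile below is an arbitrary illustrative choice, not a value taken from the survey):

```python
import numpy as np

def scores_to_labels(anomaly_scores, percentile=95.0):
    """Label instances whose score exceeds a domain-chosen percentile threshold."""
    scores = np.asarray(anomaly_scores, dtype=float)
    threshold = np.percentile(scores, percentile)
    return (scores > threshold).astype(int)   # 1 = anomalous, 0 = normal

# Example with reconstruction-error style scores
labels = scores_to_labels([0.10, 0.20, 0.15, 3.50, 0.12], percentile=95.0)
```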
In this section, we discuss several applications of deep anomaly detection. For each application domain, we discuss the following four aspects:
the notion of an anomaly; nature of the data; challenges associated with detecting anomalies; existing deep anomaly detection techniques.
在本节中,我们将讨论深度异常检测的几种应用。对于每个应用领域,我们讨论以下四个方面:
异常的概念;数据的性质;与检测异常相关的挑战;现有的深度异常检测技术。
The intrusion detection system (IDS) refers to identifying malicious activity in a computer-related system (Phoha [2002]). IDS may be deployed on a single computer, known as Host Intrusion Detection (HIDS), or on large networks, known as Network Intrusion Detection (NIDS). The classification of deep anomaly detection techniques for intrusion detection is shown in Figure 11. Depending on the detection method, IDS are classified as signature-based or anomaly-based. Signature-based IDS are not efficient at detecting new attacks for which no specific signature pattern is available; hence anomaly-based detection methods are more popular. In this survey, we focus on the deep anomaly detection (DAD) methods and architectures employed in intrusion detection.
入侵检测系统(IDS)是指识别计算机相关系统中的恶意活动(Phoha [2002])。入侵检测系统可以部署在单台计算机上,称为主机入侵检测(Host Intrusion Detection,HIDS),也可以部署在大型网络中,称为网络入侵检测(Network Intrusion Detection,NIDS)。入侵检测的深度异常检测技术的分类如图11所示。根据检测方法,入侵检测系统分为基于签名的和基于异常的两类。基于签名的入侵检测系统不能有效地检测没有特定签名模式可用的新攻击,因此基于异常的检测方法更受欢迎。在这篇综述中,我们重点介绍入侵检测中采用的深度异常检测(DAD)方法和模型架构。
Such systems are installed software programs which monitor a single host or computer for malicious activity or policy violations by listening to system calls or events occurring within that host (Vigna and Kruegel [2005]). The system call logs could be generated by programs or by user interaction, resulting in logs as shown in Figure 9b. Malicious interactions lead to the execution of these system calls in different sequences. HIDS may also monitor the state of a system, its stored information, in Random Access Memory (RAM), in the file system, in log files, or elsewhere for a valid sequence. Deep anomaly detection (DAD) techniques applied to HIDS are required to handle the variable length and sequential nature of the data. The DAD techniques have to either model the sequence data or compute the similarity between sequences. Some of the successful DAD techniques for HIDS are illustrated in Table 3.
入侵检测系统是安装的软件程序,通过监听系统调用或主机内发生的事件来监控单个主机或计算机的恶意活动或违反策略的行为(Vigna和Kruegel [2005])。系统调用日志可以由程序生成,也可以由用户交互生成日志,如图9b所示。恶意交互导致这些系统调用以不同的顺序执行。HIDS还可以监控系统的状态、存储在随机存取存储器(RAM)中的信息、文件系统、日志文件或其他地方的有效序列。应用于HIDS的深度异常检测技术需要处理数据的可变长度和顺序性。DAD技术要么对序列数据建模,要么计算序列之间的相似性。表3说明了HIDS的一些成功的DAD技术。
NIDS systems deal with monitoring the entire network for suspicious traffic by examining each and every network packet. Owing to the real-time streaming behavior, the nature of the data is synonymous with big data: high volume, velocity, and variety. The network data also has a temporal aspect associated with it. Some of the successful DAD techniques for NIDS are illustrated in Table 4. This survey also lists the data-sets used for evaluating the DAD intrusion detection methods in Table 5. A challenge faced by DAD techniques in intrusion detection is that the nature of anomalies keeps changing over time as the intruders adapt their network attacks to evade the existing intrusion detection solutions.
NIDS系统通过检查每个网络数据包来监控整个网络的可疑流量。由于实时流行为,数据的本质与大数据同义,具有高容量、高速度和多样性。网络数据也有与之相关的时间方面。表4说明了NIDS的一些成功的DAD技术。本综述还在表5中列出了用于评估DAD入侵检测方法的数据集。DAD技术在入侵检测中面临的一个挑战是,随着入侵者调整他们的网络攻击以逃避现有的入侵检测解决方案,异常的性质会随着时间的推移而不断变化。
Fraud is a deliberate act of deception to access valuable resources (Abdallah et al. [2016]). The PricewaterhouseCoopers (PwC) global economic crime survey of 2018 (Lavion [2018], Zhao [2013]) found that half of the 7,200 companies they surveyed had experienced fraud of some nature. Fraud detection refers to the detection of unlawful activities across various industries, as illustrated in Figure 12.
欺诈是一种获取宝贵资源的蓄意欺骗行为(Abdallah等人[2016])。普华永道(PwC)2018年全球经济犯罪调查(Lavion [2018],Zhao[2013])发现,在他们调查的7200家公司中,有一半曾遭遇某种性质的欺诈。欺诈检测是指检测各行业的非法活动,如图12所示。
Fraud in telecommunications, insurance claims (health, automobile, etc.), and banking (tax return claims, credit card transactions, etc.) represents a significant problem for both governments and private businesses. Detecting and preventing fraud is not a simple task since fraud is an adaptive crime. Many traditional machine learning algorithms have been applied successfully in fraud detection (Sorournejad et al. [2016]). The challenge associated with detecting fraud is that it requires real-time detection and prevention. This section focuses on deep anomaly detection (DAD) techniques for fraud detection.
电信、保险(健康、汽车等)索赔、银行(纳税申报、信用卡交易等)欺诈是政府和私营企业面临的重大问题。检测和防止欺诈不是一项简单的任务,因为欺诈是一种适应性犯罪(adaptive crime)。许多传统的机器学习算法已经成功应用于欺诈检测(Sorournejad等人[2016])。检测欺诈的挑战在于它需要实时检测和预防。本节重点介绍用于欺诈检测的深度异常检测(DAD)技术。
Credit cards have become a popular payment method for online shopping for goods and services. Credit card fraud involves the theft of payment card details and their use as a fraudulent source of funds in a transaction. Many techniques for credit card fraud detection have been presented in the last few years (Zhou et al. [2018], Suganya and Kamalraj [2015]). We will briefly review some of the DAD techniques shown in Table 6. The challenge in credit card fraud detection is that frauds have no consistent patterns. The typical approach in credit card fraud detection is to maintain a usage profile for each user and monitor the user profiles to detect any deviations. Since there are billions of credit card users, this user profile approach is not very scalable. Owing to their inherently scalable nature, DAD techniques are gaining widespread adoption in credit card fraud detection.
信用卡已经成为网上购物商品和服务的一种流行的支付方式。信用卡欺诈包括窃取支付卡的详细信息,并将其用作交易中的欺诈性资金来源。近年来,出现了许多信用卡欺诈检测技术(Zhou等人[2018],Suganya和Kamalraj [2015])。我们将简要回顾表6所示的一些DAD技术。信用卡欺诈检测的挑战是欺诈没有一致的模式。信用卡欺诈检测的典型方法是维护每个用户的使用概况,并监控用户概况以检测任何偏差。由于有数十亿的信用卡用户,基于用户画像的方法难以扩展。由于DAD技术固有的可扩展性,该技术在信用卡欺诈检测中得到了广泛的采用。
In recent times, mobile cellular networks have witnessed rapid deployment and evolution, supporting billions of users and a vastly diverse array of mobile devices. Due to this broad adoption and low mobile cellular service rates, mobile cellular networks are now faced with frauds such as voice scams targeted at stealing customer private information and messaging-related scams designed to extort money from customers. Detecting such fraud is of paramount interest and is not an easy task due to the volume and velocity of mobile cellular network data. Traditional machine learning methods with static feature engineering techniques fail to adapt to the nature of evolving fraud. Table 7 lists DAD techniques for mobile cellular network fraud detection.
近年来,移动蜂窝网络经历了快速部署和发展,支持数十亿用户和种类繁多的移动设备。由于这种广泛采用和低廉的移动蜂窝服务资费,移动蜂窝网络现在面临着欺诈,例如旨在窃取客户私人信息的语音欺诈,以及与消息传递相关的向客户勒索金钱的欺诈。检测此类欺诈至关重要,但由于移动蜂窝网络数据的容量和速度,这并非易事。采用静态特征工程技术的传统机器学习方法不能适应不断发展的欺诈的本质。表7列出了用于移动蜂窝网络欺诈检测的DAD技术。
Several traditional machine learning methods have been applied successfully to detect fraud in insurance claims (Joudaki et al. [2015], Roy and George [2017]). The traditional approach to fraud detection is based on features which are fraud indicators. The challenge with these traditional approaches is the need for manual expertise to extract robust features. Another challenge in insurance fraud detection is that the incidence of fraud is far lower than the total number of claims, and each fraud is unique in its own way. In order to overcome these limitations, several DAD techniques have been proposed, which are illustrated in Table 8.
几种传统的机器学习方法已经成功地应用于检测保险索赔中的欺诈(Joudaki等人[2015],Roy和George [2017])。传统的欺诈检测方法是基于欺诈指标(fraud indicators)的特征。这些传统方法面临的挑战是需要人工专业知识来提取健壮的特征。保险欺诈检测的另一个挑战是,欺诈的发生率远远低于索赔的总数,而且每种欺诈都有其独特的方式。为了克服这些限制,提出了几种DAD技术,如表8所示。
Healthcare is an integral component of people's lives; waste, abuse, and fraud drive up costs in healthcare by tens of billions of dollars each year. Healthcare insurance claims fraud is a significant contributor to increased healthcare costs, but its impact can be mitigated through fraud detection. Several machine learning models have been used effectively in health care insurance fraud detection (Bauder and Khoshgoftaar [2017]). Table 9 presents an overview of DAD methods for health-care fraud identification.
医疗保健是人们生活中不可或缺的组成部分,浪费、滥用和欺诈每年都会使医疗保健成本增加数百亿美元。医疗保险索赔欺诈是医疗成本增加的一个重要因素,但其影响可以通过欺诈检测来减轻。几种机器学习模型已经有效地用于医疗保险欺诈检测(Bauder和Khoshgoftaar [2017])。表9给出了用于医疗保健欺诈识别的DAD方法的概述。
Malware is short for malicious software. In order to protect legitimate users from malware, machine learning based efficient malware detection methods have been proposed (Ye et al. [2017]). In classical machine learning methods, the process of malware detection is usually divided into two stages: feature extraction and classification/clustering. The performance of traditional malware detection approaches critically depends on the extracted features and the methods for classification/clustering. The challenge associated with malware detection problems is the sheer scale of data; for instance, considering the data as bytes, a specific sequence classification problem could be of the order of two million time steps. Furthermore, malware is very adaptive in nature, wherein attackers use advanced techniques to hide the malicious behavior. Some DAD techniques which address these challenges effectively and detect malware are shown in Table 10.
恶意软件(Malware)是 Malicious Software 的简称。为了保护合法用户免受恶意软件的攻击,提出了基于机器学习的高效恶意软件检测方法(Ye等人[2017])。在经典的机器学习方法中,恶意软件检测的过程通常分为两个阶段:特征提取和分类/聚类。传统恶意软件检测方法的性能主要取决于提取的特征和分类/聚类方法。恶意软件检测的挑战是数据的规模,例如,将数据视为字节,特定的序列分类问题可能需要大约200万个时间步。此外,恶意软件本质上是适应性很强的,攻击者会使用高级技术来隐藏恶意行为。表10显示了一些可有效应对这些挑战并检测恶意软件的DAD技术。
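A hedged sketch of a raw-byte convolutional classifier in the spirit of gated-convolution approaches to byte-level malware classification (this is not a specific entry from Table 10; the embedding size, kernel, and stride are illustrative assumptions chosen so that multi-million-byte sequences remain tractable):

```python
import torch
import torch.nn as nn

class ByteSequenceClassifier(nn.Module):
    """Convolutional classifier over raw byte sequences (values 0-255 plus a padding index)."""
    def __init__(self, emb=8, channels=64, kernel=16, stride=8, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(257, emb, padding_idx=256)
        self.conv = nn.Conv1d(emb, channels, kernel_size=kernel, stride=stride)
        self.gate = nn.Conv1d(emb, channels, kernel_size=kernel, stride=stride)
        self.fc = nn.Linear(channels, n_classes)

    def forward(self, byte_ids):                  # byte_ids: (batch, seq_len) int64
        x = self.emb(byte_ids).transpose(1, 2)    # (batch, emb, seq_len)
        h = self.conv(x) * torch.sigmoid(self.gate(x))   # gated convolution
        h = torch.max(h, dim=2).values            # global max pooling over the sequence
        return self.fc(h)                         # malicious vs. benign logits
```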
Several studies have been conducted to understand the theoretical and practical applications of deep learning in medicine and bio-informatics (Min et al. [2017], Cao et al. [2018a], Zhao et al. [2016], Khan and Yairi [2018]). Finding rare events (anomalies) in areas such as medical image analysis and clinical electroencephalography (EEG) records enables diagnosing and providing preventive treatment for a variety of medical conditions. Deep learning based architectures are employed with great success to detect medical anomalies, as illustrated in Table 11. The vast amount of imbalanced data in the medical domain presents significant challenges for detecting outliers. Additionally, deep learning techniques have long been considered black-box techniques. Even though deep learning models produce outstanding performance, these models lack interpretability. In recent times, models with good interpretability have been proposed and shown to produce state-of-the-art performance (Gugulothu et al., Amarasinghe et al. [2018b], Choi [2018]).
为了了解深度学习在医学和生物信息学中的理论和实际应用,已经进行了多项研究(Min等[2017],Cao等[2018a],Zhao等[2016],Khan和Yairi [2018])。在医学图像分析、临床脑电图记录等领域发现罕见事件(异常),有助于对各种医疗状况进行诊断并提供预防性治疗。如表11所示,基于深度学习的体系结构被成功地用于检测医学异常。医学领域中大量的不平衡数据给异常值的检测带来了巨大的挑战。此外,长期以来,深度学习技术一直被认为是黑盒技术。尽管深度学习模型产生了出色的性能,但这些模型缺乏解释能力。近年来,具有良好解释能力的模型被提出并显示出产生最先进的性能(Gugulothu等人,Amarasinghe等人[2018b],Choi [2018])。
In recent times, online social networks have become part and parcel of daily life. Anomalies in a social network are irregular, often unlawful behavior patterns of individuals within the social network; such individuals may be identified as spammers, sexual predators, online fraudsters, fake users, or rumor-mongers. Detecting these irregular patterns is of prime importance since, if not detected, the acts of such individuals can have a serious social impact. Traditional anomaly detection techniques and the challenges of detecting anomalies in social networks are a well studied topic in the literature (Liu and Chawla [2017], Savage et al. [2014], Anand et al. [2017], Yu et al. [2016], Cao et al. [2018b]). The heterogeneous and dynamic nature of the data presents significant challenges to DAD techniques. Despite these challenges, several DAD techniques illustrated in Table 12 are shown to outperform state-of-the-art methods.
最近,在线社交网络已经成为日常生活的一部分。社交网络中的异常是社交网络中个人的不规则且通常是非法的行为模式;这些人可能被认定为垃圾邮件发送者、性侵犯者、在线诈骗者、假冒用户或造谣者。发现这些不规则的模式至关重要,因为如果不被发现,这些人的行为会产生严重的社会影响。传统的异常检测技术及其在社交网络中检测异常的挑战的调查是文献研究的热门话题(Liu和Chawla [2017],Savage等人[2014],Anand等人[2017],Yu等人[2016],Cao等人[2018b],Yu等人[2016])。数据的异构性和动态性对DAD技术提出了重大挑战。尽管有这些挑战,表12中所示的几种DAD技术的表现优于最先进的方法。
Anomaly detection in log files aims to find text which can indicate the reasons and the nature of the failure of a system. Most commonly, a domain-specific regular expression is constructed from past experience, which finds new faults by pattern matching. The limitation of such approaches is that newer failure messages are easily missed (Memon [2008]). The unstructured nature and the diversity in both format and semantics of log data pose significant challenges to log anomaly detection. Anomaly detection techniques should adapt to the concurrent set of log data generated and detect outliers in real time. Following the success of deep neural networks in real-time text analysis, several DAD techniques illustrated in Table 13, which model the log data as a natural language sequence, are shown to be very effective in detecting outliers.
日志文件中的异常检测旨在找到能够表明系统故障原因和性质的文本。最常见的是,特定于领域的正则表达式是根据过去的经验构建的,它通过模式匹配发现新的错误。这种方法的局限性是较新的故障信息很容易检测不到(Memon [2008])。日志数据在格式和语义上的非结构化和多样性给日志异常检测带来了巨大的挑战。异常检测技术应适应生成的并发日志数据集,并实时检测异常值。随着深度神经网络在实时文本分析中的成功,表13中所示的几种DAD技术将日志数据建模为自然语言序列,在检测异常值方面非常有效。
The IoT is identified as a network of devices that are interconnected with software, servers, sensors, etc. In the field of the Internet of Things (IoT), the data generated by weather stations, Radio-Frequency Identification (RFID) tags, IT infrastructure components, and other sensors is mostly sequential time-series data. Anomaly detection in these IoT networks identifies the fraudulent, faulty behavior of these massive scales of interconnected devices. The challenge associated with outlier detection is that heterogeneous devices are interconnected, which renders the system more complex. A thorough overview of using deep learning (DL) to facilitate analytics and learning in the IoT domain is presented by Mohammadi et al. [2018]. Table 14 illustrates the DAD techniques employed for IoT devices.
物联网被认为是一个由软件、服务器、传感器等相互连接的设备网络。在物联网领域,气象站、射频识别(RFID)标签、IT基础设施组件和其他一些传感器生成的数据大多是时间序列数据。物联网中的异常检测可识别这些大规模互连设备的欺诈性、故障性行为。异常值检测的挑战在于异构设备之间的互连使系统更加复杂。Mohammadi等人[2018]对使用深度学习来促进物联网领域的分析和学习进行了全面概述。表14说明了物联网设备中采用的DAD技术。
Industrial systems consisting of wind turbines, power plants, high-temperature energy systems, storage devices, and rotating mechanical parts are exposed to enormous stress on a day-to-day basis. Damage to these types of systems not only causes economic loss but also a loss of reputation; therefore detecting and repairing them early is of utmost importance. Several machine learning techniques have been used to detect such damage in industrial systems (Ramotsoela et al. [2018], Martí et al. [2015]). Several published papers utilizing deep learning models for detecting early industrial damage show great promise (Atha and Jahanshahi [2018], de Deijn [2018], Wang et al. [2018c]). Damage caused to equipment is a rare event, thus detecting such events can be formulated as an outlier detection problem. The challenges associated with outlier detection in this domain are both the volume and the dynamic nature of the data, since failures are caused by a variety of factors. Some of the DAD techniques employed across various industries are illustrated in Table 15.
由风力涡轮机、发电厂、高温能源系统、存储设备和旋转机械部件组成的工业系统每天都面临巨大的压力。对这类系统的损坏不仅会造成经济损失,还会造成声誉损失,因此尽早检测和修复它们至关重要。几种机器学习技术已被用于检测工业系统中的这种损坏(Ramotsoela等人[2018],Marti等人[2015])。利用深度学习模型检测早期工业损伤的几篇论文显示出很大的前景(Atha和Jahanshahi [2018],de Deijn [2018],Wang等人[2018c])。对设备造成的损坏是罕见的事件,因此检测此类事件可以表述为异常检测问题。由于故障是由多种因素造成的,因此在该领域与异常值检测相关的挑战既是数据量也是数据的动态特性。表15说明了不同行业采用的一些DAD技术。
Data recorded continuously over a duration is known as a time series. Time series data can be broadly classified into univariate and multivariate time series. In the case of a univariate time series, only a single variable (or feature) varies over time. For instance, the data collected each second from a temperature sensor within a room is univariate time series data. A multivariate time series consists of several variables (or features) which change over time. An accelerometer which produces three-dimensional data every second, one value for each axis (x, y, z), is a perfect example of multivariate time series data. In the literature, the types of anomalies in univariate and multivariate time series are categorized into the following groups: (1) point anomalies (Section 8.4.1), (2) contextual anomalies (Section 8.4.2), and (3) collective anomalies (Section 8.4.3). In recent times, many deep learning models have been proposed for detecting anomalies within univariate and multivariate time series data, as illustrated in Table 16 and Table 17 respectively. Some of the challenges in detecting anomalies in time series data using deep learning models are:
The lack of a defined pattern in which an anomaly occurs. Noise within the input data seriously affects the performance of algorithms. As the length of the time series data increases, the computational complexity also increases. Time series data is usually non-stationary, non-linear, and dynamically evolving; hence DAD models should be able to detect anomalies in real time.
持续记录的数据称为时间序列。时间序列数据可以大致分为单变量和多变量时间序列。在单变量时间序列的情况下,只有单个变量(或特征)随时间变化。例如,从房间内的温度传感器每秒收集的数据是单变量时间序列数据。多变量时间序列由几个随时间变化的变量(或特征)组成。加速度计每秒钟为每个轴(x,y,z)生成一个三维数据,是多变量时间序列数据的完美例子。在文献中,单变量和多变量时间序列中的异常类型分为以下几类:(1)点异常(8.4.1);(2)上下文异常(8.4.2);(3)集体异常(8.4.3)。近年来,已经提出了许多深度学习模型来检测单变量和多变量时间序列数据中的异常,分别如表16和表17所示。使用深度学习模型检测时间序列异常的一些挑战是:
缺乏已定义的异常出现模式。输入数据中的噪声严重影响算法的性能。随着时间序列数据长度的增加,计算复杂性也会增加。时间序列数据通常是非平稳、非线性和动态演变的,因此DAD模型应该能够实时检测异常。
The advancements in the deep learning domain offer opportunities to extract rich hierarchical features which can greatly improve outlier detection within univariate time series data. A list of industry-standard tools and datasets (both deep learning based and non-deep learning based) for benchmarking anomaly detection algorithms on both univariate and multivariate time-series data is presented and maintained at a GitHub repository. Table 16 illustrates the various deep architectures adopted for anomaly detection within univariate time series data.
深度学习领域的进步为提取丰富的层次特征提供了机会,这可以极大地改善单变量时间序列数据中的离群点检测。该Github仓库提供并维护了一元和多元时间序列数据基准异常检测算法的行业标准工具和数据集列表(基于深度学习和非深度学习)。表16说明了用于单变量时间序列数据中异常检测的各种深度架构。
Anomaly detection in multivariate time series data is a challenging task. Effective multivariate anomaly detection enables fault isolation diagnostics. RNN and LSTM based methods are shown to perform well in detecting interpretable anomalies within multivariate time series datasets. DeepAD, a generic framework based on deep learning for multivariate time series anomaly detection, is proposed by Buda et al. [2018]. Interpretable anomaly detection systems designed using deep attention-based models are effective in explaining the anomalies detected (Yuan et al. [2018b], Guo and Lin [2018]). Table 17 illustrates the various deep architectures adopted for anomaly detection within multivariate time series data.
多元时间序列数据中的异常检测是一项具有挑战性的任务。有效的多元异常检测支持故障隔离诊断(fault isolation diagnostics)。基于RNN和LSTM的方法在检测多变量时间序列数据集中的可解释异常方面表现良好。DeepAD是Buda等人[2018]提出的基于深度学习的多元时间序列异常检测通用框架。使用基于深度注意的模型设计的可解释的异常检测系统在解释检测到的异常方面是有效的(Yuan 等[2018b],Guo and Lin [2018])。表17说明了多变量时间序列数据中用于异常检测的各种深度架构。
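As a hedged illustration of the RNN/LSTM-based approaches listed in Table 17 (not the DeepAD framework itself; the window length, hidden size, and scoring rule are assumptions), one common pattern is an LSTM encoder-decoder whose per-window reconstruction error serves as the multivariate anomaly score:

```python
import torch
import torch.nn as nn

class LSTMAutoEncoder(nn.Module):
    """Encode a multivariate window into a vector and reconstruct it back."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_features)

    def forward(self, x):                        # x: (batch, window_len, n_features)
        _, (h, _) = self.encoder(x)
        # repeat the final hidden state as the decoder input at every time step
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        out, _ = self.decoder(z)
        return self.proj(out)

def window_scores(model, windows):
    """Anomaly score per window = mean squared reconstruction error over time and features."""
    with torch.no_grad():
        recon = model(windows)
        return ((recon - windows) ** 2).mean(dim=(1, 2))
```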
Video surveillance, also popularly known as Closed-Circuit Television (CCTV), involves monitoring designated areas of interest in order to ensure security. In video surveillance applications, unlabelled data is available in large amounts; this is a significant challenge for supervised machine learning and deep learning methods. Hence video surveillance applications have been modeled as anomaly detection problems owing to the lack of availability of labeled data. Several works have studied the state-of-the-art deep models for video anomaly detection and have classified them based on the type of model and criteria of detection (Kiran et al. [2018], Chong and Tay [2015]). The challenges of robust 24/7 video surveillance systems are discussed in detail by Boghossian and Black [2005]. The lack of an explicit definition of an anomaly in real-life video surveillance is a significant issue that hampers the performance of DAD methods as well. The DAD techniques used in video surveillance are illustrated in Table 19.
视频监控,也就是众所周知的闭路电视(CCTV),包括监控指定的感兴趣区域,以确保安全。在视频监控应用中,大量未标记的数据是可用的,这对有监督的机器学习和深度学习方法是一个重大挑战。因此,由于缺乏可用的标记数据,视频监控应用已经被建模为异常检测问题。一些论文研究了用于视频异常检测的最先进的深度模型,并根据模型类型和检测标准对它们进行了分类(Kiran等人[2018],Chong和Tay [2015])。Boghossian和Black [2005]详细讨论了鲁棒的24/7视频监控系统的挑战。在现实生活的视频监控中,缺乏对异常的明确定义是一个重要的问题,它也阻碍了DAD方法的性能。视频监控中使用的DAD技术如表19所示。
In this section, we discuss various DAD models classified based on the availability of labels and the training objective. For each model type, we discuss the following four aspects:
assumptions; type of model architectures; computational complexity; advantages and disadvantages.
在本节中,我们将讨论基于标签可用性和训练目标分类的各种DAD模型。对于每种模型类型,我们讨论以下四个方面:
假设;模型体系结构的类型;计算复杂性;优点和缺点。
Supervised anomaly detection techniques are superior in performance compared to unsupervised anomaly detection techniques since these techniques use labeled samples (Görnitz et al. [2013]). Supervised anomaly detection learns the separating boundary from a set of annotated data instances (training) and then classifies a test instance into either the normal or the anomalous class with the learned model (testing).
与无监督异常检测技术相比,有监督异常检测技术在性能上更优越,因为这些技术使用标记样本(Görnitz等人[2013])。监督异常检测从一组带注释的数据实例(训练)中学习分离边界,然后用学习的模型将测试实例分类为正常或异常类(测试)。
Assumptions: Deep supervised learning methods depend on separating data classes whereas unsupervised techniques focus on explaining and understanding the characteristics of data. Multi-class classification based anomaly detection techniques assume that the training data contains labeled instances of multiple normal classes (Shilton et al. [2013], Jumutc and Suykens [2014], Kim et al. [2015], Erfani et al. [2017]). Multi-class anomaly detection techniques learn a classifier to distinguish the anomalous class from the rest of the classes. In general, supervised deep learning-based classification schemes for anomaly detection have two sub-networks: a feature extraction network followed by a classifier network. Deep models require a substantial number of training samples (in the order of thousands or millions) to learn feature representations that discriminate various class instances effectively; due to the lack of availability of clean data labels, supervised DAD techniques are not as popular as semi-supervised and unsupervised methods.
假设:深度监督学习方法依赖于分离数据类,而无监督技术则侧重于解释和理解数据的特征。基于多类别分类的异常检测技术假设训练数据包含多个正常类别的标记实例(Shilton等人[2013],Jumutc和Suykens [2014],Kim等人[2015],Erfani等人[2017] ])。多类异常检测技术学习分类器,以区分异常类和其余类。通常,基于监督的深度学习的异常检测分类方案有两个子网,一个特征提取网络,后跟一个分类器网络。深度模型需要大量的训练样本(以千或百万为单位),以学习特征表示以有效地区分各种类实例。由于缺乏干净的数据标签,有监督的深度异常检测技术并不像半监督和无监督方法那样流行。
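A minimal sketch of such a two-sub-network supervised classifier (a generic PyTorch example assuming tabular inputs and labels that include an anomalous class; the layer sizes are illustrative):

```python
import torch.nn as nn

class SupervisedDAD(nn.Module):
    """Feature extraction sub-network followed by a classifier sub-network."""
    def __init__(self, n_features, n_classes):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU())
        self.classifier = nn.Linear(32, n_classes)   # one of the classes is 'anomalous'

    def forward(self, x):
        return self.classifier(self.feature_extractor(x))

# Trained with nn.CrossEntropyLoss on labeled normal and anomalous instances;
# class weights are typically needed because anomalous labels are scarce (class imbalance).
```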
Computational Complexity: The computational complexity of deep supervised anomaly detection techniques depends on the input data dimension and the number of hidden layers trained using the back-propagation algorithm. High dimensional data tend to require more hidden layers to ensure meaningful hierarchical learning of input features. The computational complexity also increases linearly with the number of hidden layers and requires greater model training and update time.
计算复杂度:基于深度监督异常检测方法的技术的计算复杂度取决于输入数据维数和使用反向传播算法训练的隐藏层数量。高维数据倾向于具有更多的隐藏层,以确保对输入特征进行完全意义的分层学习。计算复杂度也随着隐藏层的数量线性增加,并且需要更多的模型训练和更新时间。
Advantages and Disadvantages:
The advantages of supervised DAD techniques are as follows:
Supervised DAD methods are more accurate than semi-supervised and unsupervised models.
The testing phase of classification based techniques is fast since each test instance needs to be compared against the precomputed model.
有监督的DAD方法比半监督和无监督模型更精确。
基于分类的技术的测试阶段很快,因为每个测试实例都需要与预先计算的模型进行比较。
The disadvantages of supervised DAD techniques are as follows:
Multi-class supervised techniques require accurate labels for various normal classes and anomalous instances, which are often not available.
Deep supervised techniques fail to separate normal from anomalous data if the feature space is highly complex and non-linear.
多类监督技术需要各种正常类和异常类的准确标签,这通常是不可用的。
如果特征空间高度复杂,并且是非线性的,那么深度监督技术无法将正常数据与异常数据分开。
Semi-supervised or (one-class classification) DAD techniques assume that all training instances have only one class label. A review of deep learning based semi-supervised techniques for anomaly detection is presented by Kiran et al. [2018] and Min et al. [2018]. DAD techniques learn a discriminative boundary around the normal instances. The test instance that does not belong to the majority class is flagged as being anomalous (Perera and Patel [2018], Blanchard et al. [2010]). Various semi-supervised DAD model architectures are illustrated in Table 20.
半监督或一类分类(one-class classification)DAD技术假设所有训练实例只有一个类标签。Kiran等人[2018]和Min等人[2018]对基于深度学习的半监督异常检测技术进行了综述。DAD技术围绕正常实例学习一个判别边界,不属于多数类的测试实例被标记为异常(Perera和Patel [2018],Blanchard等人[2010])。各种半监督的DAD模型架构如表20所示。
Assumptions: Semi-supervised DAD methods rely on one of the following assumptions to score a data instance as an anomaly.
假设:半监督DAD方法依赖于以下假设之一,将数据实例评分为异常。
Proximity and Continuity: Points which are close to each other both in the input space and the learned feature space are more likely to share the same label. Robust features are learned within the hidden layers of the deep neural network and retain the discriminative attributes for separating normal from outlier data points.
邻近性和连续性:在输入空间和学习到的特征空间中彼此接近的点更有可能共享相同的标签。鲁棒特征是在深度神经网络的隐藏层中学习的,并保留用于将正常数据点与异常数据点分开的判别属性。
Computational Complexity: The computational complexity of semi-supervised DAD techniques is similar to that of supervised DAD techniques, which primarily depends on the dimensionality of the input data and the number of hidden layers used for representative feature learning.
计算复杂性:基于半监督的DAD方法的计算复杂性类似于监督DAD技术,它主要取决于输入数据的维数和用于特征学习的隐藏层的数量。
Deep learning models are widely used as feature extractors to learn robust features (Andrews et al. [2016a]). In deep hybrid models, the representative features learned within deep models are input to traditional algorithms such as one-class Radial Basis Function (RBF) and Support Vector Machine (SVM) classifiers. The hybrid models employ two-step learning and are shown to produce state-of-the-art results (Erfani et al. [2016a,b], Wu et al. [2015b]). The deep hybrid architectures used in anomaly detection are presented in Table 21.
深度学习模型被广泛用作特征提取器来学习鲁棒特征(Andrews等人[2016a])。在深度混合模型中,在深度模型中学习的代表性特征被输入到传统算法中,如单类径向基函数、支持向量机分类器。混合模型采用两步学习,并显示出产生最先进的结果(Erfani等人[2016a,b],Wu等人[2015b])。表21显示了用于异常检测的深度混合架构。
Assumptions: The deep hybrid models proposed for anomaly detection rely on one of the following assumptions to detect outliers:
假设:提出用于异常检测的深度混合模型依赖于以下假设之一来检测异常值:
Robust features are extracted within the hidden layers of the deep neural network, which aid in separating the irrelevant features that can conceal the presence of anomalies. Building a robust anomaly detection model on complex, high-dimensional spaces requires a feature extractor and an anomaly detector. The various anomaly detectors used along with the feature extractor are illustrated in Table 21.
在深层神经网络的隐藏层中提取鲁棒特征,有助于分离可能隐藏异常的不相关特征。在复杂的高维空间上建立一个健壮的异常检测模型需要特征提取器和异常检测器。表21中说明了与之配合使用的各种异常检测器。
Computational Complexity: The computational complexity of a hybrid model includes the complexity of both the deep architecture as well as the traditional algorithms used within it. Additionally, an inherent issue of the non-trivial choice of deep network architecture and parameters, which involves searching for optimized parameters in a considerably larger space, adds to the computational complexity of using deep layers within hybrid models. Furthermore, consider classical algorithms such as the linear SVM, which has a prediction complexity of O(d), with d the number of input dimensions. For most kernels, including polynomial and RBF, the complexity is O(nd), where n is the number of support vectors, although an approximation O(d²) is considered for SVMs with an RBF kernel.
计算复杂性:混合模型的计算复杂性包括深度架构以及其中使用的传统算法的复杂性。此外,深度网络架构和参数的非平凡选择的固有问题涉及在相当大的空间中搜索优化参数,这引入了在混合模型中使用深层的计算复杂性。进一步考虑到经典算法如线性SVM,其预测复杂度为O(d),其中d为输入维数。对于大多数核,包括多项式核和径向基函数核,复杂度是O(nd),其中n是支持向量的数量,尽管对于具有径向基函数核的支持向量机,考虑了近似O(d²)。
Advantages and Disadvantages
The advantages of hybrid DAD techniques are as follows:
The feature extractor significantly reduces the 'curse of dimensionality', especially in the high dimensional domain. Hybrid models are more scalable and computationally efficient since the linear or nonlinear kernel models operate on a reduced input dimension.
特征提取器显著减少了"维数灾难",尤其是在高维空间中。混合模型的可扩展性和计算效率更高,因为线性或非线性核模型在降维后的输入上运行。
The significant disadvantages of hybrid DAD techniques are:
The hybrid approach is suboptimal because it is unable to influence representational learning within the hidden layers of the feature extractor, since generic loss functions are employed instead of an objective customized for anomaly detection. The deeper hybrid models tend to perform better if the individual layers are pre-trained (Saxe et al. [2011]), which introduces computational expenditure.
混合方法是次优的,因为它采用的是通用损失函数而不是为异常检测定制的目标函数,因而不能影响特征提取器隐藏层内的表示学习。如果对各个层进行预训练(Saxe等人[2011]),更深的混合模型往往表现更好,但这会引入额外的计算开销。
One-class neural networks (OC-NN) combine the ability of deep networks to extract a progressively rich representation of data with a one-class objective, such as a hyperplane (Chalapathy et al. [2018a]) or hypersphere (Ruff et al. [2018a]), to separate all the normal data points from the outliers. The OC-NN approach is novel for the following crucial reason: the data representation in the hidden layer is learned by optimizing an objective function customized for anomaly detection. The experimental results in Chalapathy et al. [2018a] and Ruff et al. [2018a] demonstrate that OC-NN can achieve comparable or better performance than existing state-of-the-art methods on complex datasets, while having reasonable training and testing time compared to the existing methods.
一类神经网络(OC-NN)结合了深度网络提取逐渐丰富的数据表示的能力与一类目标,例如超平面(Chalapathy等人[2018a])或超球面(Ruff等人[2018a]),以将所有正常数据点与异常值分开。OC-NN方法是新颖的,原因是:隐藏层中的数据表示是通过优化为异常检测定制的目标函数来学习的。(Chalapathy等人[2018a],Ruff等人[2018a])中的实验结果表明,对于复杂数据集,OC-NN可以实现与现有最先进方法相当或更好的性能,同时与现有方法相比,具有合理的训练和测试时间。
Assumptions: The OC-NN models proposed for anomaly detection rely on the following assumptions to detect outliers:
OC-NN models extract the common factors of variation of the data distribution within the hidden layers of the deep neural network. They perform combined representation learning and produce an outlier score for a test data instance. Anomalous samples do not contain common factors of variation and hence the hidden layers fail to capture the representations of outliers.
假设:用于异常检测的OC-NN模型依赖于以下假设来检测异常值:OC-NN模型提取深层神经网络的隐藏层内数据分布中的共同变化因素;执行组合表示学习,并为测试数据实例生成异常值分数;异常样本不包含共同的变化因素,因此隐藏层无法捕捉异常值的表示。
Computational Complexity: The computational complexity of an OC-NN model, as against the hybrid model, includes only the complexity of the deep network of choice (Saxe et al. [2011]). OC-NN models do not require data to be stored for prediction, thus have very low memory complexity. However, it is evident that the OC-NN training time is proportional to the input dimension.
计算复杂性:与混合模型相比,OC-NN模型的计算复杂性仅包括所选深度网络的复杂性(Saxe等人[2011])。OC-NN模型不需要为预测存储数据,因此具有非常低的内存复杂性。然而,显而易见的是,OC-NN的训练时间与输入维数成正比。
Advantages and Disadvantages: The advantages of OC-NN are as follows:
OC-NN models jointly train a deep neural network while optimizing a data-enclosing hypersphere or hyperplane in output space. OC-NN proposes an alternating minimization algorithm for learning the parameters of the OC-NN model; we observe that the subproblem of the OC-NN objective is equivalent to solving a well-defined quantile selection problem.
OC-NN的优点如下:
The significant disadvantages of OC-NN for anomaly detection are:
- Training times and model update times may be longer for high dimensional input data.
- Model updates would also take a longer time, given changes in the input space.
Unsupervised DAD is an essential area of research in both fundamental machine learning research and industrial applications. Several deep learning frameworks that address challenges in unsupervised anomaly detection have been proposed and shown to produce state-of-the-art performance, as illustrated in Table 22. Autoencoders are the fundamental unsupervised deep architectures used in anomaly detection (Baldi [2012]).
Assumptions: The deep unsupervised models proposed for anomaly detection rely on one of the following assumptions to detect outliers:
- The "normal" regions in the original or latent feature space can be distinguished from the "anomalous" regions in the original or latent feature space.
- The majority of the data instances are normal compared to the remainder of the data set.
- The unsupervised anomaly detection algorithm produces an outlier score for each data instance based on intrinsic properties of the data set, such as distances or densities. The hidden layers of the deep neural network aim to capture these intrinsic properties within the dataset (Goldstein and Uchida [2016]).
Computational Complexity: Autoencoders are the most common architecture employed in outlier detection, with a quadratic cost; the optimization problem is non-convex, as with any other neural network architecture. The computational complexity of the model depends on the number of operations, network parameters, and hidden layers. However, the computational complexity of training an autoencoder is much higher than that of traditional methods such as Principal Component Analysis (PCA), since PCA is based on matrix decomposition (Meng et al. [2018], Parchami et al. [2017]).
Advantages and Disadvantages:
The advantages of unsupervised deep anomaly detection techniques are as follows:
- Learns the inherent data characteristics to separate normal from anomalous data points. This technique identifies commonalities within the data and facilitates outlier detection.
- A cost-effective technique for finding anomalies, since it does not require annotated data for training the algorithms.

The significant disadvantages of unsupervised deep anomaly detection techniques are:
- It is often challenging to learn commonalities within data in a complex and high dimensional space.
- When using autoencoders, the choice of the right degree of compression, i.e., dimensionality reduction, is often a hyper-parameter that requires tuning for optimal results.
- Unsupervised techniques are very sensitive to noise and data corruptions, and are often less accurate than supervised or semi-supervised techniques.
This section explores various DAD techniques which have been shown to be effective and promising; we discuss the key idea behind these techniques and their areas of applicability.
Deep learning has long been criticized for needing sufficient data to produce good results. Both Litjens et al. [2017] and Pan et al. [2010] review deep transfer learning approaches and illustrate their significance in learning good feature representations. Transfer learning is an essential tool in machine learning to solve the fundamental problem of insufficient training data. It aims to transfer knowledge from a source domain to a target domain by relaxing the assumption that training and future data must be in the same feature space and have the same distribution. Deep transfer representation learning has been explored by (Andrews et al. [2016b], Vercruyssen et al. [2017], Li et al. [2012], Almajai et al. [2012], Kumar and Vaidehi [2017], Liang et al. [2018]) and is shown to produce very promising results. An open research question in using transfer learning for anomaly detection is the degree of transferability, that is, defining how well features transfer knowledge and improve classification performance from one task to another.
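A minimal sketch of this idea, assuming a pretrained torchvision ResNet as the frozen source-domain feature extractor and a simple k-nearest-neighbor distance in feature space as the anomaly score (the scoring rule and helper names are illustrative choices, not ones prescribed by the cited works):

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.neighbors import NearestNeighbors

# Pretrained CNN reused as a frozen feature extractor (source-domain knowledge).
backbone = models.resnet18(pretrained=True)  # older torchvision API; newer versions use weights=...
backbone.fc = nn.Identity()                  # drop the ImageNet classification head
backbone.eval()

@torch.no_grad()
def extract_features(images):                # images: (batch, 3, 224, 224), already normalized
    return backbone(images).cpu().numpy()

def fit_scorer(normal_images, k=5):
    # Index features of normal target-domain data only.
    return NearestNeighbors(n_neighbors=k).fit(extract_features(normal_images))

def anomaly_scores(scorer, test_images):
    dists, _ = scorer.kneighbors(extract_features(test_images))
    return dists.mean(axis=1)                # mean distance to k nearest normal features
```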
Zero shot learning (ZSL) aims to recognize objects never seen before within the training set (Romera-Paredes and Torr [2015]). ZSL achieves this in two phases: first, knowledge about the objects, in the form of natural language descriptions or attributes (commonly known as meta-data), is captured; second, this knowledge is used to classify instances among a new set of classes. This setting is important in the real world since one may not be able to obtain images of all possible classes at training time. The primary challenge associated with this approach is obtaining the meta-data about the data instances. However, several approaches using ZSL in anomaly and novelty detection are shown to produce state-of-the-art results (Mishra et al. [2017], Socher et al. [2013], Xian et al. [2017], Liu et al. [2017], Rivero et al. [2017]).
A notable issue with deep neural networks is that they are sensitive to noise within the input data and often require extensive training data to perform robustly (Kim et al. [2016]). In order to achieve robustness even on noisy data, the idea of randomly varying the connectivity architecture of the autoencoder is shown to obtain significantly better performance. Autoencoder ensembles consisting of various randomly connected autoencoders were experimented with by Chen et al. [2017] to achieve promising results on several benchmark datasets. Ensemble approaches are still an active area of research and have been shown to produce improved diversity, thus avoiding the overfitting problem while reducing training time.
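A simplified sketch of the ensemble idea: several small autoencoders with randomly chosen bottleneck widths are trained independently on the (mostly normal) data, and the median of their reconstruction errors is used as the anomaly score. This only mimics the spirit of randomly varied connectivity in Chen et al. [2017]; the architectures and the median aggregation are assumptions made for illustration.

```python
import random
import torch
import torch.nn as nn

def random_autoencoder(in_dim):
    """Autoencoder with a randomly chosen bottleneck width
    (a stand-in for randomly varied connectivity)."""
    h = random.choice([4, 8, 16, 32])
    return nn.Sequential(nn.Linear(in_dim, h), nn.ReLU(), nn.Linear(h, in_dim))

def train_ensemble(data, n_members=5, epochs=30, lr=1e-3):
    # `data` is a (n_samples, n_features) tensor of mostly normal instances.
    members = [random_autoencoder(data.shape[1]) for _ in range(n_members)]
    for ae in members:
        opt = torch.optim.Adam(ae.parameters(), lr=lr)
        for _ in range(epochs):
            loss = ((ae(data) - data) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
    return members

def ensemble_scores(members, x):
    with torch.no_grad():
        errs = torch.stack([((ae(x) - x) ** 2).mean(dim=1) for ae in members])
    return errs.median(dim=0).values  # median reconstruction error per sample
```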
Several anomaly detection algorithms based on clustering have been proposed in the literature (Ester et al. [1996]). Clustering involves grouping together similar patterns based on extracted features in order to detect new anomalies. The time and space complexity grows linearly with the number of classes to be clustered (Sreekanth et al. [2010]), which renders clustering-based anomaly detection prohibitive for real-time practical applications. The dimensionality of the input data is reduced by extracting features within the hidden layers of a deep neural network, which ensures scalability for complex and high dimensional datasets. Deep-learning-enabled clustering approaches to anomaly detection utilize, e.g., word2vec (Mikolov et al. [2013]) models to obtain semantic representations of normal data and anomalies in order to form clusters and detect outliers (Yuan et al. [2017]). Several works rely on variants of hybrid models along with autoencoders for obtaining representative features for clustering to find anomalies.
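A minimal sketch of clustering-based scoring on learned representations: embeddings produced by a deep model (e.g., an autoencoder bottleneck or word2vec vectors) are clustered with k-means, and the distance of each instance to its nearest centroid is taken as the anomaly score. The function name and the percentile threshold in the usage line are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anomaly_scores(embeddings, n_clusters=10, random_state=0):
    """Distance to the nearest cluster centroid as an anomaly score.
    `embeddings` is an (n_samples, n_features) array of deep representations."""
    km = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10).fit(embeddings)
    # Distance of each point to the centroid of its assigned cluster.
    dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)
    return dists  # larger distance => more anomalous

# Usage: scores = cluster_anomaly_scores(X_embedded)
#        flagged = scores > np.percentile(scores, 99)
```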
Deep reinforcement learning (DRL) methods have attracted significant interest due to their ability to learn complex behaviors in high-dimensional data spaces. Efforts to detect anomalies using deep reinforcement learning have been proposed by de La Bourdonnaye et al. [2017] and Chengqiang Huang [2016]. The DRL-based anomaly detector does not make any assumption about the concept of the anomaly; the detector identifies new anomalies by consistently enhancing its knowledge through accumulated reward signals. DRL-based anomaly detection is a very novel concept which requires further investigation and identification of the research gap and its applications.
The Hilbert transform is a statistical signal processing technique which derives the analytic representation of a real-valued signal. This property is leveraged by Kanarachos et al. [2015] for real-time detection of anomalies in health-related time series data and is shown to be a very promising technique. The algorithm combines the abilities of wavelet analysis, neural networks, and the Hilbert transform in a sequential manner to detect real-time anomalies. The topic of statistical DAD techniques requires further investigation to fully understand their potential and applicability for anomaly detection.
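A simplified sketch of the signal-processing ingredient, assuming SciPy's `hilbert` for the analytic representation: the instantaneous amplitude (envelope) is compared against a rolling median baseline, and large deviations are flagged. Kanarachos et al. [2015] additionally combine this with wavelet analysis and a neural network, which the sketch omits.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_anomaly_scores(signal, window=50):
    """Score each point of a 1-D time series by how far the Hilbert envelope
    deviates from its local (rolling) median."""
    analytic = hilbert(signal)      # analytic representation of the real signal
    envelope = np.abs(analytic)     # instantaneous amplitude
    pad = np.pad(envelope, (window // 2, window - window // 2 - 1), mode="edge")
    baseline = np.array([np.median(pad[i:i + window]) for i in range(len(envelope))])
    return np.abs(envelope - baseline)

# Usage (synthetic example): a spike injected into a sine wave scores highly.
# t = np.linspace(0, 10, 2000); x = np.sin(2 * np.pi * 5 * t); x[1200] += 3.0
# scores = envelope_anomaly_scores(x)
```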
The "deep" in "deep neural networks" refers to the number of layers through which the features of the data are extracted (Schmidhuber [2015], Bengio et al. [2009]). Deep architectures overcome the limitations of traditional machine learning approaches in scalability and generalization to new variations within data (LeCun et al. [2015]), and remove the need for manual feature engineering. Deep Belief Networks (DBNs) are a class of deep neural network comprising multiple layers of graphical models known as Restricted Boltzmann Machines (RBMs). The hypothesis in using DBNs for anomaly detection is that RBMs are used as a directed encoder-decoder network trained with the backpropagation algorithm (Werbos [1990]). DBNs fail to capture the characteristic variations of anomalous samples, resulting in high reconstruction error. DBNs are shown to scale efficiently to big data and improve interpretability (Wulsin et al. [2010]).
Researchers have long explored techniques to learn both spatial and temporal relation features (Zhang et al. [2018f]). Deep learning architectures perform well at learning spatial aspects (using CNNs) and temporal features (using LSTMs) individually. Spatio-Temporal Networks (STNs) comprise deep neural architectures combining both CNNs and LSTMs to extract spatiotemporal features. The temporal features (modeling correlations between nearby time points via LSTMs) and spatial features (modeling local spatial correlation via local CNNs) are shown to be effective in detecting outliers (Lee et al. [2018], SZEKÉR [2014], Nie et al. [2018], Dereszynski and Dietterich [2011]).
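A minimal sketch of such a combined architecture in PyTorch: a small CNN encodes each frame (spatial correlations) and an LSTM models the sequence of frame embeddings (temporal correlations), with a final head producing a per-sequence score. The class name and all layer sizes are illustrative assumptions rather than the exact networks of the cited works.

```python
import torch
import torch.nn as nn

class SpatioTemporalNet(nn.Module):
    """Illustrative CNN+LSTM: the CNN encodes each frame, the LSTM models the
    sequence of frame embeddings, and the head emits one score per sequence."""
    def __init__(self, channels=1, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),   # -> 8 * 4 * 4 = 128 features
        )
        self.lstm = nn.LSTM(input_size=128, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                  # e.g., anomaly score per sequence

    def forward(self, frames):                            # frames: (batch, time, C, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                      # score from the last time step
```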
Sum-Product Networks (SPNs) are directed acyclic graphs with variables as leaves, in which the internal nodes and weighted edges constitute the sums and products. SPNs can be considered a combination of mixture models that allows fast, exact probabilistic inference over many layers (Poon and Domingos [2011], Peharz et al. [2018]). The main advantage of SPNs is that, unlike graphical models, SPNs remain tractable over high treewidth models without requiring approximate inference. Furthermore, SPNs are shown to capture uncertainty over their inputs in a convincing manner, yielding robust anomaly detection (Peharz et al. [2018]). SPNs have shown impressive results on numerous datasets, while much remains to be further explored in relation to outlier detection.
Word2vec is a group of deep neural network models used to produce word embeddings (Mikolov et al. [2013]). These models are capable of capturing sequential relationships within data instances such as sentences and time sequence data. Using word embedding features as inputs is shown to improve performance in several deep learning architectures (Rezaeinia et al. [2017], Naili et al. [2017], Altszyler et al. [2016]). Anomaly detection models leveraging word2vec embeddings are shown to significantly improve performance (Schnabel et al. [2015], Bertero et al. [2017], Bakarov et al. [2018], Bamler and Mandt [2017]).
Generative models aim to learn the exact data distribution in order to generate new data points with some variations. The two most common and efficient generative approaches are Variational Autoencoders (VAE) (Kingma and Welling [2013]) and Generative Adversarial Networks (GAN) (Goodfellow et al. [2014a,b]). A variant of the GAN architecture known as Adversarial Autoencoders (AAE) (Makhzani et al. [2015]), which uses adversarial training to impose an arbitrary prior on the latent code learned within the hidden layers of an autoencoder, is also shown to learn the input distribution effectively. Leveraging this ability to learn input distributions, several proposed Generative Adversarial Network-based Anomaly Detection (GAN-AD) frameworks (Li et al. [2018], Deecke et al. [2018], Schlegl et al. [2017], Ravanbakhsh et al. [2017b], Eide [2018]) are shown to be effective in identifying anomalies on high dimensional and complex datasets. However, traditional methods such as K-nearest neighbors (KNN) are shown to perform better than deep generative models in scenarios with a smaller number of anomalies (Škvára et al. [2018]).
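A rough sketch of GAN-based scoring in the spirit of AnoGAN (Schlegl et al. [2017]): given a generator trained on normal data only, search the latent space for the code that best reconstructs a test sample; the remaining reconstruction residual is the anomaly score. The full method also uses a discriminator feature-matching term, which this simplified version omits; the function name and hyperparameters are illustrative.

```python
import torch

def gan_anomaly_score(generator, x, latent_dim=64, steps=200, lr=1e-2):
    """Search for the latent code z whose generated sample best reconstructs x;
    `generator` is assumed to be a trained mapping z -> x_hat, fit on normal data."""
    for p in generator.parameters():
        p.requires_grad_(False)            # only the latent code is optimized
    z = torch.randn(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        residual = ((generator(z) - x) ** 2).flatten(1).sum(dim=1).mean()
        opt.zero_grad(); residual.backward(); opt.step()
    with torch.no_grad():
        return ((generator(z) - x) ** 2).flatten(1).sum(dim=1)   # per-sample score
```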
Convolutional Neural Networks (CNNs) are the popular choice of neural networks for analyzing visual imagery (Krizhevsky et al. [2012]). The ability of CNNs to extract complex hidden features from high dimensional data with complex structure has enabled their use as feature extractors in outlier detection for both sequential and image datasets (Gorokhov et al. [2017], Kim [2014]). Evaluation of CNN-based frameworks for anomaly detection is currently still an active area of research (Kwon et al. [2018]).
Recurrent Neural Networks (RNNs) (Williams [1989]) are shown to capture features of time sequence data. A limitation of RNNs is that they fail to capture context as the number of time steps increases; to resolve this problem, Long Short-Term Memory (Hochreiter and Schmidhuber [1997]) networks were introduced. They are a particular type of RNN comprising a memory cell that can store information about previous time steps. Gated Recurrent Units (GRU) (Cho et al. [2014]) are similar to LSTMs but use a set of gates to control the flow of information, instead of separate memory cells. Anomaly detection in sequential data has attracted significant interest in the literature due to its applications in the wide range of engineering problems illustrated in Section 9.9. Long Short Term Memory (LSTM) neural network based algorithms for anomaly detection have been investigated and reported to produce significant performance gains over conventional methods (Ergen et al. [2017]).
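As a generic sketch of sequence-based detection (not a specific method from the cited works): an LSTM is trained on normal sequences to predict the next value with an MSE loss between `model(x)[:, :-1]` and `x[:, 1:]`, and at test time the one-step-ahead prediction error serves as the anomaly score.

```python
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    """Predict the next value of a sequence; large prediction errors on test
    data are treated as anomalies. Layer sizes are illustrative."""
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                    # x: (batch, time, n_features)
        h, _ = self.lstm(x)
        return self.out(h)                   # one-step-ahead prediction per position

def prediction_error_scores(model, x):
    """Anomaly score per time step = squared error between the prediction made
    from the previous steps and the observed value."""
    with torch.no_grad():
        pred = model(x)[:, :-1]              # predictions for steps 1..T-1
        target = x[:, 1:]
        return ((pred - target) ** 2).mean(dim=-1)   # shape: (batch, T-1)
```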
Autoencoders with a single layer and a linear activation function are nearly equivalent to Principal Component Analysis (PCA) (Pearson [1901]). While PCA is restricted to a linear dimensionality reduction, autoencoders enable both linear and nonlinear transformations (Liou et al. [2008, 2014]). One of the popular applications of autoencoders is anomaly detection. Autoencoders are also referred to as Replicator Neural Networks (RNN) (Hawkins et al. [2002], Williams et al. [2002]). Autoencoders represent data within multiple hidden layers by reconstructing the input data, effectively learning an identity function. When trained solely on normal data instances (which are the majority in anomaly detection tasks), autoencoders fail to reconstruct the anomalous data samples and therefore produce a large reconstruction error. The data samples which produce high residual errors are considered outliers. Several variants of autoencoder architectures, as illustrated in Figure 13, produce promising results in anomaly detection. The choice of autoencoder architecture depends on the nature of the data: convolutional networks are preferred for image datasets, while Long Short-Term Memory (LSTM) based models tend to produce good results for sequential data. Efforts to combine convolutional and LSTM layers, where the encoder is a convolutional neural network (CNN) and the decoder is a multilayer LSTM network that reconstructs input images, are shown to be effective in detecting anomalies within data. The use of combined models such as Gated Recurrent Unit autoencoders (GRU-AE), Convolutional Neural Network autoencoders (CNN-AE), and Long Short-Term Memory autoencoders (LSTM-AE) eliminates the need for preparing hand-crafted features and facilitates the use of raw data with minimal preprocessing in anomaly detection tasks. Although autoencoders are simple and effective architectures for outlier detection, their performance degrades with noisy training data (Zhou and Paffenroth [2017]).
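The reconstruction-error scheme described above can be summarized with a short sketch; the fully connected architecture and layer sizes are illustrative, and convolutional or LSTM encoders/decoders would be substituted for image or sequence data as discussed.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Simple fully connected autoencoder; reconstruction error is the anomaly score."""
    def __init__(self, in_dim, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def fit(model, normal_loader, epochs=20, lr=1e-3):
    # Train on (mostly) normal data only, minimizing reconstruction error.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for (x,) in normal_loader:
            loss = loss_fn(model(x), x)
            opt.zero_grad(); loss.backward(); opt.step()

def reconstruction_scores(model, x):
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)   # higher => more anomalous
```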
Each of the deep anomaly detection (DAD) techniques discussed in the previous sections has its own strengths and weaknesses. It is critical to understand which anomaly detection technique is best suited for a given anomaly detection problem context. Given that DAD is an active research area, it is not feasible to provide such an understanding for every anomaly detection problem. Hence, in this section, we analyze the relative strengths and weaknesses of the different categories of techniques for a few simple problem settings. Classification-based supervised DAD techniques, illustrated in Section 10.1, are better choices in scenarios with equal amounts of labels for both normal and anomalous instances. The computational complexity of a supervised DAD technique is a key aspect, especially when the technique is applied to a real domain. While classification-based supervised or semi-supervised techniques have expensive training times, testing is usually fast since it uses a pre-trained model. Unsupervised DAD techniques, presented in Section 10.5, are being widely used since label acquisition is a costly and time-consuming process. Most unsupervised deep anomaly detection methods require priors to be assumed on the anomaly distribution; hence the models are less robust in handling noisy data. Hybrid models, illustrated in Section 10.3, extract robust features within the hidden layers of the deep neural network and feed them to the best-performing classical anomaly detection algorithms. The hybrid model approach is suboptimal because it is unable to influence representational learning in the hidden layers. The One-class Neural Networks (OC-NN) described in Section 10.4 combine the ability of deep networks to extract a progressively rich representation of data with a one-class objective, such as a hyperplane (Chalapathy et al. [2018a]) or hypersphere (Ruff et al. [2018a]), to separate all normal data points from anomalous data points. Further research and exploration are necessary to better comprehend the benefits of this newly proposed architecture.
In this survey paper, we have discussed various research methods in deep learning-based anomaly detection along with their applications across various domains. This article discusses the challenges in deep anomaly detection and presents several existing solutions to these challenges. For each category of deep anomaly detection techniques, we present the assumption regarding the notion of normal and anomalous data, along with its strengths and weaknesses. The goal of this survey was to investigate and identify the various deep learning models for anomaly detection and to evaluate their suitability for a given dataset. When choosing a deep learning model for a particular domain or dataset, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. Deep learning based anomaly detection is still an active area of research, and possible future work would be to extend and update this survey as more sophisticated techniques are proposed.