Should You Use the ML Monitoring Solution Offered by Your Cloud Provider?
As AI systems become increasingly ubiquitous across industries, the need to monitor these systems grows. AI systems, much more than traditional software, are highly sensitive to changes in their data inputs. Consequently, a new class of monitoring solutions has emerged at the data and functional level (rather than the infrastructure or application level). These solutions aim to detect the issues unique to AI systems, namely concept drift, bias, and more.
The AI vendor landscape is now crowded with companies touting monitoring capabilities. These range from best-of-breed/standalone solutions to integrated AI lifecycle management suites; the latter offer more basic monitoring capabilities as a secondary focus.
Adding to the hype, some of the major cloud providers began communicating that they, too, offer monitoring features for machine learning models deployed on their cloud platforms. AWS and Azure, the largest and second-largest providers by market share, each announced dedicated features under the umbrellas of their respective ML platforms: SageMaker Model Monitor (AWS) and Dataset Monitors (Azure). Google (GCP), so far, seems to offer only application-level monitoring for serving models and for training jobs.
In this post we provide a general overview of the current offerings from the cloud providers (focusing on AWS and Azure) and discuss the gaps in these solutions (gaps that best-of-breed solutions generally cover well).
So what do Azure and AWS have to offer regarding monitoring models in production?
The first part of monitoring any kind of system is almost always logging data from the system's operation. In the case of ML models, this starts with the model's inputs and outputs.
Unsurprisingly, both Azure and AWS allow you to easily log this data to their respective data stores (S3 buckets for AWS, Blob Storage for Azure). All you have to do is add a data capture configuration to the model run call in your Python code. Note that not all input types can be automatically saved (e.g., on Azure, audio, images, and video are not collected). AWS also allows configuring a sampling rate for the data capture.
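As a rough illustration, on AWS this looks something like the following sketch with the SageMaker Python SDK; the bucket path, sampling percentage, and instance type are placeholders, and the variable model is assumed to be an already-built SageMaker model object:

```python
# A minimal sketch, assuming "model" is an already-built SageMaker model
# object. The bucket path, sampling percentage, and instance type are
# placeholders, not recommendations.
from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=20,  # capture 20% of requests
    destination_s3_uri="s3://my-bucket/model-monitor/data-capture",
)

# Deploying with the capture config makes the endpoint log request and
# response payloads to the S3 destination above.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    data_capture_config=data_capture_config,
)
```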
Once the input and output data is collected, the next step suggested by the platforms is to use their existing analytics/APM offerings to track the data.
Azure recommends using Power BI or Azure Databricks to get an initial analysis of the data, while on AWS you can use Amazon CloudWatch.
Since the data is saved in the platform’s own storage system, it is usually pretty straightforward to get initial charts and graphs tracking your model’s inputs and outputs.
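As an example, here is a hedged sketch of pulling SageMaker's captured records, which are written as JSON Lines files under the configured S3 prefix, into pandas for a first look (the bucket and prefix are the same placeholders as above):

```python
# A hedged sketch: read the captured JSON Lines files into pandas for a
# quick first look. Bucket and prefix are placeholders.
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")
bucket, prefix = "my-bucket", "model-monitor/data-capture/"

frames = []
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
    frames.append(pd.read_json(io.StringIO(body), lines=True))

# Each row holds one captured request/response record.
captured = pd.concat(frames, ignore_index=True)
print(captured.head())
```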
Both AWS and Azure offer fairly new tools for alerting on changes in the distribution and behavior of your model's input data. For AWS this is the main part of Amazon SageMaker Model Monitor, whereas for Azure it is handled by a very new feature called Dataset Monitors.
On both platforms, the workflow starts by creating a baseline dataset, which is usually based directly on the training dataset.
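On AWS, the baselining step might look like the following sketch (the role ARN, instance type, and S3 paths are placeholders); suggest_baseline() is what produces the statistics and constraints files discussed below:

```python
# A sketch of baselining with SageMaker Model Monitor. The role ARN,
# instance type, and S3 paths are placeholders.
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Profiles the training data and writes statistics.json and
# constraints.json to the output location.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/training-data/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/model-monitor/baseline",
)
```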
Once you have that baseline ready, the platforms allow you to create datasets from inference input data captured as described above, compare them to the baseline dataset, and get reports on changes in the feature distributions.
There are some differences between the two solutions. AWS's solution creates "constraints" and "statistics" files from the baseline dataset, which contain statistical information on the input data; these are later compared against inference data to produce reports on differences. Azure's Dataset Monitors, on the other hand, provides a dashboard comparing the distribution of each feature between the baseline dataset and the inference-time one, and then lets you set up alerts for when the change in distribution is large enough.
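On Azure, a corresponding sketch using the azureml-datadrift package might look like this; the workspace, dataset, and compute names are placeholders, and since the feature was very new at the time of writing, the exact API may well have changed:

```python
# A sketch of a dataset monitor on Azure (azureml-datadrift). Workspace,
# dataset, and compute names are placeholders; the target dataset is
# assumed to be a time-series-enabled tabular dataset.
from azureml.core import Dataset, Workspace
from azureml.datadrift import AlertConfiguration, DataDriftDetector

ws = Workspace.from_config()
baseline = Dataset.get_by_name(ws, "training-baseline")
target = Dataset.get_by_name(ws, "inference-inputs")

monitor = DataDriftDetector.create_from_datasets(
    ws,
    "my-drift-monitor",
    baseline,
    target,
    compute_target="cpu-cluster",
    frequency="Day",      # how often to compare target data to the baseline
    drift_threshold=0.3,  # alert when the drift magnitude exceeds this value
    alert_config=AlertConfiguration(email_addresses=["alerts@example.com"]),
)
monitor.enable_schedule()  # start the recurring drift runs
```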
However, the above differences are really implementation details of the same basic functionality: take your feature set, look at its baseline distribution in your training set, and compare it to its distribution at inference time.
So, the cloud providers do offer monitoring capabilities at the data and model layer, but can you rely on these capabilities to sustain and even improve a production-grade AI system? We believe that you cannot. Here are a few reasons why:
Tracking your model inputs and outputs is good, but it's not enough to really understand your data and models' behaviors. What you really need to monitor isn't a single model, but an entire AI system. Many times, this will include data your cloud provider cannot easily access.
A few examples:
- You have a human labeling system, and you'd like to monitor how your model's output compares to the human labels, to get real performance metrics for your model (see the sketch after this list).
- Your system contains several models and pipelines, and one model's output is used as an input feature for a subsequent model. An underperformance in the first model may be the root cause of an underperformance in the second model, and your monitoring system should understand this dependency and alert you accordingly.
- You have actual business results (e.g., whether the ad your suggestions model chose was actually clicked). This is a very important metric for measuring your model's performance, and it is relevant even if the input features never really changed.
- You have metadata which you don't want (or are even not allowed, e.g., race/gender) to use as an input feature, but you do want to track it for monitoring, to make sure you are not biased on that data field.
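To make the first example concrete, here is a hypothetical sketch that joins model predictions with human labels stored outside the cloud provider, to compute real performance metrics (all file and column names are invented):

```python
# A hypothetical sketch: join predictions with human labels to compute
# real performance metrics. All file and column names are invented.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

predictions = pd.read_csv("model_predictions.csv")  # columns: id, predicted_label
labels = pd.read_csv("human_labels.csv")            # columns: id, true_label

# Only rows that were actually labeled by a human are evaluated.
joined = predictions.merge(labels, on="id", how="inner")
print("accuracy:", accuracy_score(joined["true_label"], joined["predicted_label"]))
print("macro F1:", f1_score(joined["true_label"], joined["predicted_label"], average="macro"))
```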
For more on context-based monitoring, check out this post about the platform approach to monitoring.
It is not uncommon for an AI system to work just fine on average, but grossly underperform on sub-segments of the data. So, a granular examination of performance is crucial.
Consider a case where your model behaves very differently on data coming from one of your customers. If this customer accounts for 5% of the data your model ingests, then the overall average performance of the model might seem fine. This customer, however, will not be pleased. The same could be true for different geolocations, devices, browsers or any other dimension along which your data could be sliced.
A good monitoring solution will alert you when anomalous behavior happens in sub-segments, including in more granular sub-segments, e.g., for users coming from a specific geo using a specific device.
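A hedged sketch of what such a granular check could look like, slicing by two hypothetical dimensions and flagging segments that lag the overall metric (column names and the threshold are illustrative only):

```python
# An illustrative sketch of a per-segment check. Column names and the
# 10-point threshold are invented for the example.
import pandas as pd
from sklearn.metrics import accuracy_score

# Expected columns: geo, device, true_label, predicted_label
df = pd.read_csv("scored_inferences.csv")

overall = accuracy_score(df["true_label"], df["predicted_label"])
for (geo, device), group in df.groupby(["geo", "device"]):
    segment = accuracy_score(group["true_label"], group["predicted_label"])
    if segment < overall - 0.10:  # this segment lags the average noticeably
        print(f"check geo={geo}, device={device}: "
              f"accuracy {segment:.2f} vs overall {overall:.2f}")
```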
Every AI system is like a snowflake: each has its own specific performance metrics, acceptable (or unacceptable) behaviors, and so on. A good AI monitoring platform must therefore be highly configurable.
Consider a case where you have an NLP model detecting the sentiment of input texts. You know that on short texts (e.g., below 50 characters), your model isn’t very accurate, and you’re OK with this. You’d like to monitor your model outputs, but you don’t want to be alerted on low confidence scores when there’s an increase in the relative proportion of short input texts. Your monitoring platform must allow you to easily exclude all short texts from the monitored dataset, when considering this specific metric (but maybe not for other metrics).
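A minimal sketch of that kind of configuration, with invented dataframe and column names:

```python
# A minimal sketch, with invented dataframe and column names: exclude
# short texts from one monitored metric, but not from others.
import pandas as pd

df = pd.read_csv("nlp_inferences.csv")  # columns: text, confidence

# The confidence metric ignores texts under 50 characters; other metrics
# can still be computed over the full dataframe.
long_texts = df[df["text"].str.len() >= 50]
print("mean confidence (texts >= 50 chars):", long_texts["confidence"].mean())
```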
There are many other examples that illustrate the value of fine-tuning, ranging from alerting preferences to ad-hoc data manipulations. The completely autonomous monitoring approach sounds good in theory and is easy to explain, but it will fail when it encounters real-world constraints.
First, it is encouraging to see the major cloud providers beginning to provide more tooling for production AI. Nevertheless, the solutions we reviewed are quite basic and experimental (Azure, for example, does not yet provide any SLAs for the above-mentioned solution). Monitoring certainly does not feel like a top priority for these providers.
At the same time, it is becoming increasingly clear in the industry that monitoring models and the entire AI system is a foundational need that cannot be treated as an afterthought. It is crucial to get this right in order to make your AI production ready, and to make sure AI issues are caught before business KPIs are negatively impacted. The best-of-breed solutions have certainly made monitoring their core focus and priority.
So, will best-of-breed vendors have an advantage in the market? It remains to be seen. One possible precedent to consider is the APM industry. The cloud providers have long had rudimentary solutions for IT organizations, and yet the market gave rise to a successful category of best-of-breed players such as New Relic, AppDynamics, and Datadog (and quite a few others). Some buyers settle for more basic capabilities because they prefer to deal with fewer vendors, whereas others want the most in-depth capabilities at every stage of the life cycle.
In any event, the evolution of this category will surely be interesting to observe and experience.
Originally published at https://www.monalabs.io on Sept 15, 2020.