Matplotlib vs Bokeh: 7 Charts You Must Know How to Plot

    Technology  2022-07-12

    Data visualization can be a pain but it is a necessary skill to master for people in the data science and analytics community. Luckily there are plenty of tools and libraries at our disposal to help us with this task.

    In this article, I will take you through examples of how to draw charts using two of the most popular Python visualization libraries: Matplotlib and Bokeh. In addition to learning how to use these libraries, a comparison of the two will help us see which one is better suited for different types of tasks.

    Charts covered in this article:

    1. Histogram
    2. Vertical Bar & Horizontal Bar
    3. Vertical Stacked Bar & Horizontal Stacked Bar
    4. Line
    5. Area & Area Stacked
    6. Pie & Donut
    7. Scatter & Scatter Bubble

    Setup

    To make it a bit more fun, I will use a dataset of Formula 1 race winners between 1990 and 2019. All the examples below therefore take their data from a Pandas DataFrame, with some data manipulation where necessary. The entire notebook, together with the data, can be found in my GitHub repository. However, you don't need to download it, as I have included all the code snippets and relevant output in this article.

    To start with, we import the Python libraries that we will need.

    Note that with Bokeh we can choose in advance whether our charts are displayed inside the notebook or saved to an HTML file. In this case, we want the charts displayed in our Jupyter notebook, so we call output_notebook(). We only need to call it once, and all subsequent calls to show() will display charts inline in the notebook.
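
    The setup described above might look like the following sketch. The fallback to an HTML file, and the file name charts.html, are assumptions added here so that the snippet also runs outside Jupyter; inside a notebook only the output_notebook() branch is taken.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from bokeh.io import output_file, output_notebook, show
from bokeh.plotting import figure

try:
    get_ipython  # this name only exists inside IPython/Jupyter sessions
except NameError:
    # Assumption: outside a notebook, send charts to a standalone HTML file.
    output_file("charts.html")
else:
    # Called once; every later show() then renders inline in the notebook.
    output_notebook()
```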

    Next, we need to import the data before we can start creating those wonderful charts.

    This is what the data looks like. Each row represents one race and shows who the race winner was, how many laps the race had, and how long it took to complete it.

    1. Histogram

    To draw a histogram we need a numeric variable. Let us use the “Laps” column, which gives us the distribution of lap counts across races. Note that a typical F1 race distance is ~300 km. Hence, shorter tracks need more laps and longer tracks need fewer laps to make up that same race distance.

    Matplotlib
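
    A minimal sketch of a Matplotlib histogram of the “Laps” column. The lap counts below are placeholder values, not the real race data, and the choice of 5 bins is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder lap counts for illustration only (not the real race data).
df = pd.DataFrame({"Laps": [44, 53, 56, 57, 58, 61, 66, 70, 71, 78]})

fig, ax = plt.subplots(figsize=(8, 4))
# hist() bins the values and draws the bars in a single call.
counts, edges, _ = ax.hist(df["Laps"], bins=5, edgecolor="white")
ax.set_title("Distribution of laps per race")
ax.set_xlabel("Laps")
ax.set_ylabel("Number of races")
# In a notebook the figure renders automatically; call plt.show() in a script.
```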

    Bokeh

    Note that Bokeh has no built-in histogram method, so we first need to use NumPy to compute the bins, which are then used to draw the histogram.
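
    The two-step approach can be sketched as follows: np.histogram computes the counts and bin edges, and the quad glyph draws one rectangle per bin. The lap counts are placeholder values and the styling is an assumption.

```python
import numpy as np
import pandas as pd
from bokeh.plotting import figure

# Placeholder lap counts for illustration only (not the real race data).
df = pd.DataFrame({"Laps": [44, 53, 56, 57, 58, 61, 66, 70, 71, 78]})

# np.histogram returns the count per bin and the bin edges
# (there is always one more edge than there are bins).
hist, edges = np.histogram(df["Laps"], bins=5)

p = figure(title="Distribution of laps per race",
           x_axis_label="Laps", y_axis_label="Number of races")
# Each bin becomes one rectangle from its left edge to its right edge.
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
       fill_color="steelblue", line_color="white")
# show(p)  # uncomment once an output mode has been chosen
```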

    2. Vertical Bar & Horizontal Bar

    Bar charts are similar to histograms, but they can also take categorical variables. For our bar charts, we will use the “Winner” column, which shows us the number of races won by each driver between 1990 and 2019.

    Before we can plot it, we need to do some aggregation of the data. It is simple data manipulation in Pandas with the result being two new DataFrames: one with race winners ordered from lowest to highest win count and the other one being the same but ordered from highest to lowest win count.
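
    One way to do this aggregation, sketched on a tiny placeholder DataFrame. The driver names are made up, and the names wins_asc, wins_desc and the “Wins” column are assumptions of this sketch.

```python
import pandas as pd

# Placeholder data: one row per race, with made-up driver names.
df = pd.DataFrame({"Winner": ["Driver A", "Driver B", "Driver A",
                              "Driver C", "Driver A", "Driver B"]})

# Count the races won by each driver.
wins = df.groupby("Winner").size().reset_index(name="Wins")

# Two orderings: ascending suits horizontal bars (largest bar ends up on top),
# descending suits vertical bars (largest bar ends up on the left).
wins_asc = wins.sort_values("Wins", ascending=True).reset_index(drop=True)
wins_desc = wins.sort_values("Wins", ascending=False).reset_index(drop=True)
```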

    Below is a snippet of the resulting data:

    Matplotlib

    First, we plot a vertical bar chart:

    Followed by the horizontal bar chart:
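
    Both orientations can be sketched in Matplotlib with bar() and barh(). The win counts are placeholders, and drawing the two charts side by side in one figure is a choice made here for brevity.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder aggregates; driver names and counts are made up.
wins_desc = pd.DataFrame({"Winner": ["Driver A", "Driver B", "Driver C"],
                          "Wins": [3, 2, 1]})
wins_asc = wins_desc.sort_values("Wins").reset_index(drop=True)

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(12, 4))

# Vertical bars: categories on the x-axis, counts as bar heights.
ax_v.bar(wins_desc["Winner"], wins_desc["Wins"])
ax_v.set_title("Race wins by driver (vertical)")
ax_v.tick_params(axis="x", rotation=45)

# Horizontal bars: categories on the y-axis, counts as bar widths.
ax_h.barh(wins_asc["Winner"], wins_asc["Wins"])
ax_h.set_title("Race wins by driver (horizontal)")

fig.tight_layout()
```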

    Bokeh

    Vertical:

    And then horizontal:
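
    A sketch of both orientations in Bokeh using the vbar and hbar glyphs. The data is again placeholder data; the point to note is that a categorical axis needs its categories passed as x_range or y_range.

```python
import pandas as pd
from bokeh.plotting import figure

# Placeholder aggregates; driver names and counts are made up.
wins_desc = pd.DataFrame({"Winner": ["Driver A", "Driver B", "Driver C"],
                          "Wins": [3, 2, 1]})
wins_asc = wins_desc.sort_values("Wins").reset_index(drop=True)

# Vertical: the categories define the x-axis range.
p_v = figure(x_range=wins_desc["Winner"].tolist(), title="Race wins by driver")
p_v.vbar(x=wins_desc["Winner"].tolist(), top=wins_desc["Wins"].tolist(), width=0.8)

# Horizontal: the categories define the y-axis range.
p_h = figure(y_range=wins_asc["Winner"].tolist(), title="Race wins by driver")
p_h.hbar(y=wins_asc["Winner"].tolist(), right=wins_asc["Wins"].tolist(), height=0.8)
# show(p_v); show(p_h)  # uncomment once an output mode has been chosen
```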

    As you can see, the results are very similar, and plotting bar charts is straightforward with both libraries.

    3. Vertical Stacked Bar & Horizontal Stacked Bar

    To draw stacked bar charts, we need an additional dimension from our data. We will keep the race winners and add the team they drove for at the time they won those Grands Prix.

    The easiest way to get the data into the required format is the Pandas crosstab function. It creates a matrix that lets us plot stacked charts with minimal coding.
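
    A small sketch of what crosstab produces. The column name “Team” and all driver and team names are assumptions; the real dataset may label them differently.

```python
import pandas as pd

# Placeholder data: one row per race win (made-up drivers and teams).
df = pd.DataFrame({
    "Winner": ["Driver A", "Driver A", "Driver A", "Driver B", "Driver B", "Driver C"],
    "Team":   ["Team X",   "Team X",   "Team Y",   "Team Y",   "Team Y",   "Team Z"],
})

# Rows are drivers, columns are teams, cells are win counts.
wins_by_team = pd.crosstab(df["Winner"], df["Team"])
print(wins_by_team)
```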

    Here is the snippet of the resulting data:

    We can now plot our stacked bar charts.

    Matplotlib

    Vertical:

    Horizontal:
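
    With the data in matrix form, both stacked orientations can be sketched through the Pandas plotting wrapper around Matplotlib. This is one possible route, not necessarily the one used in the original notebook, and the matrix below is placeholder data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder crosstab result: drivers as rows, teams as columns.
wins_by_team = pd.DataFrame(
    {"Team X": [2, 0, 0], "Team Y": [1, 2, 0], "Team Z": [0, 0, 1]},
    index=pd.Index(["Driver A", "Driver B", "Driver C"], name="Winner"),
)

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(12, 4))

# stacked=True piles each team's count on top of the previous one.
wins_by_team.plot(kind="bar", stacked=True, ax=ax_v, title="Wins by driver and team")
wins_by_team.plot(kind="barh", stacked=True, ax=ax_h, title="Wins by driver and team")

fig.tight_layout()
```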

    Bokeh

    Vertical:

    Horizontal:
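
    In Bokeh, the same matrix can be drawn with vbar_stack and hbar_stack, which take the list of columns to stack. A sketch on placeholder data; the colors are arbitrary choices.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Placeholder crosstab result: drivers as rows, teams as columns.
wins_by_team = pd.DataFrame(
    {"Team X": [2, 0, 0], "Team Y": [1, 2, 0], "Team Z": [0, 0, 1]},
    index=pd.Index(["Driver A", "Driver B", "Driver C"], name="Winner"),
)
drivers = wins_by_team.index.tolist()
teams = wins_by_team.columns.tolist()
colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # one color per team

# One column for the categories plus one column per stacked series.
data = {"Winner": drivers}
for team in teams:
    data[team] = wins_by_team[team].tolist()
source = ColumnDataSource(data)

p_v = figure(x_range=drivers, title="Wins by driver and team")
bars_v = p_v.vbar_stack(teams, x="Winner", width=0.8, color=colors,
                        source=source, legend_label=teams)

p_h = figure(y_range=drivers, title="Wins by driver and team")
bars_h = p_h.hbar_stack(teams, y="Winner", height=0.8, color=colors,
                        source=source, legend_label=teams)
# show(p_v); show(p_h)  # uncomment once an output mode has been chosen
```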

    4. Line Charts

    Line charts are used everywhere, and they are usually the most suitable choice for displaying trends.

    Again, we will need to do a little bit of data prep before we can plot it. For our case, we will select the Monaco Grand Prix and plot the amount of time it took to complete this race in each of the last 30 years. Note, race times can vary significantly, especially when they are affected by bad weather conditions or major incidents.
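
    A sketch of this data prep. Several things here are assumptions: the column names “Grand Prix” and “Year”, the “H:MM:SS.mmm” string format of “Time”, and the choice of minutes as the unit. The values are placeholders, not real race results.

```python
import pandas as pd

# Placeholder rows; the times are not real race results.
df = pd.DataFrame({
    "Grand Prix": ["Monaco", "Great Britain", "Monaco", "Monaco"],
    "Year": [2017, 2018, 2018, 2019],
    "Time": ["1:45:00.000", "1:27:30.000", "1:42:30.000", "1:50:15.000"],
})

# Keep only the Monaco races, ordered by year.
monaco = df[df["Grand Prix"] == "Monaco"].sort_values("Year").copy()

# Convert the time strings to a float number of minutes so they can be plotted.
monaco["Minutes"] = pd.to_timedelta(monaco["Time"]).dt.total_seconds() / 60
```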

    The resulting DataFrame looks like this:

    As before, let us plot it with Matplotlib first, followed by Bokeh.
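
    A minimal sketch of the same line chart in both libraries, using placeholder values and the assumed column names “Year” and “Minutes”.

```python
import pandas as pd
import matplotlib.pyplot as plt
from bokeh.plotting import figure

# Placeholder race times in minutes (not real results).
monaco = pd.DataFrame({"Year": [2017, 2018, 2019],
                       "Minutes": [105.0, 102.5, 110.25]})

# Matplotlib: plot() draws a line through the points in order.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monaco["Year"], monaco["Minutes"], marker="o")
ax.set_title("Monaco Grand Prix race time")
ax.set_xlabel("Year")
ax.set_ylabel("Race time (minutes)")

# Bokeh: the line glyph takes the x and y sequences.
p = figure(title="Monaco Grand Prix race time",
           x_axis_label="Year", y_axis_label="Race time (minutes)")
p.line(monaco["Year"].tolist(), monaco["Minutes"].tolist(), line_width=2)
# show(p)  # uncomment once an output mode has been chosen
```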

    Matplotlib

    Bokeh

    5. Area & Area Stacked

    For the area chart with one variable, we will use the same data that we used to plot the line charts. For the stacked area charts, we will add a second race (Great Britain).

    Below is the preparation of the data for stacked area charts:

    Note, the resulting DataFrame is largely identical to the one in the line chart data preparation but simply contains extra rows for Great Britain. Hence, you can refer to the picture in the previous section if you would like to see what the DataFrame looks like.

    As before, let us plot a single area chart first, followed by a stacked area chart:

    Matplotlib

    Area:

    Stacked area:
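
    Both area charts can be sketched in Matplotlib with fill_between (single area) and stackplot (stacked). The data is placeholder data with assumed column names, and the pivot to one column per race is a step chosen for this sketch.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder long-format data: one row per race and year (made-up times).
times = pd.DataFrame({
    "Year": [2017, 2017, 2018, 2018, 2019, 2019],
    "Grand Prix": ["Monaco", "Great Britain"] * 3,
    "Minutes": [105.0, 85.0, 102.5, 87.5, 110.25, 81.0],
})
monaco = times[times["Grand Prix"] == "Monaco"]

# One column per race makes it easy to pass each series to stackplot.
wide = times.pivot(index="Year", columns="Grand Prix", values="Minutes")

fig, (ax_a, ax_s) = plt.subplots(1, 2, figsize=(12, 4))

# Single area: fill between the line and zero.
ax_a.fill_between(monaco["Year"], monaco["Minutes"], alpha=0.5)
ax_a.set_title("Monaco race time")

# Stacked area: each series is drawn on top of the previous one.
ax_s.stackplot(wide.index, wide["Monaco"], wide["Great Britain"],
               labels=["Monaco", "Great Britain"], alpha=0.7)
ax_s.set_title("Monaco and Great Britain race times")
ax_s.legend(loc="upper left")
```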

    Bokeh

    Area:

    Stacked Area:

    If you looked at the code carefully, you will have noticed that a little extra work was required to get the data into the right format for Bokeh stacked area charts. After that, however, it was just as easy to plot with Bokeh as it was with Matplotlib.
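
    The extra step can be sketched like this: varea_stack expects one column per stacked series, so the long-format data is pivoted first. Column names and values are placeholders, and the original notebook may have reshaped the data differently.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Placeholder long-format data: one row per race and year (made-up times).
times = pd.DataFrame({
    "Year": [2017, 2017, 2018, 2018, 2019, 2019],
    "Grand Prix": ["Monaco", "Great Britain"] * 3,
    "Minutes": [105.0, 85.0, 102.5, 87.5, 110.25, 81.0],
})

# Extra work for Bokeh: reshape to one column per race.
wide = times.pivot(index="Year", columns="Grand Prix", values="Minutes")
source = ColumnDataSource({
    "Year": wide.index.tolist(),
    "Monaco": wide["Monaco"].tolist(),
    "Great Britain": wide["Great Britain"].tolist(),
})

# Single area: fill between y1 (zero) and y2 (the series).
p_area = figure(title="Monaco race time")
p_area.varea(x="Year", y1=0, y2="Monaco", source=source, fill_alpha=0.5)

# Stacked area: list the columns to stack, bottom first.
series = ["Monaco", "Great Britain"]
p_stack = figure(title="Monaco and Great Britain race times")
areas = p_stack.varea_stack(series, x="Year", color=["#1f77b4", "#ff7f0e"],
                            source=source, legend_label=series)
# show(p_area); show(p_stack)  # uncomment once an output mode has been chosen
```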

    6. Pie & Donut Charts

    There is an old joke which says that every PowerPoint presentation should contain at least one pie chart. Whether you agree with that or not, pie charts are extremely common, hence it is very important to know how to plot them.

    For this example, we will use the top 5 drivers with the most wins and group the rest into an “Other Drivers” category.
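
    One way to build that DataFrame, sketched on made-up win counts. The variable names and the “Wins” column are assumptions of this sketch.

```python
import pandas as pd

# Placeholder win counts for eight made-up drivers.
wins = pd.DataFrame({
    "Winner": ["Driver " + c for c in "ABCDEFGH"],
    "Wins": [30, 25, 20, 15, 10, 5, 3, 2],
}).sort_values("Wins", ascending=False)

# Keep the five drivers with the most wins as they are...
top5 = wins.head(5)
# ...and collapse everyone else into a single row.
other = pd.DataFrame({"Winner": ["Other Drivers"],
                      "Wins": [wins["Wins"].iloc[5:].sum()]})

pie_df = pd.concat([top5, other], ignore_index=True)
```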

    Our resulting DataFrame is pretty simple and looks like this:

    Let us now plot some pies and donuts.

    Matplotlib

    Pie chart:

    Donut:
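
    In Matplotlib both charts come from the same pie() call; a donut is a pie whose wedges are given a width smaller than the radius. A sketch on placeholder data; the start angle and wedge width are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data: top 5 made-up drivers plus the grouped remainder.
pie_df = pd.DataFrame({
    "Winner": ["Driver A", "Driver B", "Driver C", "Driver D", "Driver E", "Other Drivers"],
    "Wins": [30, 25, 20, 15, 10, 10],
})

fig, (ax_pie, ax_donut) = plt.subplots(1, 2, figsize=(12, 5))

# Pie: wedge sizes are proportional to the values; autopct prints percentages.
ax_pie.pie(pie_df["Wins"], labels=pie_df["Winner"], autopct="%1.0f%%", startangle=90)
ax_pie.set_title("Share of wins (pie)")

# Donut: same call, but each wedge only covers the outer 40% of the radius.
ax_donut.pie(pie_df["Wins"], labels=pie_df["Winner"], startangle=90,
             wedgeprops={"width": 0.4})
ax_donut.set_title("Share of wins (donut)")
```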

    Bokeh

    Unfortunately, Bokeh does not have a high-level method to plot pie and donut charts. Hence, we will have to use a “wedge” glyph, which means a bit of extra work will be necessary. This includes setting the angles for each wedge and adding color values into a DataFrame.

    This is what our updated DataFrame looks like:

    Pie chart:

    Donut:
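
    The wedge approach described above can be sketched as follows: each row gets an “angle” (its share of a full circle) and a “color”, and cumsum turns the angles into start and end positions. The data, colors, radii and axis ranges are placeholder choices; annular_wedge is used here for the donut.

```python
from math import pi

import pandas as pd
from bokeh.plotting import figure
from bokeh.transform import cumsum

# Placeholder data: top 5 made-up drivers plus the grouped remainder.
pie_df = pd.DataFrame({
    "Winner": ["Driver A", "Driver B", "Driver C", "Driver D", "Driver E", "Other Drivers"],
    "Wins": [30, 25, 20, 15, 10, 10],
})

# Extra columns Bokeh needs: the angle of each wedge in radians and its color.
pie_df["angle"] = pie_df["Wins"] / pie_df["Wins"].sum() * 2 * pi
pie_df["color"] = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]

# Pie: each wedge starts where the previous one ended.
p_pie = figure(title="Share of wins (pie)", tooltips="@Winner: @Wins",
               x_range=(-1, 1), y_range=(-1, 1))
p_pie.wedge(x=0, y=0, radius=0.8,
            start_angle=cumsum("angle", include_zero=True),
            end_angle=cumsum("angle"),
            line_color="white", fill_color="color", source=pie_df)

# Donut: same angles, but with an inner radius that leaves a hole.
p_donut = figure(title="Share of wins (donut)", tooltips="@Winner: @Wins",
                 x_range=(-1, 1), y_range=(-1, 1))
p_donut.annular_wedge(x=0, y=0, inner_radius=0.4, outer_radius=0.8,
                      start_angle=cumsum("angle", include_zero=True),
                      end_angle=cumsum("angle"),
                      line_color="white", fill_color="color", source=pie_df)
# show(p_pie); show(p_donut)  # uncomment once an output mode has been chosen
```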

    Note, I left the grid visible so it is easy for you to see how we are drawing these charts. Also, there is no easy way in Bokeh to add labels to pie and donut charts. One way would be to manually specify chart coordinates for each one of the labels. If you wanted something a bit more automated then you would have to use some trigonometry to calculate the coordinates based on the wedge angles.
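
    The trigonometry mentioned above could look like this sketch: take the mid-angle of each wedge and place the label at a chosen radius along it. It assumes wedges are drawn anticlockwise starting from angle zero, and the label radius of 0.55 is an arbitrary choice.

```python
import numpy as np

# Placeholder win counts, in the same order as the wedges.
wins = np.array([30, 25, 20, 15, 10, 10])

# Angle covered by each wedge, and where each wedge starts and ends.
angles = wins / wins.sum() * 2 * np.pi
ends = np.cumsum(angles)
starts = ends - angles

# A label sits halfway around its wedge, at a fixed distance from the center.
mids = (starts + ends) / 2
label_radius = 0.55
label_x = label_radius * np.cos(mids)
label_y = label_radius * np.sin(mids)
# These coordinates could then be passed to a text glyph on the chart.
```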

    What is much simpler, though, is enabling labels that show when you hover with your mouse. This is what the “tooltips=” argument in the code is for. However, this only works when your chart is rendered in HTML and not when it is exported as PNG (as is the case for the pictures used in this article).

    7. Scatter & Scatter Bubble

    Finally, let us draw some scatter plots. For this, we will take the 2019 season from our DataFrame and use “Laps” together with a slightly modified “Time” column.

    Due to the difference in track lengths, there will not be much correlation between the two metrics but this is for illustration purposes only. Note, we will also need to convert “Time” to float as we did for line charts.

    The resulting DataFrame is not much different from the original one. It simply contains a few additional columns that we will need to use for scatter plots.

    Matplotlib

    Scatter:

    Scatter Bubble:
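
    A sketch of both scatter variants in Matplotlib, including the conversion of “Time” to a float. The values are placeholders and the “H:MM:SS.mmm” format is assumed. Sizing the bubbles by race time is purely an illustrative choice; the original may use a different column.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows for one season (not real race results).
races = pd.DataFrame({
    "Laps": [44, 53, 58, 66, 78],
    "Time": ["1:25:00.000", "1:20:30.000", "1:30:00.000", "1:35:15.000", "1:45:00.000"],
})
races["Minutes"] = pd.to_timedelta(races["Time"]).dt.total_seconds() / 60

fig, (ax_s, ax_b) = plt.subplots(1, 2, figsize=(12, 4))

# Plain scatter: every marker has the same size.
ax_s.scatter(races["Laps"], races["Minutes"])
ax_s.set_xlabel("Laps")
ax_s.set_ylabel("Race time (minutes)")

# Bubble: the s argument gives each marker its own area (in points squared).
ax_b.scatter(races["Laps"], races["Minutes"], s=races["Minutes"] * 3, alpha=0.5)
ax_b.set_xlabel("Laps")
ax_b.set_ylabel("Race time (minutes)")
```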

    Bokeh

    Scatter:

    Scatter Bubble:
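
    The Bokeh equivalents can be sketched with the scatter glyph: a fixed size gives a plain scatter, while a size column gives bubbles. The data and the “Size” column are placeholders introduced for this sketch.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Placeholder rows for one season (not real race results).
races = pd.DataFrame({
    "Laps": [44, 53, 58, 66, 78],
    "Minutes": [85.0, 80.5, 90.0, 95.25, 105.0],
})
# Hypothetical marker size in screen pixels, derived from race time.
races["Size"] = races["Minutes"] / 4

source = ColumnDataSource({
    "Laps": races["Laps"].tolist(),
    "Minutes": races["Minutes"].tolist(),
    "Size": races["Size"].tolist(),
})

# Plain scatter: one fixed marker size.
p_s = figure(title="Laps vs race time",
             x_axis_label="Laps", y_axis_label="Race time (minutes)")
p_s.scatter("Laps", "Minutes", size=10, source=source)

# Bubble: marker size is read from a column of the data source.
p_b = figure(title="Laps vs race time (bubble)",
             x_axis_label="Laps", y_axis_label="Race time (minutes)")
p_b.scatter("Laps", "Minutes", size="Size", fill_alpha=0.5, source=source)
# show(p_s); show(p_b)  # uncomment once an output mode has been chosen
```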

    Verdict

    After plotting all of these charts we can evaluate both libraries based on how the charts look and also the ease of use.

    Personally, I prefer the way Bokeh charts look. Out of the box, they simply look nicer and have higher resolution. It is, of course, possible to customize Matplotlib charts to make them look better but that carries the price of having to do some extra coding.

    As for ease of use, the point goes to Matplotlib. This is primarily due to the much easier way to plot pie and donut charts, as well as the ability to plot histograms without needing NumPy.

    To summarize, it often comes down to preference and familiarity. However, we have seen that each library has its own pros and cons.

    I hope that this comparison of the 7 most common charts in Matplotlib and Bokeh will help you choose the right library for you.

    Cheers! SolClover

    Translated from: https://medium.com/swlh/matplotlib-vs-bokeh-7-charts-you-must-know-how-to-plot-a74af5857227
