The hist function -- matplotlib

Technology 2025-12-21 12

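This section works through plotting histograms using the official documentation. There is still a lot in that documentation I have not fully understood, so if you spot a mistake, corrections are welcome and we can improve together.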

The hist function: plotting histograms

Purpose: Plot a histogram.

Syntax: hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None, **kwargs)

Parameters:

x: (n,) array or sequence of (n,) arrays. Input values. This can be a single array or a sequence of arrays, and the arrays in a sequence do not need to have the same length.

bins: int or sequence or str, default: rcParams["hist.bins"] (default: 10). The bins can be given as an integer, a sequence, or a string. By default the data are split into 10 bins.

If bins is an integer, it defines the number of equal-width bins in the range. In other words, the data range is divided into that many equal-width bins.

If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin. In this case the bins may be unequally spaced. Every bin except the last (right-most) one is half-open, and the last bin is closed. For example, with the sequence [1, 2, 3, 4] the first bin is $[1, 2)$ (includes 1, excludes 2), the second bin is $[2, 3)$ (includes 2, excludes 3), and the last bin is $[3, 4]$ (includes both 3 and 4).

If bins is a string, it names one of the binning strategies supported by numpy.histogram_bin_edges: 'auto', 'fd', 'doane', 'scott', 'stone', 'rice', 'sturges', or 'sqrt'.
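The sketch below checks the three kinds of bins against the values that ax.hist returns, including the closed last bin. It assumes the non-interactive Agg backend so that it runs without a display, and the data list is illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

data = [1, 2, 2, 3, 4]  # illustrative values
fig, ax = plt.subplots()

# Sequence: explicit edges give the bins [1, 2), [2, 3), [3, 4]
n_seq, edges_seq, _ = ax.hist(data, bins=[1, 2, 3, 4])
# The value 4 is counted because the last bin [3, 4] is closed

# Integer: 4 equal-width bins spanning min(data)..max(data)
n_int, edges_int, _ = ax.hist(data, bins=4)

# String: NumPy chooses the edges with the named strategy
n_str, edges_str, _ = ax.hist(data, bins="sturges")

print(n_seq, edges_int, edges_str)
plt.close(fig)
```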

range: tuple or None, default: None. The lower and upper range of the bins. Values outside this range (lower and upper outliers) are ignored. If range is not provided, it defaults to (x.min(), x.max()), so outliers are included and nothing is excluded. Range has no effect if bins is a sequence. If bins is a sequence or range is specified, autoscaling is based on the specified bin range instead of the range of x.

When bins is a str or an int, range can be used to redefine the lower and upper limits of the histogram.

When bins is a sequence of numbers, range has no effect on the resulting histogram. To leave outliers out, you can either pass range (with bins as an int or str) before plotting, or give bins as a sequence whose edges exclude the outliers.
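This sketch makes both points concrete: range drops an outlier when bins is an int, but is ignored when bins is a sequence. The value 100 is a made-up outlier, and the Agg backend is assumed so that the code runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 100]  # 100 plays the role of an outlier
fig, ax = plt.subplots()

# bins is an int: range fixes the limits and 100 falls outside them
n_int, edges_int, _ = ax.hist(data, bins=5, range=(0, 5))

# bins is a sequence: range is ignored, and the edges alone decide
n_seq, _, _ = ax.hist(data, bins=[0, 50, 100], range=(0, 5))
# 100 is counted because the last bin [50, 100] is closed

# Excluding the outlier through the edges themselves
n_cut, _, _ = ax.hist(data, bins=[0, 1, 2, 3, 4, 5])

print(n_int, n_seq, n_cut)
plt.close(fig)
```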

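density: bool, default: False. Controls whether bar heights show raw counts (False) or a probability density (True). If True, hist draws and returns a probability density. Each bin displays its raw count divided by the total number of counts and by the bin width (density = counts / (sum(counts) * np.diff(bins))), so the area under the histogram integrates to 1 (np.sum(density * np.diff(bins)) == 1).

In symbols: for bin $i$ with count $c_i$ and width $w_i$, out of $N$ values in total, the bar height is $h_i = \dfrac{c_i}{N\,w_i}$, so $\sum_i h_i w_i = 1$. This is a density, not the relative frequency $c_i / N$. The two coincide only when every bin has width 1.

If stacked is also True, the histograms are normalized together, so that the sum of the stacked histograms (the top layer) has area 1.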
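The sketch below recomputes the density formula by hand, using bins of unequal width so that density and relative frequency visibly differ. It also checks the stacked case. The data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

data = [0, 0, 1, 2, 2, 2]
edges = [0, 1, 3]  # unequal bin widths: 1 and 2
fig, ax = plt.subplots()

counts, _, _ = ax.hist(data, bins=edges)                 # raw counts: 2, 4
dens, bins, _ = ax.hist(data, bins=edges, density=True)  # densities

# density = counts / (sum(counts) * bin width)
manual = counts / (counts.sum() * np.diff(bins))
area = np.sum(dens * np.diff(bins))  # area under the histogram
rel_freq = counts / counts.sum()     # relative frequency, for comparison only

# stacked + density: the top (total) layer has area 1
stk, bins2, _ = ax.hist([data, [1, 1, 2]], bins=edges, density=True, stacked=True)
stacked_area = np.sum(np.asarray(stk[-1]) * np.diff(bins2))
plt.close(fig)
```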

weights: (n,) array-like or None, default: None. (I have not studied this parameter in depth yet and will add to this after further study.) An array of weights with the same shape as x. Each value in x contributes its associated weight to the bin count, instead of 1. If density is True, the weights are normalized so that the integral of the density over the range remains 1. This parameter can be used to draw a histogram of data that has already been binned, e.g. using numpy.histogram: treat each bin as a single point whose weight equals its count.
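The "already binned" use case can be sketched as follows. NumPy does the binning, and hist then redraws the result with one weighted point per bin. This sketch places each point at its bin center, which is an illustrative choice; any value strictly inside the bin works. The random data and the Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=500)  # illustrative raw data

# Step 1: bin the data with NumPy only (no plotting)
counts, edges = np.histogram(data, bins=10)

# Step 2: one point per bin, weighted by that bin's count,
# so each point adds `count` to its bin instead of 1
centers = (edges[:-1] + edges[1:]) / 2
fig, ax = plt.subplots()
n, bins, _ = ax.hist(centers, bins=edges, weights=counts)
plt.close(fig)
```

The redrawn histogram has the same counts as the original one, even though hist only received 10 points.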

cumulative: bool or -1, default: False. Whether to draw the histogram cumulatively. The default, False, means no accumulation. If True, each bin gives the count in that bin plus the counts of all bins for smaller values, so the last bin gives the total number of data points. If density is also True, the histogram is normalized so that the last bin equals 1. If cumulative is a number less than 0 (e.g. -1), the direction of accumulation is reversed: each bin gives its own count plus the counts of all bins for larger values. In this case, if density is also True, the histogram is normalized so that the first bin equals 1.

[Figure: density=False, cumulative=True]

[Figure: density=False, cumulative=-1 (reversed accumulation)]

[Figure: density=True, cumulative=True]
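The four combinations can be checked numerically. With per-bin counts 1, 2, 3, forward accumulation ends at the total, and reversed accumulation starts there. With density=True, the last bin (forward) or the first bin (reversed) equals 1. The data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

data = [0, 1, 1, 2, 2, 2]
edges = [0, 1, 2, 3]  # per-bin counts: 1, 2, 3
fig, ax = plt.subplots()

fwd, _, _ = ax.hist(data, bins=edges, cumulative=True)   # running total, left to right
rev, _, _ = ax.hist(data, bins=edges, cumulative=-1)     # running total, right to left
fwd_d, _, _ = ax.hist(data, bins=edges, cumulative=True, density=True)
rev_d, _, _ = ax.hist(data, bins=edges, cumulative=-1, density=True)
plt.close(fig)
```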

bottom: array-like, scalar, or None, default: None. The location of the bottom of each bin: each bar is drawn from bottom to bottom + hist(x, bins), i.e. from bottom to bottom plus that bin's height. If bottom is a scalar, every bin is shifted by the same amount. If it is an array, each bin is shifted independently, and the length of bottom must match the number of bins. If None, it defaults to 0 (no shift).
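The sketch below reads the shift back from the returned bar patches, for a scalar bottom and for a per-bin array. The data and offsets are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

data = [0, 0, 1, 1, 1, 2]
edges = [0, 1, 2, 3]  # counts per bin: 2, 3, 1
fig, ax = plt.subplots()

# Scalar bottom: every bar starts at 3 instead of 0
_, _, bars = ax.hist(data, bins=edges, bottom=3)
spans = [(b.get_y(), b.get_y() + b.get_height()) for b in bars]

# Array bottom: one offset per bin (length must equal the number of bins)
_, _, bars2 = ax.hist(data, bins=edges, bottom=[2, 3, 0])
spans2 = [(b.get_y(), b.get_y() + b.get_height()) for b in bars2]

print(spans, spans2)
plt.close(fig)
```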

With bottom=3, every bin is shifted up by 3. For example, the first bar, which spanned 0 to 8, now spans 3 to 11.

When bottom is an array, each bin is shifted by its own amount. For example, the first bar moves from 0 to 8 up to 2 to 10, and another bar moves from 0 to 10 up to 3 to 13.

histtype: {'bar', 'barstacked', 'step', 'stepfilled'}, default: 'bar'. The type of histogram to draw. 'bar' is a traditional bar-type histogram; if multiple datasets are given, their bars are arranged side by side. 'barstacked' is a bar-type histogram where multiple datasets are stacked on top of each other (even if the stacked parameter is False). 'step' generates a line plot that is unfilled by default. 'stepfilled' generates a line plot that is filled by default.

[Figure: histtype='step']

[Figure: histtype='stepfilled']
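The difference between the histtypes also shows up in the third return value. The sketch below checks it: 'bar' gives one rectangle per bin, while 'step' and 'stepfilled' give a single outline polygon that is unfilled or filled, respectively. The data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon, Rectangle

data = [0, 0, 1, 1, 1, 2]
edges = [0, 1, 2, 3]
fig, ax = plt.subplots()

_, _, p_bar = ax.hist(data, bins=edges, histtype="bar")          # one Rectangle per bin
_, _, p_step = ax.hist(data, bins=edges, histtype="step")        # one outline Polygon
_, _, p_fill = ax.hist(data, bins=edges, histtype="stepfilled")  # one filled Polygon

print(len(p_bar), type(p_bar[0]).__name__)
print(len(p_step), type(p_step[0]).__name__, p_step[0].get_fill())
print(len(p_fill), type(p_fill[0]).__name__, p_fill[0].get_fill())
plt.close(fig)
```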

align: {'left', 'mid', 'right'}, default: 'mid'. The horizontal alignment of the histogram bars relative to their bins. 'left': bars are centered on the left bin edges. 'mid': bars are centered between the bin edges. 'right': bars are centered on the right bin edges.

[Figure: align='left']
Left alignment: the middle of each bar coincides with the left edge of its bin. For example, the first bar is centered on 0, the left edge of the first bin.

[Figure: align='mid']
Center alignment: the middle of each bar coincides with the middle of its bin, so each bar sits exactly between its bin's left and right edges.

[Figure: align='right']
Right alignment: the middle of each bar coincides with the right edge of its bin. For example, the first bar is centered on 1, the right edge of the first bin.
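The sketch below computes each bar's center from the returned patches for the three align values. With bins of width 1 starting at 0, the centers land on the left edges, the midpoints, or the right edges. The data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

data = [0, 1, 1, 2]
edges = [0, 1, 2, 3]
fig, ax = plt.subplots()

centers = {}
for align in ("left", "mid", "right"):
    _, _, bars = ax.hist(data, bins=edges, align=align)
    # center of each drawn bar = its left x plus half its width
    centers[align] = [b.get_x() + b.get_width() / 2 for b in bars]

print(centers)
plt.close(fig)
```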

orientation: {'vertical', 'horizontal'}, default: 'vertical'. If 'horizontal', barh is used for bar-type histograms and the bottom kwarg gives the left edges, so bottom then shifts the bars horizontally.
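For a horizontal histogram, bottom becomes a horizontal offset. The sketch below checks this on the returned patches: each bar starts at x = 2, and its length is the bin count. The data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

data = [0, 0, 1, 1, 1, 2]
edges = [0, 1, 2, 3]  # counts 2, 3, 1
fig, ax = plt.subplots()

_, _, bars = ax.hist(data, bins=edges, orientation="horizontal", bottom=2)
# Horizontal bars: x is where a bar starts (shifted right by bottom),
# and the width is the count in that bin
starts = [b.get_x() for b in bars]
lengths = [b.get_width() for b in bars]

print(starts, lengths)
plt.close(fig)
```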

With orientation='horizontal', the bars run horizontally. With bottom=2, every bin is shifted right by 2. For example, the first bar, which spanned 0 to 8, now spans 2 to 10.

rwidth: float or None, default: None. The relative width of the bars as a fraction of the bin width. If None, the width is computed automatically. rwidth is ignored if histtype is 'step' or 'stepfilled'; in that case the bars are drawn joined together.
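The sketch below reads bar widths back from the patches, with every bin 1 unit wide. With a single dataset and the current defaults, bars fill their bins. rwidth=0.8 shrinks them to 80% of the bin, and for 'step' the argument is accepted but has no bars to act on. The data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

data = [0, 0, 1, 1, 1, 2]
edges = [0, 1, 2, 3]  # every bin is 1 wide
fig, ax = plt.subplots()

_, _, full = ax.hist(data, bins=edges)              # single dataset, rwidth=None
_, _, thin = ax.hist(data, bins=edges, rwidth=0.8)  # bars use 80% of each bin
ax.hist(data, bins=edges, histtype="step", rwidth=0.8)  # rwidth has no effect here

full_widths = [b.get_width() for b in full]
thin_widths = [b.get_width() for b in thin]
print(full_widths, thin_widths)
plt.close(fig)
```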

When histtype='stepfilled' or 'step', setting rwidth has no effect on the result.

When histtype='bar' and rwidth=0.8, each bar is 0.8 times as wide as its bin.

log: bool, default: False. If True, the histogram axis (the axis that shows the bar heights) is set to a log scale. If log is True and x is a 1D array, empty bins will be filtered out and only the non-empty (n, bins, patches) will be returned. (I do not see where this filtering shows up: without the log scale, an empty bin has no visible bar either. Does it mean that the axis minimum no longer extends toward negative infinity, and does that count as filtering?)
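The sketch below compares log=True with the default. The returned counts are identical, and only the scale of the count axis changes. The data are illustrative (counts 1, 10, 100 make the log spacing easy to see), and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

data = [0] * 1 + [1] * 10 + [2] * 100  # counts 1, 10, 100
edges = [0, 1, 2, 3]
fig, (ax_lin, ax_log) = plt.subplots(1, 2)

n_lin, _, _ = ax_lin.hist(data, bins=edges)
n_log, _, _ = ax_log.hist(data, bins=edges, log=True)

# Same counts; only the y-axis scale differs
print(list(n_lin), list(n_log))
print(ax_lin.get_yscale(), ax_log.get_yscale())
plt.close(fig)
```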

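With log=True, the vertical axis did not turn into base-10 logarithms of the counts as I had expected. The values stayed the same, and only the axis (its scale and starting value) changed. This is because log=True only switches the count axis to a logarithmic scale: the bar heights are still the raw counts, plotted at logarithmically spaced positions.

color: color or array-like of colors or None, default: None. A color or a sequence of colors, one per dataset (each dataset gets exactly one color). The default, None, uses the standard line color sequence. This parameter sets the bar color.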

Passing several colors for a single dataset x raises an error, so color cannot be used to give each bar of one histogram its own color.
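One possible workaround, sketched below, is to draw with a single color and then recolor the returned bar patches one by one with Patch.set_facecolor. The colors and data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

data = [0, 0, 1, 1, 1, 2]
edges = [0, 1, 2, 3]
bar_colors = ["red", "green", "blue"]  # illustrative: one color per bar
fig, ax = plt.subplots()

# color= accepts one color per dataset, so draw in a single color first
n, bins, bars = ax.hist(data, bins=edges, color="gray")

# Then recolor each returned Rectangle individually
for bar, c in zip(bars, bar_colors):
    bar.set_facecolor(c)
plt.close(fig)
```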

label: str or None, default: None. A string, or a sequence of strings to match multiple datasets. Bar charts yield multiple patches per dataset, but only the first patch gets the label, so that the legend works as expected.
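The sketch below looks at the individual bar labels for two datasets to show the "only the first patch gets the label" rule. Each dataset's first bar carries its label, the other bars are marked with a leading underscore, and the legend therefore lists each dataset once. The data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

x = [0, 0, 1, 1, 1, 2]
y = [0, 1, 2, 2]
edges = [0, 1, 2, 3]
fig, ax = plt.subplots()

_, _, containers = ax.hist([x, y], bins=edges, label=["score", "test"])

# One list of bar labels per dataset
labels_per_dataset = [[bar.get_label() for bar in c] for c in containers]
handles, labels = ax.get_legend_handles_labels()

print(labels_per_dataset)
print(labels)
plt.close(fig)
```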

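A single dataset takes a single label string. I have not yet found an example in the official documentation that plots histograms of two or more datasets at once, so I tried it myself and set the labels as follows: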

import matplotlib.pyplot as plt
import numpy as np

x = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
y = np.random.randint(0, 10, 100)  # 100 random integers in [0, 10)
b = list(range(0, 11, 1))  # bins given as a sequence: edges 0, 1, ..., 10
# plt.hist(x, bins=b, histtype='bar', label='score')
# Two datasets: one color and one label per dataset
plt.hist((x, y), bins=b, histtype='bar', color=('green', 'y'), label=('score', 'test'))
plt.legend()
plt.show()

stacked: bool, default: False. Whether to stack the datasets. The default, False, does not stack them. If True, multiple datasets are stacked on top of each other, whatever the histtype. If False, multiple datasets are arranged side by side when histtype is 'bar', and drawn over the same intervals when histtype is 'step' (one over the other, but not added together).

[Figure: stacked=True, histtype='bar']

[Figure: stacked=True, histtype='barstacked']

[Figure: stacked=True, histtype='step']

[Figure: stacked=True, histtype='stepfilled']

With stacked=False and histtype='step', the datasets are drawn over the same intervals, one over the other but not added together.

With stacked=False and histtype='bar', the datasets are arranged side by side.

With stacked=False and histtype='barstacked', the datasets are still stacked. For example, the plot shows 13 for the value 0, which is exactly the 8 zeros in x plus the 5 zeros in y.

With stacked=False and histtype='stepfilled', the result matches stacked=False with histtype='step': the datasets are drawn over the same intervals but not added together. For example, the counts of 0 are 8 and 5, and they are drawn separately rather than 5 on top of 8.

import matplotlib.pyplot as plt
import numpy as np

x = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9]
b = list(range(0, 11, 1))  # bins given as a sequence: edges 0, 1, ..., 10
# plt.hist(x, bins=b, histtype='bar', label='score')
# Filled step outlines, overlapping (stacked=False), made translucent with alpha
plt.hist((x, y), bins=b, histtype='stepfilled', color=('purple', 'y'), alpha=0.1,
         label=('score', 'test'), stacked=False)
plt.legend()
plt.show()
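The "13 = 8 + 5" observation can also be checked on the returned counts instead of by reading the plot. The sketch below keeps only the zeros and ones: 8 and 7 of them for x, and 5 and 3 of them for y, matching the counts in the code above. It then compares 'barstacked', 'stepfilled', and stacked 'step'. The Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

x = [0] * 8 + [1] * 7  # 8 zeros and 7 ones, as in the x above
y = [0] * 5 + [1] * 3  # 5 zeros and 3 ones, as in the y above
edges = [0, 1, 2]
fig, ax = plt.subplots()

# barstacked stacks even with stacked=False: the last layer holds the sums
n_bs, _, _ = ax.hist([x, y], bins=edges, histtype="barstacked", stacked=False)
# stepfilled with stacked=False: each dataset keeps its own counts
n_sf, _, _ = ax.hist([x, y], bins=edges, histtype="stepfilled", stacked=False)
# step with stacked=True: the counts are accumulated as well
n_st, _, _ = ax.hist([x, y], bins=edges, histtype="step", stacked=True)
plt.close(fig)
```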

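Other parameters: **kwargs, Patch properties. Any other keyword arguments are passed on as Patch properties (see the Patch property list in the official documentation). For example: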

import matplotlib.pyplot as plt

x = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9]
b = list(range(0, 11, 1))  # bins given as a sequence: edges 0, 1, ..., 10
# Patch properties passed as keyword arguments: edge width, edge color, transparency, edge line style
plt.hist(x, bins=b, histtype='bar', label='score', linewidth=1, edgecolor='k', alpha=0.6, linestyle=":")
plt.legend()
plt.show()

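Even when the plot contains several datasets, some properties cannot be set separately for each dataset and accept only one value. For example, linewidth must be a single float, and passing a tuple of two numbers raises an error.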
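One possible workaround, sketched below, is to pass a single linewidth to hist and then adjust each dataset's returned bars with Patch.set_linewidth. The per-dataset widths 1.0 and 3.0 and the data are illustrative, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

x = [0, 0, 1, 1, 1, 2]
y = [0, 1, 2, 2]
edges = [0, 1, 2, 3]
fig, ax = plt.subplots()

# linewidth as a hist kwarg: one value shared by all datasets
_, _, containers = ax.hist([x, y], bins=edges, edgecolor="k", linewidth=1)

# Afterwards, give each dataset's bars their own line width
for container, lw in zip(containers, [1.0, 3.0]):
    for bar in container:
        bar.set_linewidth(lw)

widths = [[bar.get_linewidth() for bar in c] for c in containers]
print(widths)
plt.close(fig)
```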

Reference, the official pyplot.hist documentation: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hist.html
