MySQL压测时Linux中断异常飚高，原来是因为...

科技2025-02-05 71

1. 系统环境

OS: CentOS Linux release 7.8.2003 (Core)

Kernel: 3.10.0-1127.19.1.el7.x86_64

MySQL: 用5.0、5.7均有此问题，应该和版本无关

2. 压测工具

benchyou[1]

mysql_random_load[2]

3. 问题现象

利用 mysql_random_load 工具连接MySQL写入数据时，性能非常非常低。

由于 mysql_random_load 工具不支持通过socket连接，只好放弃，改用benchyou。顺便说一下，benchyou和sysbench极为相似，也非常好用。

改用benchyou工具后，压测正常。看来的确不是MySQL版本的问题。

用 mysql_random_load 工具进行压测时，系统负载非常高，同时可观测到系统的中断也很高并且也很不均衡。

[root@yejr.run]# vmstat -S m 1 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 73585 2 41051 0 0 117 91 4 2 0 0 99 0 0 2 0 0 73585 2 41051 0 0 0 28320 55444 100207 18 2 80 0 0 4 0 0 73584 2 41052 0 0 0 1936 52949 98607 18 2 81 0 0 2 0 0 73583 2 41052 0 0 0 4864 56375 101262 14 2 84 0 0 4 0 0 73583 2 41052 0 0 0 29064 55806 103715 19 2 80 0 0 5 0 0 73583 2 41052 0 0 0 5704 55854 98386 15 2 83 0 0

可以看到 system.in 这列的值非常高，改成benchyou工具后，这列的值从5.5万降到1.6万。

[root@yejr.run]# vmstat -S m 1 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 4 0 0 77238 2 38371 0 0 118 88 2 3 0 0 99 0 0 2 0 0 77234 2 38374 0 0 0 31620 16039 77988 3 2 95 0 0 2 0 0 77231 2 38377 0 0 0 31996 16091 78926 3 2 95 0 0 3 0 0 77229 2 38378 0 0 0 33028 16347 81006 3 2 95 0 0 0 0 0 77226 2 38383 0 0 0 52412 15496 75715 3 2 95 0 0 2 0 0 77224 2 38384 0 0 0 32252 16167 79352 3 2 95 0 0

再看下有问题时的系统中断表现

[root@yejr.run]# mpstat -I SUM -P ALL 1 Linux 3.10.0-1127.19.1.el7.x86_64 (yejr.run) 09/28/2020 _x86_64_ (32 CPU) 05:37:40 PM CPU intr/s 05:37:41 PM all 51833.00 05:37:41 PM 0 2069.00 05:37:41 PM 1 1159.00 05:37:41 PM 2 2979.00 05:37:41 PM 3 1580.00 05:37:41 PM 4 1627.00 05:37:41 PM 5 1461.00 05:37:41 PM 6 1243.00 05:37:41 PM 7 1825.00 05:37:41 PM 8 2154.00 05:37:41 PM 9 1367.00 05:37:41 PM 10 1277.00 05:37:41 PM 11 1376.00 05:37:41 PM 12 4085.00 05:37:41 PM 13 1601.00 05:37:41 PM 14 4045.00 05:37:41 PM 15 1857.00 05:37:41 PM 16 1692.00 05:37:41 PM 17 722.00 05:37:41 PM 18 118.00 05:37:41 PM 19 1862.00 05:37:41 PM 20 1637.00 05:37:41 PM 21 1130.00 05:37:41 PM 22 1750.00 05:37:41 PM 23 1653.00 05:37:41 PM 24 1417.00 05:37:41 PM 25 1547.00 05:37:41 PM 26 1500.00 05:37:41 PM 27 1033.00 05:37:41 PM 28 20.00 05:37:41 PM 29 1683.00 05:37:41 PM 30 888.00 05:37:41 PM 31 1549.00

可以看到每秒中断总量有5.5万，但多个CPU间并不均衡。

4. 问题分析

初步认定是因为系统中断太高导致的写入性能差，并且也认定是因为多个CPU间中断不均衡导致的这个问题。

观察是都有哪些中断比较高，发现主要是 LOC 和 RES 这两个每秒的增长比较大。

[root@yejr.run]# watch -d cat /proc/interrupts ... LOC: 2468939840 2374791518 2373834803 2373613050 Local timer interrupts SPU: 0 0 0 0 Spurious interrupts PMI: 0 0 0 0 Performance monitoring interrupts IWI: 50073298 45861632 45568755 45833911 IRQ work interrupts RTR: 0 0 0 0 APIC ICR read retries RES: 3472920231 3022439316 2990464825 3012790828 Rescheduling interrupts CAL: 5131479 6539715 17285454 11211131 Function call interrupts TLB: 23094853 24045725 24230472 24271286 TLB shootdowns TRM: 0 0 0 0 Thermal event interrupts ...

在尝试修改相关中断号绑定的CPU后（参考：SMP affinity and proper interrupt handling in Linux[3]），问题还是没有得到缓解。

后来某神秘大佬给指点了下，才发现原来是个内核的bug，涉及到参数 kernel.timer_migration，需要将其设置为 0 才行。

[root@yejr.run]# sysctl -w kernel.timer_migration=0

当然了，最好持久化写入到 /etc/sysctl.conf 文件中。

[root@yejr.run]# cat /etc/sysctl.conf kernel.timer_migration=0 #加载配置文件使之生效 [root@yejr.run]# sysctl -p

再次用 mysql_random_load 工具进行压测就没事了。

下面是关于该bug的描述

The bug is when linux os receive too many tcp packages, and the tcp may add too many timer, but in get_target_base->get_nohz_timer_target it check current cpu is idle, sometimes thouth the current core is very busy, but the idle_cpu result is 1, in this time if set kernel.timer_migration=1 ,the timer will be move to next cpu.

bug详情见：Bug 124661 - kernel.timer_migration=1 cause too many Rescheduling interrupts[4]

最后，值得一提的是，在云主机上修改该参数应该是不管用，除非修改物理机的。我在某云主机上运行 mysql_random_load 工具压测也遇到类似问题，修改内核参数后问题依旧。

文内链接

[1]:https://github.com/xelabs/benchyou

[2]:https://github.com/Percona-Lab/mysql_random_data_load

[3]:http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux

[4]:https://bugzilla.kernel.org/show_bug.cgi?id=124661

全文完。

Enjoy Linux & MySQL :)

「MySQL核心优化」课已升级到MySQL 8.0，扫码开启MySQL修行之旅

Processed: 0.016, SQL: 9