openGauss

开源数据库

openGauss社区官网

开源社区

博客MogDB AI 特性系列1_X-Tuner

MogDB AI 特性系列1_X-Tuner

李宏达2022-04-11MogDB AI 特性系列1_X-Tuner

MogDB AI 特性系列 1_X-Tuner

一、概述

X-Tuner 是一款数据库集成的参数调优工具，通过结合深度强化学习和全局搜索算法等 AI 技术，实现在无需人工干预的情况下，获取最佳数据库参数配置。本功能不强制与数据库环境部署到一起，支持独立部署，脱离数据库安装环境独立运行。

二、使用准备

前提条件与使用事项

数据库状态正常、客户端能够正常连接、且要求数据库内导入数据，以便调优程序可以执行 benchmark 测试调优效果。
使用本工具需要指定登录到数据库的用户身份，要求该登录到数据库上的用户具有足够的权限，以便可以获得充足的数据库状态信息。
使用登录到数据库宿主机上的 Linux 用户，需要将$GAUSSHOME/bin 添加到 PATH 环境变量中，即能够直接运行 gsql、gs_guc、gs_ctl 等数据库运维工具。
Python 版本建议为 Python3.6 及以上，且运行环境中已经安装相应依赖，并能够正常启动调优程序。您可以独立安装一个 python3.6+的环境，无需设置到全局环境变量中。不建议使用 root 用户权限安装本工具，如果以 root 身份安装本完毕工具，使用其他用户身份运行本工具时，需要确保配置文件有读取权限。
本工具支持以三种模式运行，其中 tune 和 train 模式要求用户配置好 benchmark 运行环境，并导入数据，本工具将会通过迭代运行 benchmark 来判断修改后的参数是否有性能提升。
recommend 模式建议在数据库正在执行 workload 的过程中执行，以便获得更准确的实时 workload 信息。
本工具默认带有 TPC-C、TPC-H、TPC-DS 以及 sysbench 的 benchmark 运行脚本样例，如果用户使用上述 benchmark 对数据库系统进行压力测试，则可以对上述配置文件进行适度修改或配置。如果需要适配用户自己的业务场景，需要您参照 benchmark 目录中的 template.py 文件编写驱动您自定义 benchmark 的脚本文件。

三、原理简介

调优程序是一个独立于数据库内核之外的工具，需要提供数据库及其所在实例的用户名和登录密码信息，以便控制数据库执行 benchmark 进行性能测试；在启动调优程序前，要求用户测试环境交互正常，能够正常跑通 benchmark 测试脚本、能够正常连接数据库。

X-Tuner 运行模式

recommend: 通过用户指定的用户名等信息登录到数据库环境中，获取当前正在运行的 workload 特征信息，根据上述特征信息生成参数推荐报告。报告当前数据库中不合理的参数配置和潜在风险等；输出根据当前正在运行的 workload 行为和特征；输出推荐的参数配置。该模式是秒级的，不涉及数据库的重启操作，其他模式可能需要反复重启数据库。
train: 通过用户提供的 benchmark 信息，不断地进行参数修改和 benchmark 的执行。通过反复的迭代过程，训练强化学习模型，以便用户在后面通过 tune 模式加载该模型进行调优。
tune: 使用优化算法进行数据库参数的调优，当前支持两大类算法，一种是深度强化学习，另一种是全局搜索算法（全局优化算法）。深度强化学习模式要求先运行 train 模式，生成训练后的调优模型，而使用全局搜索算法则不需要提前进行训练，可以直接进行搜索调优。

X-Tuner 架构图

DB 侧：通过 DB_Agent 模块对数据库实例进行抽象，通过该模块可以获取数据库内部的状态信息、当前数据库参数、以及设置数据库参数等。DB 侧包括登录数据库环境使用的 SSH 连接。
算法侧：用于调优的算法包，包括全局搜索算法（如贝叶斯优化、粒子群算法等）和深度强化学习（如 DDPG）；
X-Tuner 主体逻辑模块：通过 Enviroment 模块进行封装，每一个 step 就是一次调优过程。整个调优过程通过多个 step 进行迭代；
benchmark: 由用户指定的 benchmark 性能测试脚本，用于运行 benchmark 作业，通过跑分结果反映数据库系统性能优劣。

四、X-Tuner 安装

1. 程序位置

源码位置

[root@ecs-saving-0008 xtuner]# pwd /root/openGauss-server/src/gausskernel/dbmind/tools/xtuner

已安装过的数据库

[omm@ecs-saving-0001 ~]$ cd $GAUSSHOME/bin/dbmind/xtuner [omm@ecs-saving-0001 xtuner]$ pwd /opengauss/app/1.0.1/bin/dbmind/xtuner

2. 安装依赖

[root@ecs-saving-0008 xtuner]# ls
build  dist  openGauss_xtuner.egg-info  Readme.md  requirements.txt  setup.py  share  test  tuner
[root@ecs-saving-0008 xtuner]# pip3 install --upgrade pip
[root@ecs-saving-0008 xtuner]# pip3 install -r requirements.txt
[root@ecs-saving-0008 xtuner]# cat requirements.txt
tensorflow>=2.2.0
keras-rl2
paramiko
bayesian-optimization
ptable
[root@ecs-saving-0008 xtuner]# pip3 install keras

默认网站较慢建议学会科学上网，手动解决依赖。 Python Package Index

3. 安装 X-Tuner

[root@ecs-saving-0008 xtuner]# pwd
/root/openGauss-server/src/gausskernel/dbmind/tools/xtuner
[root@ecs-saving-0008 xtuner]# python3 setup.py install
[root@ecs-saving-0008 xtuner]# gs_xtuner --help
usage: gs_xtuner [-h] [--db-name DB_NAME] [--db-user DB_USER] [--port PORT]
                 [--host HOST] [--host-user HOST_USER]
                 [--host-ssh-port HOST_SSH_PORT] [-f DB_CONFIG_FILE]
                 [-x TUNER_CONFIG_FILE] [-v]
                 {train,tune,recommend}

X-Tuner: a self-tuning tool integrated by openGauss.

positional arguments:
  {train,tune,recommend}
                        Train a reinforcement learning model or tune database
                        by model. And also can recommend best_knobs according
                        to your workload.

optional arguments:
  -h, --help            show this help message and exit
  -f DB_CONFIG_FILE, --db-config-file DB_CONFIG_FILE
                        You can pass a path of configuration file otherwise
                        you should enter database information by command
                        arguments manually. Please see the template file
                        share/server.json.template.
  -x TUNER_CONFIG_FILE, --tuner-config-file TUNER_CONFIG_FILE
                        This is the path of the core configuration file of the
                        X-Tuner. You can specify the path of the new
                        configuration file. The default path is
                        /usr/local/lib/python3.6/site-packages/openGauss_xtune
                        r-2.0.0-py3.6.egg/tuner/xtuner.conf. You can modify
                        the configuration file to control the tuning process.
  -v, --version         show program's version number and exit

Database Connection Information:
  --db-name DB_NAME     The name of database where your workload running on.
  --db-user DB_USER     Use this user to login your database. Note that the
                        user must have sufficient permissions.
  --port PORT           Use this port to connect with the database.
  --host HOST           The IP address of your database installation host.
  --host-user HOST_USER
                        The login user of your database installation host.
  --host-ssh-port HOST_SSH_PORT
                        The SSH port of your database installation host.

五、文件解读

[root@ecs-saving-0008 tools]# pwd
/root/openGauss-server/src/gausskernel/dbmind/tools
[root@ecs-saving-0008 tools]# ls
anomaly_detection  index_advisor  predictor  sqldiag  xtuner

分别为五个 AI 功能，后续文章会逐一介绍。

Anomaly-detection：数据库指标采集、预测与异常监控
Index-advisor：索引推荐
Predictor: AI 查询时间预测
SQLdiag: 慢 SQL 发现
X-Tuner: 参数调优与诊断

[root@ecs-saving-0008 benchmark]# pwd /root/openGauss-server/src/gausskernel/dbmind/tools/xtuner/tuner/benchmark __init__.py  Readme.md  sysbench.py  template.py  tpcc.py  tpcds.py  tpch.py

需提前调试好相关程序并生成测试数据

benchmark 模拟压力模型，支持 sysbench.py template.py tpcc.py tpcds.py tpch.py
在使用 tune 和 train 模式前，用户需要先导入 benchmark 所需数据并检查 benchmark 能否正常跑通，并备份好此时的数据库参数，查询当前数据库参数的方法为：select name, setting from pg_settings;
目前 tpcc 测试程序相关代码有问题，暂时手动模拟压力解决。

配置文件解读

[root@ecs-saving-0008 tuner]# cat xtuner.conf |grep -v ^$|grep -v ^#
[Master]
logfile = log/opengauss_tuner.log
output_tuning_result = tuned_knobs.json
verbose = on
recorder_file = log/recorder.log
tune_strategy = auto  # rl, gop or auto
drop_cache = on  # You must modify the permission of the login user in the /etc/sudoers file and grant the NOPASSWD permission to the user.
used_mem_penalty_term = 1e-9  # Prevent taking up more memory.
[Reinforcement Learning]
rl_algorithm = ddpg  # ddpg, dqn. Not support dqn yet.
rl_model_path = model/rl.model
rl_steps = 100
max_episode_steps = 10
test_episode = 1
[Gloabal Optimization Algorithm]
gop_algorithm = bayes  # bayes, pso
max_iterations = 100
particle_nums = 3  # A larger value indicates higher accuracy but slower speed.
[Benchmark]
benchmark_script = tpcc
benchmark_path = '/opt/benchmarksql-5.0/run'  # If this parameter is blank, the default path in the benchmark script is used.
benchmark_cmd = "./runBenchmark.sh props.og"  # If this parameter is blank, the default cmd in the benchmark script is used.
[Knobs]
scenario = auto  # ap, tp, htap or auto
tuning_list =  # template: share/knobs.json.template

rl_algorithm：用于训练强化学习模型的算法，当前支持设置为 ddpg.
rl_model_path: 训练后生成的强化学习模型保存路径。
rl_steps：训练过程的最大迭代步数。
max_episode_steps：每个回合的最大步数。
scenario: 明确指定的 workload 类型，如果为 auto 则为自动判断。在不同模式下，推荐的调优参数列表也不一样。
tuning_list: 用户指定需要调哪些参数，如果不指定，则根据 workload 类型自动推荐应该调的参数列表。
tune_strategy: 指定选择哪种算法进行调优，支持 rl（使用强化学习模型进行调优）、gop （使用全局搜索算法）以及 aut- （自动选择）。若该参数设置为 rl，则 rl 相关的配置项生效。除前文提到过的 train 模式下生效的配置项外，test_episode 配置项- 生效，该配置项表明调优过程的最大回合数，该参数直接影响了调优过程的执行时间（一般地，数值越大越耗时）。
gop_algorithm: 选择何种全局搜索算法，支持 bayes 以及 pso.
max_iterations: 最大迭代轮次，数值越高搜索时间越长，效果往往越好。
particle_nums: 在 PSO 算法上生效，表示粒子数。
scenario 与 tuning_list 见上文 train 模式中的描述。
drop_cache = on 需修改/etc/sudoers 文件添加如下 username ALL=(ALL) NOPASSWD: ALL

六、使用示例

X-Tuner 支持三种模式，分别是获取参数诊断报告的 recommend 模式、训练强化学习模型的 train 模式、以及使用算法进行调优的 tune 模式。上述三种模式可以通过命令行参数来区别，通过配置文件来指定具体的细节。

1. 配置数据库连接信息

有两种方式

a. 通过命令行执行

 [root@ecs-saving-0008 xtuner]# gs_xtuner recommend --db-name test1 --db-user test1 --port 26000 --host 192.168.1.XXX --host-user omm -x xtuner.conf

b. json 配置文件

[root@ecs-saving-0008 tuner]# cat connection.json  {  "db_name": "test1",  "db_user": "test1",  "host": "192.168.1.xxx",  "host_user": "omm",  "port": 26000,  "ssh_port": 22 } [root@ecs-saving-0008 xtuner]# gs_xtuner recommend -f connection.json -x xtuner.conf


[root@ecs-saving-0008 tuner]# gs_xtuner recommend --db-name test1 --db-user test1 --port 26000 --host 192.168.1.xxx --host-user omm
Please input the password of database:
Please input the password of host:
Start to recommend knobs. Just a moment, please.
******************************************************* Knob Recommendation Report *******************************************************
INFO:
+---------------------------------------+-----------------------+
|                 Metric                |         Value         |
+---------------------------------------+-----------------------+
|             workload_type             |           tp          |
|         dirty_background_bytes        |           0           |
|             temp_file_size            |           0           |
|          current_locks_count          |          0.0          |
|      current_prepared_xacts_count     |          0.0          |
|         average_connection_age        |        0.013457       |
| checkpoint_proactive_triggering_ratio |   0.0413987138263666  |
|         rollback_commit_ratio         |  0.07913229312035018  |
|         fetched_returned_ratio        |   0.3706085439318767  |
|             cache_hit_rate            |   0.9780311156824839  |
|          current_connections          |          1.0          |
|                 uptime                |    1.48835260083333   |
|          search_modify_ratio          |   119.88027537957866  |
|           all_database_size           |    1436801.01171875   |
|            current_free_mem           |        15400044       |
|              os_mem_total             |        16430884       |
|        checkpoint_avg_sync_time       |    2.03215434083601   |
|            read_write_ratio           |   2.1202316810075224  |
|                ap_index               |   2.1987815358313485  |
|             max_processes             |          3010         |
|          track_activity_size          |         3010.0        |
|                used_mem               |      3155124224.0     |
|            write_tup_speed            |    374.211175655672   |
|              os_cpu_count             |           4           |
|  checkpoint_dirty_writing_time_window |         720.0         |
|             read_tup_speed            |    793.422421264566   |
|               block_size              |          8.0          |
|      shared_buffer_toast_hit_rate     |    91.9572192513369   |
|      shared_buffer_heap_hit_rate      |    99.5745962509106   |
|       shared_buffer_idx_hit_rate      |   99.67849101944427   |
|      shared_buffer_tidx_hit_rate      |   99.78536585365853   |
|                 is_hdd                |         False         |
|           enable_autovacuum           |          True         |
|                is_64bit               |          True         |
|              load_average             | [0.26, 0.5375, 0.525] |
+---------------------------------------+-----------------------+
p.s: The unit of storage is kB.
WARN:
[0]. The total size of all databases is less than the memory size. Therefore, it is unnecessary to set shared_buffers to a large value.
[1]. The database runs for a short period of time, and the database description may not be accumulated. The recommendation result may be inaccurate.
[2]. The number of CPU cores is a little small. Please do not run too high concurrency. You are recommended to set max_connections based on the number of CPU cores. If your job does not consume much CPU, you can also increase it.
BAD:
[0]. The value of wal_buffers is too high. Generally, a large value does not bring better performance.
******************************************************** Recommended Knob Settings ********************************************************
+---------------------------+-----------+--------+----------+---------+
|            name           | recommend |  min   |   max    | restart |
+---------------------------+-----------+--------+----------+---------+
|       shared_buffers      |   179600  | 179600 |  206540  |   True  |
|      max_connections      |    100    |   20   |   500    |   True  |
|    effective_cache_size   |   179600  | 179600 | 12323163 |  False  |
|  effective_io_concurrency |    200    |  150   |   250    |  False  |
|        wal_buffers        |    5612   |  2048  |   5612   |   True  |
|      random_page_cost     |    1.0    |  1.0   |   2.0    |  False  |
| default_statistics_target |    100    |   10   |   150    |  False  |
+---------------------------+-----------+--------+----------+---------+

在上述报告中，推荐了该环境上的数据库参数配置，并进行了风险提示。报告同时生成了当前 workload 的特征信息，其中有几个特征是比较有参考意义的：

temp_file_size：产生的临时文件数量，如果该结果大于 0，则表明系统使用了临时文件。使用过多的临时文件会导致性能不佳，如果可能的话，需要提高 work_mem 参数的配置。
cache_hit_rate：shared_buffer 的缓存命中率，表明当前 workload 使用缓存的效率。
read_write_ratio：数据库作业的读写比例。
search_modify_ratio：数据库作业的查询与修改数据的比例。
ap_index：表明当前 workload 的 AP 指数，取值范围是 0 到 10，该数值越大，表明越偏向于数据分析与检索。
workload_type：根据数据库统计信息，推测当前负载类型，分为 AP、TP 以及 HTAP 三种类型。
checkpoint_avg_sync_time：数据库在 checkpoint 时，平均每次同步刷新数据到磁盘的时长，单位是毫秒。
load_average：平均每个 CPU 核心在 1 分钟、5 分钟以及 15 分钟内的负载。一般地，该数值在 1 左右表明当前硬件比较匹配 workload、在 3 左右表明运行当前作业压力比较大，大于 5 则表示当前硬件环境运行该 workload 压力过大（此时一般建议减少负载或升级硬件）。
- recommend 模式会读取数据库中的 pg_stat_database 以及 pg_stat_bgwriter 等系统表中的信息，需要登录到数据库上的用户具有足够的权限（建议为管理员权限，可通过 alter user username sysadmin；授予 username 相应的权限）。
- 由于某些系统表会一直记录统计信息，这可能会对负载特征识别造成干扰，因此建议最好先清空某些系统表的统计信息，运行一段时间的 workload 后再使用 recommend 模式进行诊断，以便获得更准确的结果。清除统计信息的方法为：
- select pg_stat_reset_shared(‘bgwriter’);
- select pg_stat_reset();

3. train 模式

[root@ecs-saving-0008 tuner]# gs_xtuner train  -f connection.json -x xtuner.conf
Please input the password of database:
Please input the password of host:
Start to recommend knobs. Just a moment, please.
WARN: The database may restart several times during tuning, continue or not [yes|no]:yes
2021-01-17 15:48:20,864: Recorder is starting.
2021-01-17 15:48:21.423844: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-17 15:48:21.423910: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Adam
2021-01-17 15:48:24.022924: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-17 15:48:24.023181: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-01-17 15:48:24.023212: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-01-17 15:48:24.023246: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ecs-saving-0008): /proc/driver/nvidia/version does not exist
2021-01-17 15:48:24.023773: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-17 15:48:24.024250: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-17 15:48:24.052176: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-17 15:48:24.057462: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2600000000 Hz
The list of tuned knobs in the training mode based on the reinforcement learning algorithm must be the same as that in the tuning mode.
/usr/local/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
2021-01-17 15:48:50,022: [0] Current reward is -3.256027, knobs: {'shared_buffers': '213807'}.
2021-01-17 15:48:50,022: [0] Best reward is -3.256027, knobs: {'shared_buffers': '213807'}.
2021-01-17 15:48:50,022: [1] Database metrics: [0.5051657050852006, 0.975755080057054, 0.2125].
2021-01-17 15:48:50,023: [1] Benchmark score: -0.103773, used mem: 3152253680 kB, reward: -3.256027.
2021-01-17 15:49:15,193: [1] Current reward is -3.255776, knobs: {'shared_buffers': '228559'}.
2021-01-17 15:49:15,194: [1] Best reward is -3.255776, knobs: {'shared_buffers': '228559'}.
2021-01-17 15:49:15,194: [2] Database metrics: [1.0, 0.9757370052274821, 0.18].
2021-01-17 15:49:15,194: [2] Benchmark score: -0.103286, used mem: 3152489712 kB, reward: -3.255776.
2021-01-17 15:49:40,384: [2] Current reward is -3.256460, knobs: {'shared_buffers': '213646'}.
2021-01-17 15:49:40,384: [2] Best reward is -3.255776, knobs: {'shared_buffers': '228559'}.
2021-01-17 15:49:40,385: [3] Database metrics: [0.49976519522339996, 0.9757370052274821, 0.12].
2021-01-17 15:49:40,385: [3] Benchmark score: -0.104209, used mem: 3152251104 kB, reward: -3.256460.
2021-01-17 15:50:05,542: [3] Current reward is -3.256012, knobs: {'shared_buffers': '220718'}.
2021-01-17 15:50:05,542: [3] Best reward is -3.255776, knobs: {'shared_buffers': '228559'}.
2021-01-17 15:50:05,542: [4] Database metrics: [0.7369851066684556, 0.975755080057054, 0.1075].
2021-01-17 15:50:05,543: [4] Benchmark score: -0.103648, used mem: 3152364256 kB, reward: -3.256012.
2021-01-17 15:50:30,691: [4] Current reward is -3.256337, knobs: {'shared_buffers': '207214'}.
2021-01-17 15:50:30,691: [4] Best reward is -3.255776, knobs: {'shared_buffers': '228559'}.
2021-01-17 15:50:30,691: [5] Database metrics: [0.2840131490674896, 0.975755080057054, 0.0925].
2021-01-17 15:50:30,692: [5] Benchmark score: -0.104188, used mem: 3152148192 kB, reward: -3.256337.
****************************************************************************************************** Knob Recommendation Report ******************************************************************************************************
INFO:
+---------------------------------------+------------------------+
|                 Metric                |         Value          |
+---------------------------------------+------------------------+
|             workload_type             |           ap           |
|         dirty_background_bytes        |           0            |
|             temp_file_size            |           0            |
|          current_locks_count          |          0.0           |
|      current_prepared_xacts_count     |          0.0           |
|         rollback_commit_ratio         |          0.0           |
|            write_tup_speed            |          0.0           |
|         average_connection_age        |        0.012783        |
|                 uptime                |   0.101278143888889    |
|         fetched_returned_ratio        |  0.18264946102405455   |
| checkpoint_proactive_triggering_ratio |   0.195960870937204    |
|             cache_hit_rate            |   0.9871855024265209   |
|          current_connections          |          1.0           |
|            current_free_mem           |        15428064        |
|           all_database_size           |    1589977.01171875    |
|              os_mem_total             |        16430884        |
|        checkpoint_avg_sync_time       |    2.64405175134112    |
|          search_modify_ratio          |       20120500.0       |
|            read_write_ratio           |      201205000.0       |
|             max_processes             |          3010          |
|          track_activity_size          |         3010.0         |
|                used_mem               |      3152427664.0      |
|              os_cpu_count             |           4            |
|             read_tup_speed            |    419.743636679271    |
|                ap_index               |          6.7           |
|  checkpoint_dirty_writing_time_window |         720.0          |
|      shared_buffer_toast_hit_rate     |   76.40449438202248    |
|               block_size              |          8.0           |
|      shared_buffer_tidx_hit_rate      |    94.7103274559194    |
|      shared_buffer_heap_hit_rate      |   99.50838310887633    |
|       shared_buffer_idx_hit_rate      |   99.52046667292409    |
|                 is_hdd                |         False          |
|           enable_autovacuum           |          True          |
|                is_64bit               |          True          |
|              load_average             | [0.255, 0.225, 0.2125] |
+---------------------------------------+------------------------+
p.s: The unit of storage is kB.
WARN:
[0]. The number of CPU cores is a little small. Please do not run too high concurrency. You are recommended to set max_connections based on the number of CPU cores. If your job does not consume much CPU, you can also increase it.
[1]. The total size of all databases is less than the memory size. Therefore, it is unnecessary to set shared_buffers to a large value.
BAD:
[0]. The value of wal_buffers is too high. Generally, a large value does not bring better performance.
****************************************************************************************************** Recommended Knob Settings ******************************************************************************************************
+---------------------------+-----------+--------+----------+---------+
|            name           | recommend |  min   |   max    | restart |
+---------------------------+-----------+--------+----------+---------+
|       shared_buffers      |   198747  | 198747 |  228559  |   True  |
|      max_connections      |    185    |   15   |   370    |   True  |
|    effective_cache_size   |  12323163 | 198747 | 12323163 |  False  |
|  effective_io_concurrency |    200    |  150   |   250    |  False  |
|        wal_buffers        |    6210   |  2048  |   6210   |   True  |
|      random_page_cost     |    1.0    |  1.0   |   2.0    |  False  |
| default_statistics_target |    1000   |  100   |   1000   |  False  |
+---------------------------+-----------+--------+----------+---------+

训练完成后，会在配置项 rl_model_path 指定的目录中生成模型文件。

4. tune 模式

[root@ecs-saving-0008 tuner]# gs_xtuner tune  -f connection.json -x xtuner.conf
Please input the password of database:
Please input the password of host:
Start to recommend knobs. Just a moment, please.
WARN: The database may restart several times during tuning, continue or not [yes|no]:yes
2021-01-17 15:52:56,212: Recorder is starting.
|   iter    |  target   | effect... | random... |
-------------------------------------------------
2021-01-17 15:52:59,473: [0] Current reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:52:59,473: [0] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:52:59,473: [1] Database metrics: [0.22, 0.3400000000000001, 0.9848556082330464, 0.2575].
2021-01-17 15:52:59,474: [1] Benchmark score: -0.103539, used mem: 3152148192 kB, reward: -3.255687.
|  1        | -3.256    |  0.223    |  0.3392   |
2021-01-17 15:53:01,612: [1] Current reward is -3.256332, knobs: {'effective_io_concurrency': '171', 'random_page_cost': '1.02'}.
2021-01-17 15:53:01,613: [1] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:01,614: [2] Database metrics: [0.21, 0.020000000000000018, 0.9851427720903616, 0.2575].
2021-01-17 15:53:01,614: [2] Benchmark score: -0.104184, used mem: 3152148192 kB, reward: -3.256332.
|  2        | -3.256    |  0.2139   |  0.01739  |
2021-01-17 15:53:03,664: [2] Current reward is -3.256156, knobs: {'effective_io_concurrency': '226', 'random_page_cost': '1.28'}.
2021-01-17 15:53:03,664: [2] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:03,665: [3] Database metrics: [0.76, 0.28, 0.9854192483562304, 0.2575].
2021-01-17 15:53:03,665: [3] Benchmark score: -0.104007, used mem: 3152148192 kB, reward: -3.256156.
|  3        | -3.256    |  0.7578   |  0.2831   |
2021-01-17 15:53:05,681: [3] Current reward is -3.256023, knobs: {'effective_io_concurrency': '200', 'random_page_cost': '1.35'}.
2021-01-17 15:53:05,682: [3] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:05,682: [4] Database metrics: [0.5, 0.3500000000000001, 0.9856856227825624, 0.2575].
2021-01-17 15:53:05,682: [4] Benchmark score: -0.103875, used mem: 3152148192 kB, reward: -3.256023.
|  4        | -3.256    |  0.5007   |  0.345    |
2021-01-17 15:53:07,728: [4] Current reward is -3.255765, knobs: {'effective_io_concurrency': '158', 'random_page_cost': '1.29'}.
2021-01-17 15:53:07,728: [4] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:07,728: [5] Database metrics: [0.08, 0.29000000000000004, 0.9859424390850301, 0.2575].
2021-01-17 15:53:07,729: [5] Benchmark score: -0.103617, used mem: 3152148192 kB, reward: -3.255765.
|  5        | -3.256    |  0.08186  |  0.2852   |
2021-01-17 15:53:09,892: [5] Current reward is -3.255949, knobs: {'effective_io_concurrency': '163', 'random_page_cost': '1.47'}.
2021-01-17 15:53:09,892: [5] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:09,892: [6] Database metrics: [0.13, 0.47, 0.9861902026474977, 0.275].
2021-01-17 15:53:09,892: [6] Benchmark score: -0.103801, used mem: 3152148192 kB, reward: -3.255949.
|  6        | -3.256    |  0.1274   |  0.4671   |
2021-01-17 15:53:12,043: [6] Current reward is -3.256464, knobs: {'effective_io_concurrency': '181', 'random_page_cost': '1.43'}.
2021-01-17 15:53:12,043: [6] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:12,043: [7] Database metrics: [0.31, 0.42999999999999994, 0.9864293838414936, 0.275].
2021-01-17 15:53:12,044: [7] Benchmark score: -0.104316, used mem: 3152148192 kB, reward: -3.256464.
|  7        | -3.256    |  0.3105   |  0.4275   |
2021-01-17 15:53:14,118: [7] Current reward is -3.256466, knobs: {'effective_io_concurrency': '164', 'random_page_cost': '1.31'}.
2021-01-17 15:53:14,118: [7] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:14,118: [8] Database metrics: [0.14, 0.31000000000000005, 0.9866604210066019, 0.295].
2021-01-17 15:53:14,119: [8] Benchmark score: -0.104318, used mem: 3152148192 kB, reward: -3.256466.
|  8        | -3.256    |  0.138    |  0.3143   |
2021-01-17 15:53:16,193: [8] Current reward is -3.255741, knobs: {'effective_io_concurrency': '185', 'random_page_cost': '1.79'}.
2021-01-17 15:53:16,193: [8] Best reward is -3.255687, knobs: {'effective_io_concurrency': '172', 'random_page_cost': '1.34'}.
2021-01-17 15:53:16,194: [9] Database metrics: [0.35, 0.79, 0.9868837231315075, 0.295].
2021-01-17 15:53:16,194: [9] Benchmark score: -0.103592, used mem: 3152148192 kB, reward: -3.255741.
|  9        | -3.256    |  0.3464   |  0.7892   |
2021-01-17 15:53:18,292: [9] Current reward is -3.255511, knobs: {'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
2021-01-17 15:53:18,293: [9] Best reward is -3.255511, knobs: {'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
2021-01-17 15:53:18,293: [10] Database metrics: [0.05, 0.8300000000000001, 0.9870996722701926, 0.295].
2021-01-17 15:53:18,293: [10] Benchmark score: -0.103362, used mem: 3152148192 kB, reward: -3.255511.
|  10       | -3.256    |  0.05099  |  0.8299   |
2021-01-17 15:53:20,388: [10] Current reward is -3.255719, knobs: {'effective_io_concurrency': '183', 'random_page_cost': '1.58'}.
2021-01-17 15:53:20,388: [10] Best reward is -3.255511, knobs: {'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
2021-01-17 15:53:20,389: [11] Database metrics: [0.33, 0.5800000000000001, 0.9873086257233148, 0.31].
2021-01-17 15:53:20,389: [11] Benchmark score: -0.103571, used mem: 3152148192 kB, reward: -3.255719.
|  11       | -3.256    |  0.332    |  0.5833   |
2021-01-17 15:53:22,476: [11] Current reward is -3.255922, knobs: {'effective_io_concurrency': '224', 'random_page_cost': '1.8'}.
2021-01-17 15:53:22,477: [11] Best reward is -3.255511, knobs: {'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
2021-01-17 15:53:22,478: [12] Database metrics: [0.74, 0.8, 0.9875109180109677, 0.31].
2021-01-17 15:53:22,478: [12] Benchmark score: -0.103774, used mem: 3152148192 kB, reward: -3.255922.
|  12       | -3.256    |  0.7379   |  0.7979   |
2021-01-17 15:53:24,562: [12] Current reward is -3.256171, knobs: {'effective_io_concurrency': '246', 'random_page_cost': '1.9'}.
2021-01-17 15:53:24,562: [12] Best reward is -3.255511, knobs: {'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
2021-01-17 15:53:24,562: [13] Database metrics: [0.96, 0.8999999999999999, 0.9877068626597331, 0.325].
2021-01-17 15:53:24,562: [13] Benchmark score: -0.104023, used mem: 3152148192 kB, reward: -3.256171.
|  13       | -3.256    |  0.9551   |  0.9002   |
2021-01-17 15:53:26,676: [13] Current reward is -3.255921, knobs: {'effective_io_concurrency': '199', 'random_page_cost': '1.35'}.
2021-01-17 15:53:26,677: [13] Best reward is -3.255511, knobs: {'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
2021-01-17 15:53:26,677: [14] Database metrics: [0.49, 0.3500000000000001, 0.9878967538241067, 0.325].
2021-01-17 15:53:26,677: [14] Benchmark score: -0.103773, used mem: 3152148192 kB, reward: -3.255921.
|  14       | -3.256    |  0.4858   |  0.3522   |
2021-01-17 15:53:28,763: [14] Current reward is -3.255747, knobs: {'effective_io_concurrency': '153', 'random_page_cost': '1.83'}.
2021-01-17 15:53:28,764: [14] Best reward is -3.255511, knobs: {'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
2021-01-17 15:53:28,764: [15] Database metrics: [0.03, 0.8300000000000001, 0.9880808677599346, 0.325].
2021-01-17 15:53:28,764: [15] Benchmark score: -0.103598, used mem: 3152148192 kB, reward: -3.255747.
|  15       | -3.256    |  0.02704  |  0.8322   |
=================================================
2021-01-17 15:53:28,765: The tuning process is complete. The best reward is -3.255511, best knobs are:
{'effective_io_concurrency': '155', 'random_page_cost': '1.83'}.
****************************************************************************************************** Knob Recommendation Report ******************************************************************************************************
INFO:
+---------------------------------------+------------------------+
|                 Metric                |         Value          |
+---------------------------------------+------------------------+
|             workload_type             |           ap           |
|         dirty_background_bytes        |           0            |
|             temp_file_size            |           0            |
|          current_locks_count          |          0.0           |
|      current_prepared_xacts_count     |          0.0           |
|         rollback_commit_ratio         |          0.0           |
|            write_tup_speed            |          0.0           |
|         average_connection_age        |        0.012993        |
|                 uptime                |   0.0202690916666667   |
|         fetched_returned_ratio        |  0.15769242371706166   |
| checkpoint_proactive_triggering_ratio |   0.203501094091904    |
|             cache_hit_rate            |   0.9780674962419049   |
|          current_connections          |          1.0           |
|            current_free_mem           |        15532000        |
|           all_database_size           |    1593746.21484375    |
|              os_mem_total             |        16430884        |
|        checkpoint_avg_sync_time       |    2.64145045326665    |
|             max_processes             |          3010          |
|          track_activity_size          |         3010.0         |
|                used_mem               |      3152148192.0      |
|              os_cpu_count             |           4            |
|                ap_index               |          6.7           |
|      shared_buffer_toast_hit_rate     |   63.41463414634146    |
|  checkpoint_dirty_writing_time_window |         720.0          |
|               block_size              |          8.0           |
|          search_modify_ratio          |       8616900.0        |
|            read_write_ratio           |       86169000.0       |
|      shared_buffer_tidx_hit_rate      |   91.66666666666667    |
|             read_tup_speed            |    917.793974728196    |
|       shared_buffer_idx_hit_rate      |   99.49449740875606    |
|      shared_buffer_heap_hit_rate      |   99.54367629484318    |
|                 is_hdd                |         False          |
|           enable_autovacuum           |          True          |
|                is_64bit               |          True          |
|              load_average             | [0.1125, 0.1625, 0.19] |
+---------------------------------------+------------------------+
p.s: The unit of storage is kB.
WARN:
[0]. The total size of all databases is less than the memory size. Therefore, it is unnecessary to set shared_buffers to a large value.
[1]. The number of CPU cores is a little small. Please do not run too high concurrency. You are recommended to set max_connections based on the number of CPU cores. If your job does not consume much CPU, you can also increase it.
BAD:
[0]. The value of wal_buffers is too high. Generally, a large value does not bring better performance.
****************************************************************************************************** Recommended Knob Settings ******************************************************************************************************
+---------------------------+-----------+--------+----------+---------+
|            name           | recommend |  min   |   max    | restart |
+---------------------------+-----------+--------+----------+---------+
|      random_page_cost     |    1.83   |  1.0   |   2.0    |  False  |
|  effective_io_concurrency |    155    |  150   |   250    |  False  |
|       shared_buffers      |   199218  | 199218 |  229101  |   True  |
|      max_connections      |    184    |   15   |   369    |   True  |
|    effective_cache_size   |  12323163 | 199218 | 12323163 |  False  |
|        wal_buffers        |    6225   |  2048  |   6225   |   True  |
| default_statistics_target |    1000   |  100   |   1000   |  False  |
+---------------------------+-----------+--------+----------+---------+

openGauss

开源数据库

openGauss社区官网

开源社区

MogDB AI 特性系列1_X-Tuner

MogDB AI 特性系列 1_X-Tuner ​

一、概述 ​

二、使用准备 ​

三、原理简介 ​

四、X-Tuner 安装 ​

1. 程序位置 ​

2. 安装依赖 ​

3. 安装 X-Tuner ​

五、文件解读 ​

六、使用示例 ​

1. 配置数据库连接信息 ​

2. recommend 模式 ​

3. train 模式 ​

4. tune 模式 ​