scheduler: network slow store scheduler enhancement #21196
base: master
@@ -1158,7 +1158,7 @@ pd-ctl resource-manager config controller set ltb-max-wait-duration 30m
>> scheduler config evict-leader-scheduler // Since v4.0.0, shows which stores this scheduler is applied to
>> scheduler config evict-leader-scheduler add-store 2 // Add leader eviction scheduling for store 2
>> scheduler config evict-leader-scheduler delete-store 2 // Remove leader eviction scheduling for store 2
>> scheduler add evict-slow-store-scheduler // When there is one and only one slow store, evict the leaders of all Regions on that store
>> scheduler add evict-slow-store-scheduler // Automatically detect slow nodes caused by disk or network issues, and evict all Region leaders from that store when the conditions are met
>> scheduler remove grant-leader-scheduler-1 // Remove the corresponding scheduler; `-1` corresponds to the store ID
>> scheduler pause balance-region-scheduler 10 // Pause the balance-region scheduler for 10 seconds
>> scheduler pause all 10 // Pause all schedulers for 10 seconds
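@@ -1182,6 +1182,25 @@ pd-ctl resource-manager config controller set ltb-max-wait-duration 30m
- `pending`: The scheduler currently cannot generate schedules. A scheduler in the `pending` state returns an overview to help users diagnose the problem. The overview contains status information about the stores and explains why they cannot be selected for scheduling.
- `normal`: The scheduler currently does not need to generate schedules.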
| ### `scheduler config evict-slow-store-scheduler` | ||||||
Collaborator: Suggested change

Member (Author): This is not newly introduced. Only the `enable-network-slow-store` option inside it is new.

`evict-slow-store-scheduler` prevents PD from scheduling leaders to a TiKV node that is experiencing disk I/O or network jitter, and proactively evicts leaders from that node when necessary. TiKV reports both `SlowScore` (disk) and `NetworkSlowScore` (network) in its store heartbeats. Both scores range from 1 to 100, and a higher value means the node is more likely to be abnormal.
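The following bash sketch shows how the two reported scores can be read using only the facts above: each score must be an integer from 1 to 100, and the larger one is the stronger anomaly signal. It is not PD's implementation. The function name `summarize_slow_scores` is hypothetical, and the sketch deliberately applies no eviction threshold, because the conditions PD uses to decide on eviction are not described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize_slow_scores SLOW_SCORE NETWORK_SLOW_SCORE
# Validates that both scores are integers in the documented 1-100 range and
# prints which dimension (disk or network) currently reports the higher score.
# It does NOT reproduce PD's eviction decision.
summarize_slow_scores() {
  local slow="$1" net="$2" s
  for s in "$slow" "$net"; do
    # Accept only plain integers without leading zeros, then check the range.
    if ! [[ "$s" =~ ^[1-9][0-9]*$ ]] || (( s > 100 )); then
      echo "invalid score: $s" >&2
      return 1
    fi
  done
  if (( slow > net )); then
    echo "disk $slow"          # SlowScore (disk) is the stronger signal
  elif (( net > slow )); then
    echo "network $net"        # NetworkSlowScore is the stronger signal
  else
    echo "equal $slow"         # both dimensions report the same score
  fi
}

summarize_slow_scores 1 80     # prints: network 80
```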
Review comment: The original text is correct.

Suggested change

You can use `recovery-duration` to control how long it takes before a slow node is considered back to normal.

Example:
```bash
>> scheduler config evict-slow-store-scheduler // View the current configuration
{
  "recovery-duration": "1800" // 30 minutes
}
>> scheduler config evict-slow-store-scheduler set recovery-duration 600 // Set the recovery duration to 600 seconds (10 minutes)
```
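The same change can be scripted with pd-ctl's single-command mode. This is a sketch under stated assumptions: the cluster is managed by TiUP, and the PD address `http://127.0.0.1:2379` and version `v8.5.0` are placeholders for your own values. The function `set_recovery_duration_minutes` and the `DRY_RUN` variable are hypothetical and exist only in this sketch; `DRY_RUN=1` prints the command instead of running it. The duration is taken in minutes and converted to seconds, the unit used by `recovery-duration` in the example above (`1800` is 30 minutes).

```bash
#!/usr/bin/env bash
# Sketch: set recovery-duration of evict-slow-store-scheduler through pd-ctl
# single-command mode. PD_ADDR and CLUSTER_VERSION are placeholders.
# DRY_RUN=1 (hypothetical, sketch only) prints the command instead of running it.
PD_ADDR="${PD_ADDR:-http://127.0.0.1:2379}"
CLUSTER_VERSION="${CLUSTER_VERSION:-v8.5.0}"

set_recovery_duration_minutes() {
  local minutes="$1"
  if ! [[ "$minutes" =~ ^[1-9][0-9]*$ ]]; then
    echo "minutes must be a positive integer" >&2
    return 1
  fi
  # recovery-duration is expressed in seconds (1800 = 30 minutes).
  local seconds=$(( minutes * 60 ))
  local cmd=(tiup "ctl:${CLUSTER_VERSION}" pd -u "$PD_ADDR"
             scheduler config evict-slow-store-scheduler
             set recovery-duration "$seconds")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"   # print the command only
  else
    "${cmd[@]}"        # run pd-ctl through TiUP
  fi
}
```

For example, `DRY_RUN=1 set_recovery_duration_minutes 10` prints the command that sets `recovery-duration` to `600`, the same value used in the interactive example. Building the command as an array keeps each argument intact when the PD address or version contains special characters.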
To enable network slow node detection, configure both of the following:

- On the PD side, run `scheduler config evict-slow-store-scheduler set enable-network-slow-store true` so that the scheduler handles network slow nodes.
- On the TiKV side, make sure that [`raftstore.inspect-network-interval`](/tikv-configuration-file.md#inspect-network-interval) is greater than `0` so that network inspection is enabled.

To disable network inspection inside TiKV entirely, set [`raftstore.inspect-network-interval`](/tikv-configuration-file.md#inspect-network-interval) to `0`.
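Because network slow node detection works only when both the PD side and the TiKV side are configured, a pre-flight check can catch a half-finished setup. The bash sketch below is a hypothetical helper, not a TiDB tool: `check_network_slow_store` takes the PD-side `enable-network-slow-store` value and the TiKV-side `raftstore.inspect-network-interval` value as you read them from your own configuration, and reports what is still missing. It treats an interval as zero when it contains no digit from 1 to 9 (for example `0`, `0s`, or `0ms`), which is a simplification of TiKV's duration format; `100ms` below is only an example value.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check. Arguments:
#   $1  PD-side enable-network-slow-store value ("true" or "false")
#   $2  TiKV-side raftstore.inspect-network-interval value (for example "100ms" or "0")
check_network_slow_store() {
  local pd_enabled="$1" interval="$2" ok=1
  if [[ "$pd_enabled" != "true" ]]; then
    echo "PD: run 'scheduler config evict-slow-store-scheduler set enable-network-slow-store true'"
    ok=0
  fi
  # Simplification: an interval with no digit 1-9 ("0", "0s", "0ms") counts as 0,
  # which disables network inspection inside TiKV.
  if ! [[ "$interval" =~ [1-9] ]]; then
    echo "TiKV: set raftstore.inspect-network-interval to a value greater than 0"
    ok=0
  fi
  if (( ok )); then
    echo "network slow store detection: enabled"
    return 0
  fi
  return 1
}

check_network_slow_store true 100ms   # prints: network slow store detection: enabled
```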
| ### `scheduler config balance-leader-scheduler` | ||||||
Used to view and control the `balance-leader-scheduler` policy.

Review comment: The revised description explains the scheduler's function more accurately, but the phrase "disk or network slow nodes" could be more specific, making it clear that the scheduler detects slow nodes caused by disk or network issues. It is also suggested to spell out what "when the conditions are met" means, to make the documentation clearer.