scheduler: network slow store scheduler enhancement #21196
base: master
@okJiang: GitHub didn't allow me to request PR reviews from the following users: LykxSassinator. Note that only pingcap members and repo collaborators can review this PR, and authors cannot review their own PRs.

@rleungx: adding LGTM is restricted to approvers and reviewers in OWNERS files.
tikv-configuration-file.md
Outdated
+ Default value: 100ms
+ Minimum value: 1ms

### `inspect-network-interval`

Suggested change:
Original: ### `inspect-network-interval`
Suggested: ### `inspect-network-interval` <span class="version-mark">New in v8.5.5 and v9.0.0</span>

/bot-review
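>> scheduler config evict-leader-scheduler add-store 2 // Add leader eviction scheduling for store 2
>> scheduler config evict-leader-scheduler delete-store 2 // Remove leader eviction scheduling for store 2
>> scheduler add evict-slow-store-scheduler // When there is exactly one slow store, evict all Region leaders from that store
>> scheduler add evict-slow-store-scheduler // Automatically detect disk-slow or network-slow nodes and, when the conditions are met, evict all Region leaders from that store

The revised description explains the scheduler's function more accurately, but the phrase "disk or network slow nodes" could be more specific, making clear that it detects slow nodes caused by either disk or network. Also consider spelling out "when the conditions are met" to make the document clearer.

Suggested change:
Original: >> scheduler add evict-slow-store-scheduler // Automatically detect disk-slow or network-slow nodes and, when the conditions are met, evict all Region leaders from that store
Suggested: >> scheduler add evict-slow-store-scheduler // Automatically detect disk-slow nodes or network-slow nodes and, when a slow node is detected, evict all Region leaders from that store

### `scheduler config evict-slow-store-scheduler`

`evict-slow-store-scheduler` prevents PD from scheduling leaders to an abnormal node when a TiKV node experiences disk I/O or network jitter, and actively evicts leaders when necessary. TiKV reports both `SlowScore` (disk) and `NetworkSlowScore` (network) in store heartbeats. Both scores range from 1 to 100, and a higher value means the node is more likely to be abnormal.

The original explanation of SlowScore and NetworkSlowScore is not clear enough and may confuse users. Consider stating explicitly that the two scores come from disk probing and network probing respectively, and explaining how PD makes decisions based on them.

Suggested change:
Original: `evict-slow-store-scheduler` prevents PD from scheduling leaders to an abnormal node when a TiKV node experiences disk I/O or network jitter, and actively evicts leaders when necessary. TiKV reports both `SlowScore` (disk) and `NetworkSlowScore` (network) in store heartbeats. Both scores range from 1 to 100, and a higher value means the node is more likely to be abnormal.
Suggested: `evict-slow-store-scheduler` prevents PD from scheduling leaders to an abnormal node when a TiKV node experiences disk I/O or network jitter, and actively evicts leaders when necessary. TiKV reports both `SlowScore` (produced by disk I/O probing) and `NetworkSlowScore` (produced by network probing) in store heartbeats. Both scores range from 1 to 100, and a higher value means the node is more likely to be abnormal. PD considers both scores when deciding whether a node is a slow node.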
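To make the two-score description above concrete, here is a minimal sketch of how a scheduler could combine the reported scores. It is not PD's implementation: the `StoreHealth` type, the `is_slow_store` helper, and the threshold of 80 are hypothetical, and taking the maximum of the two scores is only one plausible reading of "considers both scores".

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only; the value PD actually uses
# is not stated in this PR.
ASSUMED_SLOW_THRESHOLD = 80


@dataclass
class StoreHealth:
    """Scores a TiKV store reports in its heartbeat (both range from 1 to 100)."""
    store_id: int
    slow_score: int          # produced by disk I/O probing
    network_slow_score: int  # produced by network probing


def is_slow_store(health: StoreHealth, threshold: int = ASSUMED_SLOW_THRESHOLD) -> bool:
    """Assumed rule: a store is slow if either score reaches the threshold."""
    for score in (health.slow_score, health.network_slow_score):
        if not 1 <= score <= 100:
            raise ValueError(f"score out of range 1..100: {score}")
    return max(health.slow_score, health.network_slow_score) >= threshold


stores = [
    StoreHealth(store_id=1, slow_score=1, network_slow_score=1),
    StoreHealth(store_id=2, slow_score=1, network_slow_score=95),
    StoreHealth(store_id=3, slow_score=90, network_slow_score=1),
]
# Collect the IDs of stores that the assumed rule flags as slow.
slow_ids = [s.store_id for s in stores if is_slow_store(s)]
print(slow_ids)
```

Under this assumed rule, a store is flagged when either the disk or the network score is high, which matches the idea that the scheduler reacts to both kinds of slowness.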
tikv-configuration-file.md
Outdated
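### `inspect-network-interval`

+ Controls the interval at which the TiKV HealthChecker actively sends network probes to PD and other TiKV nodes. It is used to calculate `NetworkSlowScore` and report the network status of slow nodes to PD.

This sentence is not clear enough. Consider stating the purpose of network probing and the role of NetworkSlowScore explicitly so that the logic flows better.

Suggested change:
Original: + Controls the interval at which the TiKV HealthChecker actively sends network probes to PD and other TiKV nodes. It is used to calculate `NetworkSlowScore` and report the network status of slow nodes to PD.
Suggested: + Controls the interval at which the TiKV HealthChecker actively sends network probes to PD and other TiKV nodes. The probe results are used to calculate `NetworkSlowScore`, which is reported to PD to reflect the network status of slow nodes.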
tikv-configuration-file.md
Outdated
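### `inspect-network-interval`

+ Controls the interval at which the TiKV HealthChecker actively sends network probes to PD and other TiKV nodes. It is used to calculate `NetworkSlowScore` and report the network status of slow nodes to PD.
+ Setting it to `0` disables network probing. The smaller the value, the higher the sampling frequency, which amplifies network jitter more quickly but also consumes more network and CPU resources.

The phrase "amplifies network jitter" in the original is not accurate, and the relationship between "smaller value" and "higher sampling frequency" could be expressed more directly.

Suggested change:
Original: + Setting it to `0` disables network probing. The smaller the value, the higher the sampling frequency, which amplifies network jitter more quickly but also consumes more network and CPU resources.
Suggested: + Setting it to `0` disables network probing. The smaller the value, the higher the probing frequency, which detects network latency more sensitively but also consumes more network and CPU resources.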
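The interval versus cost trade-off described above can be illustrated with a small calculation. This sketch assumes one probe per target per interval, which the PR does not state; `probes_per_second` is a hypothetical helper, not a TiKV API.

```python
def probes_per_second(interval_ms: int, targets: int = 1) -> float:
    """Estimate probes sent per second for a given inspect-network-interval.

    Assumption: one probe is sent to each target once per interval.
    An interval of 0 means network probing is disabled.
    """
    if interval_ms < 0 or targets < 0:
        raise ValueError("interval_ms and targets must be non-negative")
    if interval_ms == 0:
        return 0.0
    return targets * 1000.0 / interval_ms


# Compare a disabled setting with progressively longer intervals.
for interval in (0, 1, 100, 1000):
    print(interval, probes_per_second(interval, targets=3))
```

Under that assumption, halving the interval doubles the probe traffic, which is why a smaller value reacts faster but costs more network and CPU.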
✅ AI review completed, 7 comments generated.
Signed-off-by: okjiang <819421878@qq.com>
- `pending`: the scheduler currently cannot generate scheduling operations. A scheduler in the `pending` state returns an overview to help users diagnose the issue. The overview includes status information about stores and explains why they cannot be selected for scheduling.
- `normal`: the scheduler currently does not need to schedule.

### `scheduler config evict-slow-store-scheduler`

Suggested change:
Original: ### `scheduler config evict-slow-store-scheduler`
Suggested: ### `scheduler config evict-slow-store-scheduler` <span class="version-mark">New in v8.5.5 and v9.0.0</span>

This one is not newly introduced; only the enable-network-slow-store inside it is newly introduced.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
/cc @LykxSassinator
LykxSassinator left a comment:
rest LGTM
Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED

@okJiang: The following test failed, say
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions (in Chinese).
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?