-
Notifications
You must be signed in to change notification settings - Fork 21
kexec-kdump-howto.txt: update paragraphs related to disable_cpu_apicid #129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
971176f to
7ed55e5
Compare
|
It seems something more need to be removed: Otherwise, would the old kernel be a problem? Eg. use old kernel to capture a dump with set a different kernel version in /etc/sys/config/kdump KDUMP_KERNELVER="" |
7ed55e5 to
d10c3ec
Compare
Thanks, Dave. You are right, the v1 is not clean change. The left code in spec caused the failure in unit test. I removed get_bootcpu_apicid() and changed code in spec/kdump-lib_spec.sh, now the unit test failure is gone. While the integration tests are still failed, it seems not related to this patch. |
Hi @baoquan-he thanks for updating the patch. But maybe you missed the last question, can you have a look at the case of using old kernel for dump? |
Ah, you said 'otherwise', I thought it's related to the left code. Thomas's commit 5c5682b9f87a was merged in v6.8-rc4. That means rhel10 kernel all have the patch. In rhel9, it was back ported to rhel9.6. If so, I am wondering why we need to remove disable_cpu_apicid=, isn't keeping it in kdump-utils better back compatible with the old kernel, e.g loading rhel9.5 or older kernel on rhel10 OS? |
|
Or we can parse the older kernel's version and judge if we should add disable_cpu_apicid= to kdump kernel. Personally I think adding disable_cpu_apicid= to kdump kernel always seems a little better, it allows good back compatibility with older kernel, and won't cause issue. What do you think, @daveyoung ? |
Hi @baoquan-he I think it depends on your judgement and see how other people thinks, I just spotted potential issues when seeing the kernel commit and some apparent issues about the pr. |
Thanks for the input. My first reaction was to remove it, now seems we should keep it. I will seek other people's input on this, to see if there's any different opinion. |
d10c3ec to
3b9ab5d
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request updates the documentation in kexec-kdump-howto.txt regarding the disable_cpu_apicid kernel parameter. The changes correctly reflect that this parameter is no longer strictly necessary for newer kernels but is kept for backward compatibility. My review focuses on improving the clarity and grammar of the updated text to make the documentation easier to understand. I've provided a couple of suggestions to rephrase the new paragraphs for better readability.
kexec-kdump-howto.txt
Outdated
| if the kernel doesn't support it. For newer kernel which contains commit | ||
| 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels"), the hardware | ||
| issue on x86 has been fixed in kernel. While adding disable_cpu_apicid is | ||
| not harmful for newer kernel. Here still adding it to capture kernel to | ||
| keep good back compatibility if an older kernel which doesn't contain commit | ||
| 5c5682b9f87a is loaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph has some grammatical errors and could be rephrased for better clarity. Using 'kernels' (plural), adding articles like 'the', and using more formal conjunctions like 'However' would improve readability.
if the kernel doesn't support it. For newer kernels that contain commit
5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels"), this hardware
issue on x86 has been fixed in the kernel. However, adding disable_cpu_apicid is
not harmful for newer kernels, and it is still added to the capture kernel to
maintain backward compatibility with older kernels that do not contain commit
5c5682b9f87a.
kexec-kdump-howto.txt
Outdated
| (*) Without disable_cpu_apicid kernel option, capture kernel which doesn't | ||
| contain commit 5c5682b9f87a may lead to hang, system reset or power-off at boot, | ||
| depending on your system and runtime situation at the time of crash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is a bit awkward. It can be rephrased for better clarity and flow. Also, using a parallel structure for the list of potential problems ('may hang, reset, or power-off') would improve readability.
(*) Without the disable_cpu_apicid kernel option, a capture kernel that does not
contain commit 5c5682b9f87a may hang, reset, or power-off at boot,
depending on your system and runtime situation at the time of the crash.
|
Hi @baoquan-he, The PR LGTM! And I think the two minor suggestions from Gemini can help improve readability, what do you think? |
|
I will update and resubmit, thanks for checking. |
Long before, to support multiple CPUs on x86_64, 'disable_cpu_apicid='
was introduced. It's to avoid the case when nr_cpus=xx is added, while
crashed cpu is not BSP cpu, then the crashed CPU will send INIT to BSP
cpu in kdump kernel. While the BSP cpu being reinitialized when
receiving INIT in the 2nd time will cause kdump kernel collapsing.
Now, in kernel commit 5c5682b9f87a ("x86/cpu: Detect real BSP on crash
kernels"), the requirement of disable_cpu_apicid has been taken off by
detecting real BSP on crashed kernel and not sending INIT to it.
Now testing passed on x86_64 system w/ or w/o disable_cpu_apicid and the
kernel message can be seen as below.
===============
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 1e != 0
CPU topo: Crash kernel detected. Disabling real BSP to prevent machine INIT
===============
However, sometime older kernels which don't contain commit 5c5682b9f87a
are still loaded in newer OS. To keep good back compatibility, still
adding disable_cpu_apicid to capture kernel. It's not harmful on newer
kernels containing commit 5c5682b9f87a.
Signed-off-by: Baoquan He <bhe@redhat.com>
3b9ab5d to
8f57989
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request updates the documentation for kexec-kdump-howto.txt to reflect changes related to the disable_cpu_apicid kernel parameter. The update clarifies that this parameter is no longer required for newer kernels. While most of the change is accurate, I found a contradiction between the PR description and the updated documentation text regarding whether kdumpctl still appends this option. I've left a comment with a suggestion to align the documentation with the behavior described in the PR.
| The disable_cpu_apicid kernel option is automatically appended by kdumpctl | ||
| script to capture kernel to avoid x86 specific hardware issue and is ignored | ||
| if the kernel doesn't support it. For newer kernels that contain commit | ||
| 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels"), this hardware | ||
| issue on x86 has been fixed in the kernel. However, adding disable_cpu_apicid | ||
| is not harmful for newer kernels, and it is still added to the capture kernel | ||
| to maintain backward compatibility with older kernels that do not contain | ||
| commit 5c5682b9f87a. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a contradiction between the PR description and this documentation update. The PR description states that disable_cpu_apicid= is no longer appended by kdumpctl, while this text claims it is still added for backward compatibility.
From the PR description:
Hence now do not append 'disable_cpu_apicid=' to kdump kernel on x86 since it's not needed any more.
From this change:
However, adding disable_cpu_apicid is not harmful for newer kernels, and it is still added to the capture kernel to maintain backward compatibility...
Assuming the PR description reflects the current behavior of kdumpctl, the documentation should be updated to state that the option is no longer added automatically. Could you please clarify this and adjust the text if needed?
Here's a suggested revision that aligns with the PR description:
The `disable_cpu_apicid` kernel option was previously appended automatically
by the `kdumpctl` script to work around an x86-specific hardware issue.
This is no longer necessary for kernels that contain commit 5c5682b9f87a
("x86/cpu: Detect real BSP on crash kernels"), as the issue is fixed
within the kernel. Therefore, `kdumpctl` no longer adds this option.
If you are using an older kernel without this fix, you may need to add
this option manually to `KDUMP_COMMANDLINE_APPEND` in `/etc/sysconfig/kdump`
if you encounter issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the cover letter, the AI suggested words are not correct.
Long before, to support multiple CPUs on x86_64, 'disable_cpu_apicid='
was introduced. It's to avoid the case when nr_cpus=xx is added, while
crashed cpu is not BSP cpu, then the crashed CPU will send INIT to BSP
cpu in kdump kernel. While the BSP cpu being reinitialized when
receiving INIT in the 2nd time will cause kdump kernel collapsing.
Now, in kernel commit 5c5682b9f87a ("x86/cpu: Detect real BSP on crash
kernels"), the requirement of disable_cpu_apicid has been taken off by
detecting real BSP on crashed kernel and not sending INIT to it.
Now testing passed on x86_64 system w/ or w/o disable_cpu_apicid and the
kernel message can be seen as below.
===
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 1e != 0
CPU topo: Crash kernel detected. Disabling real BSP to prevent machine INIT
However, sometime older kernels which don't contain commit 5c5682b9f87a
are still loaded in newer OS. To keep good back compatibility, still
adding disable_cpu_apicid to capture kernel. It's not harmful on newer
kernels containing commit 5c5682b9f87a.