Skip to content

Conversation

@baoquan-he
Copy link
Collaborator

@baoquan-he baoquan-he commented Dec 1, 2025

Long before, to support multiple CPUs on x86_64, 'disable_cpu_apicid='
was introduced. It's to avoid the case when nr_cpus=xx is added, while
crashed cpu is not BSP cpu, then the crashed CPU will send INIT to BSP
cpu in kdump kernel. While the BSP cpu being reinitialized when
receiving INIT in the 2nd time will cause kdump kernel collapsing.

Now, in kernel commit 5c5682b9f87a ("x86/cpu: Detect real BSP on crash
kernels"), the requirement of disable_cpu_apicid has been taken off by
detecting real BSP on crashed kernel and not sending INIT to it.
Now testing passed on x86_64 system w/ or w/o disable_cpu_apicid and the
kernel message can be seen as below.

===
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 1e != 0
CPU topo: Crash kernel detected. Disabling real BSP to prevent machine INIT

However, sometime older kernels which don't contain commit 5c5682b9f87a
are still loaded in newer OS. To keep good back compatibility, still
adding disable_cpu_apicid to capture kernel. It's not harmful on newer
kernels containing commit 5c5682b9f87a.

@daveyoung
Copy link
Contributor

It seems something more need to be removed:
[kdump-utils]$ git grep apic
kdump-lib.sh:# This function returns the "apicid" of the boot
kdump-lib.sh:get_bootcpu_apicid()
kdump-lib.sh: CPU=="0" && /^apicid/ { print $NF; }
kdump-lib.sh: id=$(get_bootcpu_apicid)
kdump-lib.sh: out+="disable_cpu_apicid=$id "
kexec-kdump-howto.txt:Make sure that you are using a kernel that supports disable_cpu_apicid
kexec-kdump-howto.txt:hardware issue (). The disable_cpu_apicid kernel option is automatically
kexec-kdump-howto.txt:(
) Without disable_cpu_apicid kernel option, capture kernel may lead to
spec/kdump-lib_spec.sh: get_bootcpu_apicid() {
spec/kdump-lib_spec.sh: add="disable_cpu_apicid=1 foo.pretimeout=0 rd.systemd.gpt_auto=no"

Otherwise, would the old kernel be a problem? Eg. use old kernel to capture a dump with set a different kernel version in /etc/sys/config/kdump KDUMP_KERNELVER=""

@baoquan-he
Copy link
Collaborator Author

It seems something more need to be removed: [kdump-utils]$ git grep apic kdump-lib.sh:# This function returns the "apicid" of the boot kdump-lib.sh:get_bootcpu_apicid() kdump-lib.sh: CPU=="0" && /^apicid/ { print $NF; } kdump-lib.sh: id=$(get_bootcpu_apicid) kdump-lib.sh: out+="disable_cpu_apicid=$id " kexec-kdump-howto.txt:Make sure that you are using a kernel that supports disable_cpu_apicid kexec-kdump-howto.txt:hardware issue (). The disable_cpu_apicid kernel option is automatically kexec-kdump-howto.txt:() Without disable_cpu_apicid kernel option, capture kernel may lead to spec/kdump-lib_spec.sh: get_bootcpu_apicid() { spec/kdump-lib_spec.sh: add="disable_cpu_apicid=1 foo.pretimeout=0 rd.systemd.gpt_auto=no"

Otherwise, would the old kernel be a problem? Eg. use old kernel to capture a dump with set a different kernel version in /etc/sys/config/kdump KDUMP_KERNELVER=""

Thanks, Dave. You are right, the v1 is not clean change. The left code in spec caused the failure in unit test. I removed get_bootcpu_apicid() and changed code in spec/kdump-lib_spec.sh, now the unit test failure is gone. While the integration tests are still failed, it seems not related to this patch.

@daveyoung
Copy link
Contributor

It seems something more need to be removed: [kdump-utils]$ git grep apic kdump-lib.sh:# This function returns the "apicid" of the boot kdump-lib.sh:get_bootcpu_apicid() kdump-lib.sh: CPU=="0" && /^apicid/ { print
Extra close brace or missing open brace

    $NF; } kdump-lib.sh: id=$(get_bootcpu_apicid) kdump-lib.sh: out+="disable_cpu_apicid=$id " kexec-kdump-howto.txt:Make sure that you are using a kernel that supports disable_cpu_apicid kexec-kdump-howto.txt:hardware issue (_). The disable_cpu_apicid kernel option is automatically kexec-kdump-howto.txt:(_) Without disable_cpu_apicid kernel option, capture kernel may lead to spec/kdump-lib_spec.sh: get_bootcpu_apicid() { spec/kdump-lib_spec.sh: add="disable_cpu_apicid=1 foo.pretimeout=0 rd.systemd.gpt_auto=no"

Otherwise, would the old kernel be a problem? Eg. use old kernel to capture a dump with set a different kernel version in /etc/sys/config/kdump KDUMP_KERNELVER=""

Thanks, Dave. You are right, the v1 is not clean change. The left code in spec caused the failure in unit test. I removed get_bootcpu_apicid() and changed code in spec/kdump-lib_spec.sh, now the unit test failure is gone. While the integration tests are still failed, it seems not related to this patch.

Hi @baoquan-he thanks for updating the patch. But maybe you missed the last question, can you have a look at the case of using old kernel for dump?

@baoquan-he
Copy link
Collaborator Author

Hi @baoquan-he thanks for updating the patch. But maybe you missed the last question, can you have a look at the case of using old kernel for dump?

Ah, you said 'otherwise', I thought it's related to the left code. Thomas's commit 5c5682b9f87a was merged in v6.8-rc4. That means rhel10 kernel all have the patch. In rhel9, it was back ported to rhel9.6. If so, I am wondering why we need to remove disable_cpu_apicid=, isn't keeping it in kdump-utils better back compatible with the old kernel, e.g loading rhel9.5 or older kernel on rhel10 OS?

@baoquan-he
Copy link
Collaborator Author

Or we can parse the older kernel's version and judge if we should add disable_cpu_apicid= to kdump kernel. Personally I think adding disable_cpu_apicid= to kdump kernel always seems a little better, it allows good back compatibility with older kernel, and won't cause issue. What do you think, @daveyoung ?

@daveyoung
Copy link
Contributor

Or we can parse the older kernel's version and judge if we should add disable_cpu_apicid= to kdump kernel. Personally I think adding disable_cpu_apicid= to kdump kernel always seems a little better, it allows good back compatibility with older kernel, and won't cause issue. What do you think, @daveyoung ?

Hi @baoquan-he I think it depends on your judgement and see how other people thinks, I just spotted potential issues when seeing the kernel commit and some apparent issues about the pr.

@baoquan-he
Copy link
Collaborator Author

Or we can parse the older kernel's version and judge if we should add disable_cpu_apicid= to kdump kernel. Personally I think adding disable_cpu_apicid= to kdump kernel always seems a little better, it allows good back compatibility with older kernel, and won't cause issue. What do you think, @daveyoung ?

Hi @baoquan-he I think it depends on your judgement and see how other people thinks, I just spotted potential issues when seeing the kernel commit and some apparent issues about the pr.

Thanks for the input. My first reaction was to remove it, now seems we should keep it. I will seek other people's input on this, to see if there's any different opinion.

@baoquan-he baoquan-he changed the title kdump-lib.sh: do not append 'disable_cpu_apicid=' to kdump kernel of x86 kexec-kdump-howto.txt: update paragraphs related to disable_cpu_apicid Dec 11, 2025
@coiby
Copy link
Member

coiby commented Dec 18, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the documentation in kexec-kdump-howto.txt regarding the disable_cpu_apicid kernel parameter. The changes correctly reflect that this parameter is no longer strictly necessary for newer kernels but is kept for backward compatibility. My review focuses on improving the clarity and grammar of the updated text to make the documentation easier to understand. I've provided a couple of suggestions to rephrase the new paragraphs for better readability.

Comment on lines 940 to 945
if the kernel doesn't support it. For newer kernel which contains commit
5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels"), the hardware
issue on x86 has been fixed in kernel. While adding disable_cpu_apicid is
not harmful for newer kernel. Here still adding it to capture kernel to
keep good back compatibility if an older kernel which doesn't contain commit
5c5682b9f87a is loaded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This paragraph has some grammatical errors and could be rephrased for better clarity. Using 'kernels' (plural), adding articles like 'the', and using more formal conjunctions like 'However' would improve readability.

if the kernel doesn't support it. For newer kernels that contain commit
5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels"), this hardware
issue on x86 has been fixed in the kernel. However, adding disable_cpu_apicid is
not harmful for newer kernels, and it is still added to the capture kernel to
maintain backward compatibility with older kernels that do not contain commit
5c5682b9f87a.

Comment on lines 955 to 957
(*) Without disable_cpu_apicid kernel option, capture kernel which doesn't
contain commit 5c5682b9f87a may lead to hang, system reset or power-off at boot,
depending on your system and runtime situation at the time of crash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This sentence is a bit awkward. It can be rephrased for better clarity and flow. Also, using a parallel structure for the list of potential problems ('may hang, reset, or power-off') would improve readability.

(*) Without the disable_cpu_apicid kernel option, a capture kernel that does not
contain commit 5c5682b9f87a may hang, reset, or power-off at boot,
depending on your system and runtime situation at the time of the crash.

@coiby
Copy link
Member

coiby commented Dec 18, 2025

Hi @baoquan-he,

The PR LGTM! And I think the two minor suggestions from Gemini can help improve readability, what do you think?

@baoquan-he
Copy link
Collaborator Author

I will update and resubmit, thanks for checking.

Long before, to support multiple CPUs on x86_64, 'disable_cpu_apicid='
was introduced. It's to avoid the case when nr_cpus=xx is added, while
crashed cpu is not BSP cpu, then the crashed CPU will send INIT to BSP
cpu in kdump kernel. While the BSP cpu being reinitialized when
receiving INIT in the 2nd time will cause kdump kernel collapsing.

Now, in kernel commit 5c5682b9f87a ("x86/cpu: Detect real BSP on crash
kernels"), the requirement of disable_cpu_apicid has been taken off by
detecting real BSP on crashed kernel and not sending INIT to it.
Now testing passed on x86_64 system w/ or w/o disable_cpu_apicid and the
kernel message can be seen as below.

===============
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 1e != 0
CPU topo: Crash kernel detected. Disabling real BSP to prevent machine INIT
===============

However, sometime older kernels which don't contain commit 5c5682b9f87a
are still loaded in newer OS. To keep good back compatibility, still
adding disable_cpu_apicid to capture kernel. It's not harmful on newer
kernels containing commit 5c5682b9f87a.

Signed-off-by: Baoquan He <bhe@redhat.com>
@baoquan-he
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the documentation for kexec-kdump-howto.txt to reflect changes related to the disable_cpu_apicid kernel parameter. The update clarifies that this parameter is no longer required for newer kernels. While most of the change is accurate, I found a contradiction between the PR description and the updated documentation text regarding whether kdumpctl still appends this option. I've left a comment with a suggestion to align the documentation with the behavior described in the PR.

Comment on lines +938 to +945
The disable_cpu_apicid kernel option is automatically appended by kdumpctl
script to capture kernel to avoid x86 specific hardware issue and is ignored
if the kernel doesn't support it. For newer kernels that contain commit
5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels"), this hardware
issue on x86 has been fixed in the kernel. However, adding disable_cpu_apicid
is not harmful for newer kernels, and it is still added to the capture kernel
to maintain backward compatibility with older kernels that do not contain
commit 5c5682b9f87a.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There appears to be a contradiction between the PR description and this documentation update. The PR description states that disable_cpu_apicid= is no longer appended by kdumpctl, while this text claims it is still added for backward compatibility.

From the PR description:

Hence now do not append 'disable_cpu_apicid=' to kdump kernel on x86 since it's not needed any more.

From this change:

However, adding disable_cpu_apicid is not harmful for newer kernels, and it is still added to the capture kernel to maintain backward compatibility...

Assuming the PR description reflects the current behavior of kdumpctl, the documentation should be updated to state that the option is no longer added automatically. Could you please clarify this and adjust the text if needed?

Here's a suggested revision that aligns with the PR description:

The `disable_cpu_apicid` kernel option was previously appended automatically
by the `kdumpctl` script to work around an x86-specific hardware issue.
This is no longer necessary for kernels that contain commit 5c5682b9f87a
("x86/cpu: Detect real BSP on crash kernels"), as the issue is fixed
within the kernel. Therefore, `kdumpctl` no longer adds this option.
If you are using an older kernel without this fix, you may need to add
this option manually to `KDUMP_COMMANDLINE_APPEND` in `/etc/sysconfig/kdump`
if you encounter issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the cover letter, the AI suggested words are not correct.

@coiby coiby merged commit e22ca92 into rhkdump:main Dec 22, 2025
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants