@hvj7
Contributor

@hvj7 hvj7 commented Oct 30, 2025

mkdumprd has code to add a disk to the kdump initramfs when FIPS is enabled and /boot is on a separate partition. This code used to work because dracut did not strictly check that the added disk was actually available. Since dracut commit c79fc8f, dracut does check for the added device, and because the disk name can change between the live system and the kdump initramfs, kdump can fail. To resolve this problem, add the disk by UUID rather than by disk name.
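
A minimal sketch of the by-UUID idea described above, not the actual mkdumprd change. The helper names uuid_to_dev_path and boot_dev_by_uuid are hypothetical; the sketch assumes findmnt from util-linux is available and that udev populates /dev/disk/by-uuid.

```shell
# Hypothetical helper: map a filesystem UUID to its /dev/disk/by-uuid path.
uuid_to_dev_path() {
    local uuid="$1"
    # Refuse an empty UUID so callers never get a bare directory path.
    [ -n "$uuid" ] || return 1
    printf '/dev/disk/by-uuid/%s\n' "$uuid"
}

# Hypothetical helper: print the by-uuid path of the filesystem holding /boot.
# Only defined here, not called, because it depends on the running system.
boot_dev_by_uuid() {
    local uuid
    uuid="$(findmnt -n -o UUID --target /boot)" || return 1
    uuid_to_dev_path "$uuid"
}
```

The printed path could then be passed to dracut's --add-device option in place of a kernel name such as /dev/sda2.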

@gemini-code-assist
Contributor

Summary of Changes

Hello @hvj7, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue in mkdumprd that caused kdump to fail under specific conditions: when FIPS is enabled and the /boot partition is separate. The problem arose because dracut now performs stricter checks on added devices, and device names can be inconsistent across system states. By implementing UUID-based device identification for the /boot partition, the change guarantees that dracut correctly locates and includes the necessary device, thereby preventing kdump failures and enhancing system stability in FIPS-compliant environments.

Highlights

  • Problem Resolution: Addresses kdump failures that occur when FIPS is enabled and /boot is on a separate partition. This issue stemmed from dracut's new strict device checking, which could fail if disk names changed between the live system and the kdump initramfs.
  • UUID-based Device Addition: Modifies mkdumprd to identify and add the /boot device to dracut using its Universally Unique Identifier (UUID) instead of its traditional device name. This ensures consistent and reliable device identification.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request modifies the mkdumprd script to add a disk to kdump initramfs by UUID instead of disk name when FIPS is enabled. This change addresses an issue where the disk name might change between the live system and kdump initramfs, causing kdump to fail. The script now retrieves the UUID of the /boot partition and uses it to add the disk to dracut arguments.

@hvj7
Contributor Author

hvj7 commented Nov 3, 2025

@gemini-code-assist The failed format check test doesn't seem to be related to the change being submitted. Can someone please look into it?

@gemini-code-assist
Contributor

Hello @hvj7, I understand your concern regarding the failed format check test. As an AI, I don't have the ability to investigate or resolve CI/CD pipeline issues directly. My function is to provide code reviews and summaries of pull requests.

It's possible that the test failure is indeed unrelated to your changes, or it might be a transient issue with the CI system. You might want to reach out to a repository maintainer or check the CI/CD configuration files (e.g., .github/workflows/main.yml or .cci.jenkinsfile) for more details on the format check and why it might be failing.

@coiby
Member

coiby commented Nov 6, 2025

@gemini-code-assist The failed format check test doesn't seem to be related to the change being submitted. Can someone please look into it?

@hvj7 Yes, the failed format check has nothing to do with your code. Sorry for the confusion! Btw I've proposed a fix in PR #111

@gemini-code-assist
Contributor

That's great news, @coiby! Thanks for confirming that the format check issue is unrelated to @hvj7's changes and for providing a fix in PR #111. This clarifies the situation for everyone involved.

@coiby
Member

coiby commented Nov 10, 2025

Hi @hvj7,

Thanks for creating this PR!

If I understand the dracut commit c79fc8f correctly, it makes systemd wait infinitely for a device that is not specified by a persistent name, e.g. /dev/sda1. So I don't think it will cause a kdump failure. On the contrary, this dracut commit can address the failure when a block device needs more than 1min 30s to be activated. If I've missed anything, please let me know. Thanks!

@aburmash

@coiby the problem is that mkdumprd can add the device as, for example, /dev/sda2, but when a crash is triggered and the crash/kdump boot starts, that same device could already be /dev/sdb2, so there would be a loop waiting for a non-existent device and a timeout. I don't think the kdump initramfs drops to a rescue shell, but it fails to start correctly precisely because that dracut commit now enforces an "indefinite" wait.
So using the UUID accounts for the fact that the device name could be different.

@coiby
Member

coiby commented Nov 11, 2025

@coiby the problem is that mkdumprd can add the device as, for example, /dev/sda2, but when a crash is triggered and the crash/kdump boot starts, that same device could already be /dev/sdb2, so there would be a loop waiting for a non-existent device and a timeout. I don't think the kdump initramfs drops to a rescue shell, but it fails to start correctly precisely because that dracut commit now enforces an "indefinite" wait. So using the UUID accounts for the fact that the device name could be different.

Thanks for the explanation! I took a deeper look at the dracut commit c79fc8f and did a few experiments. Combined with what you described, here's my understanding of the problem:

  1. mkdumprd specifies the block device for FIPS using dracut's --add-device option, e.g. --add-device /dev/sda2
  2. With the dracut commit c79fc8f, dracut now waits infinitely until /dev/sda2 is ready
  3. However, the block device comes up under a different name, so /dev/sda2 never appears
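
The mismatch in the steps above comes from recording a kernel name instead of a stable symlink. As a rough illustration only, the following sketch looks up which symlink in a by-uuid style directory points at a given device node. persistent_name_for is a hypothetical helper, and the default directory /dev/disk/by-uuid is an assumption about the udev layout.

```shell
# Hypothetical helper: print the symlink in a by-uuid style directory
# that resolves to the same node as the given device path.
# $1: device path, $2: directory of symlinks (defaults to /dev/disk/by-uuid)
persistent_name_for() {
    local dev="$1" dir="${2:-/dev/disk/by-uuid}" link target
    # Resolve the device path first so symlinked inputs compare equal.
    target="$(readlink -f "$dev")" || return 1
    for link in "$dir"/*; do
        # Skip anything that is not a symlink, including an unmatched glob.
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
            return 0
        fi
    done
    # No symlink points at the device, e.g. because its name has changed.
    return 1
}
```

If the device was recorded as /dev/sda2 but shows up as /dev/sdb2 in the kdump kernel, a lookup for /dev/sda2 finds nothing, while the by-uuid symlink still resolves to the right partition.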

Since the dracut commit c79fc8f allows users to use --add-device without specifying a persistent name, I think we should fix this issue in dracut instead.

Btw, I'm not yet familiar enough with udev to make the block device for the /boot partition appear under a different name and reproduce the issue. I simply specified a non-existent block device by adding dracut_args --add-device /dev/sda3 to kdump.conf to mimic the problem. I'd appreciate it if you could share the steps for making a block device use a different name.

Btw, the dracut commit message of c79fc8f is a bit misleading.
I found out that before the commit, dracut simply didn't wait for a block device to be ready unless wait_for_dev was called.

sh-5.2# systemctl show --property JobRunningTimeoutUSec /dev/sda1
JobRunningTimeoutUSec=1min 30s

And systemctl will even print JobRunningTimeoutUSec for non-existent-device,

sh-5.2# systemctl show --property JobRunningTimeoutUSec /dev/non-existent-device
JobRunningTimeoutUSec=45s

coiby added a commit to coiby/dracut-ng that referenced this pull request Nov 11, 2025
Currently, dracut will wait infinitely for a block device e.g. /dev/sda2
when users use "dracut --add-device /dev/sda2".  However, /dev/sda2 is
not a persistent name and it can show up under a different name, and kdump
can fail [1] as dracut will wait forever for /dev/sda2.

To avoid this problem, only wait infinitely for a block device with
persistent name.

[1] rhkdump/kdump-utils#121

Reported-by: Alex Burmashev <alexander.burmashev@oracle.com>
Reported-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
Fixes: c79fc8f ("fix(dracut): rework timeout for devices added via --mount and --add-device")
Signed-off-by: Coiby Xu <coxu@redhat.com>
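
The rule proposed in this commit message, waiting infinitely only for a block device with a persistent name, depends on telling the two kinds of name apart. A rough sketch of such a check follows. is_persistent_name is a hypothetical helper, and the list of path prefixes is an assumption for illustration, not dracut's actual logic.

```shell
# Hypothetical helper: succeed if the path looks like a persistent device name.
# Assumption: names under /dev/disk/by-*/ and /dev/mapper/ count as persistent,
# while kernel names such as /dev/sda2 do not.
is_persistent_name() {
    case "$1" in
        /dev/disk/by-*/* | /dev/mapper/*) return 0 ;;
        *) return 1 ;;
    esac
}
```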
@coiby
Member

coiby commented Nov 11, 2025

Hi @hvj7 and @aburmash, I've proposed a fix in dracut. So unless the dracut developers reject it, I think it's better to fix this in dracut.

coiby added a commit to coiby/dracut-ng that referenced this pull request Nov 11, 2025
Currently, dracut will wait infinitely for a block device e.g. /dev/sda2
when users use "dracut --add-device /dev/sda2". However, /dev/sda2 is
not a persistent name and it can show up under a different name, and kdump
can fail [1] as dracut will wait forever for /dev/sda2.

Commit c79fc8f ("fix(dracut): rework timeout for devices added
via --mount and --add-device") already sets infinite timeout for
the underlying persistent device. There is no need to also set infinite
timeout for non-persistent device name. Removing the redundant wait can
automatically resolve the case where a device name changes.

[1] rhkdump/kdump-utils#121

Reported-by: Alex Burmashev <alexander.burmashev@oracle.com>
Reported-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
Fixes: c79fc8f ("fix(dracut): rework timeout for devices added via --mount and --add-device")
Signed-off-by: Coiby Xu <coxu@redhat.com>
coiby added a commit to coiby/dracut-ng that referenced this pull request Nov 12, 2025
Currently, dracut will wait infinitely for a block device e.g. /dev/sda2
when users use "dracut --add-device /dev/sda2". However, /dev/sda2 is
not a persistent name and it can show up under a different name, and kdump
can fail [1] as dracut will wait forever for /dev/sda2.

Commit c79fc8f ("fix(dracut): rework timeout for devices added
via --mount and --add-device") already sets infinite timeout for
the underlying persistent device. There is no need to also set infinite
timeout for non-persistent device name. Removing the redundant wait can
automatically resolve the case where a device name changes.

[1] rhkdump/kdump-utils#121

Reported-by: Alex Burmashev <alexander.burmashev@oracle.com>
Reported-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
Fixes: c79fc8f ("fix(dracut): rework timeout for devices added via --mount and --add-device")
Signed-off-by: Coiby Xu <coxu@redhat.com>
@coiby
Member

coiby commented Nov 26, 2025

Hi @hvj7 and @aburmash,

Finally I understand what problem the dracut commit c79fc8f tries to resolve. Before that commit, with --add-device /dev/mapper/vg-lvol0, dev-vg-lvol0.device could still time out even though /dev/mapper/vg-lvol0 is already a persistent name, because dracut would wait forever for a different persistent name instead, e.g. /dev/disk/by-uuid/69f27553-5f60-41e8-94a7-51e6b8da79fc. I may send a PR to update dracut's documentation.

Meanwhile, I think a change in kdump-utils/mkdumprd is necessary. Can you update the PR to re-use get_persistent_dev from dracut-functions.sh, which is sourced by mkdumprd, to get the persistent device name instead?
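
A hedged sketch of the suggested approach, not the merged patch. get_persistent_dev is the function named in the comment above and is assumed to be available once dracut-functions.sh has been sourced. boot_add_device_args and its optional second parameter are hypothetical, added here so the lookup can be swapped out when testing.

```shell
# Hypothetical helper: build the --add-device argument pair for a device.
# $1: device node, e.g. the source device of /boot
# $2: lookup function to call (defaults to get_persistent_dev)
boot_add_device_args() {
    local dev="$1" resolver="${2:-get_persistent_dev}" persistent
    # Ask the resolver for a persistent name; tolerate a failing lookup.
    persistent="$("$resolver" "$dev")" || persistent=""
    # Assumption: fall back to the original name if nothing was returned.
    [ -n "$persistent" ] || persistent="$dev"
    printf '%s %s\n' --add-device "$persistent"
}
```

Passing the resolver as a parameter keeps the sketch runnable on systems where dracut-functions.sh is not installed.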

@hvj7 hvj7 force-pushed the uuidfixfips branch 4 times, most recently from 404b6f5 to 021e17a Compare November 26, 2025 11:53
@hvj7
Contributor Author

hvj7 commented Nov 26, 2025

Hi @coiby,
Is this fine? I see that dracut-functions.sh is already sourced in line 18 so I just used the function. Earlier I had included a check to see whether this file exists or not. Please let me know if this is fine or the script is supposed to be sourced differently.

@coiby
Member

coiby commented Nov 27, 2025

Hi @coiby, Is this fine? I see that dracut-functions.sh is already sourced in line 18 so I just used the function. Earlier I had included a check to see whether this file exists or not. Please let me know if this is fine or the script is supposed to be sourced differently.

Thanks for updating the PR! Yes, the change LGTM. We can assume mkdumprd can always source dracut-functions.sh successfully. Btw, the commit message needs some rephrasing accordingly.

/packit build

@hvj7 hvj7 force-pushed the uuidfixfips branch 2 times, most recently from ed952ac to 404f95a Compare November 27, 2025 12:15
@hvj7
Contributor Author

hvj7 commented Nov 27, 2025

Please let me know if this commit message is fine @coiby

@coiby
Member

coiby commented Nov 27, 2025

Please let me know if this commit message is fine @coiby

Thanks for updating the commit message! Once the subject line gets updated as well, we are good to go:)

/packit build

@hvj7
Contributor Author

hvj7 commented Nov 27, 2025

Please let me know if this commit message is fine @coiby

Thanks for updating the commit message! Once the subject line gets updated as well, we are good to go:)

Ah snap forgot about the subject line. Let me know if it's okay now.

@coiby
Member

coiby commented Nov 28, 2025

Please let me know if this commit message is fine @coiby

Thanks for updating the commit message! Once the subject line gets updated as well, we are good to go:)

Ah snap forgot about the subject line. Let me know if it's okay now.

Thanks! I notice the static-analysis test fails because a fi got removed by mistake. Sorry, the test failed to run automatically, seemingly because of a permission issue.

@hvj7
Contributor Author

hvj7 commented Dec 1, 2025

Hi @coiby, are there any further steps required from me, or will this be merged automatically eventually?

@hvj7 hvj7 changed the title Add device by UUID if FIPS is enabled Add persistent device if FIPS is enabled Dec 1, 2025
@coiby
Member

coiby commented Dec 2, 2025

Hi @coiby, are there any further steps required from me, or will this be merged automatically eventually?

Hi @hvj7, the error about the removed fi, as caught by the static-analysis test, is yet to be fixed. And a nitpick: the commit message should end with a period.

/packit build

mkdumprd has code to add a disk to the kdump initramfs, in case FIPS is
enabled and /boot is on a separate partition. This code used to work,
since dracut was not strictly checking that the added disk is in fact
available. Since dracut commit c79fc8f dracut in fact checks for the
added device, and since the disk name could have changed between the live
system and the kdump initramfs, kdump can fail.
To resolve this problem we re-use get_persistent_dev from
dracut-functions.sh, which is sourced by mkdumprd, to get the persistent
device name.

Signed-off-by: Alex Burmashev <alexander.burmashev@oracle.com>
Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
@hvj7
Contributor Author

hvj7 commented Dec 5, 2025

Some tests are still failing unfortunately

@coiby
Member

coiby commented Dec 5, 2025

Some tests are still failing unfortunately

Don't worry. The other test failures have nothing to do with this PR and will be fixed later.

The patch LGTM now! I'll merge the PR. Thanks for your contribution!

@coiby coiby merged commit 2de96da into rhkdump:main Dec 5, 2025
4 of 9 checks passed
@hvj7 hvj7 deleted the uuidfixfips branch December 6, 2025 16:19