Conversation

@mdboom
Contributor

@mdboom mdboom commented Jan 20, 2026

The cuda.bindings.nvml tests were written with the assumption that they would be run by a non-root user (as is the case in our CI). SWQA appears to be running some tests as root, which causes those tests to fail. This change makes the tests robust to either situation.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Jan 20, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom mdboom added the labels cuda.bindings (Everything related to the cuda.bindings module) and test (Improvements or additions to tests) Jan 20, 2026
@mdboom mdboom self-assigned this Jan 20, 2026
@mdboom
Contributor Author

mdboom commented Jan 20, 2026

/ok to test

@mdboom
Contributor Author

mdboom commented Jan 20, 2026

/ok to test

@mdboom
Contributor Author

mdboom commented Jan 20, 2026

/ok to test

@mdboom
Contributor Author

mdboom commented Jan 21, 2026

This is proving to be a challenge. On some of our CI runners, all signs indicate that the user is root (uid == 0, a member of gid == 0, etc.), but the API call still fails with NoPermissionError. I suspect this is because the user is root only inside a container, and whatever this API is trying to access (the device handle, maybe) requires root outside of the container.

We know, however, that SWQA has been able to run as root and get this API call to pass (which then caused the original test to fail).

I'm thinking I'll just rewrite the test so that it will be robust to the call failing or passing regardless of the user's status, but that's a workaround.

Anyone else run into this issue and have a better solution?
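
For illustration, a minimal sketch of what such a rewrite might look like. It does not reflect the actual change in this PR: `NoPermissionError` here is a local stand-in for `nvml.NoPermissionError`, and the helper and the two "privileged" functions are hypothetical names invented so the example runs without a GPU.

```python
class NoPermissionError(Exception):
    """Stand-in for nvml.NoPermissionError (assumption, so the sketch runs anywhere)."""


def call_allowing_no_permission(func, *args, **kwargs):
    """Call func and report whether it was permitted.

    Returns (True, result) on success and (False, None) when the call is
    denied. Any other exception propagates and fails the test as usual.
    """
    try:
        return True, func(*args, **kwargs)
    except NoPermissionError:
        return False, None


# Hypothetical privileged calls, used only to exercise both branches.
def privileged_call_ok():
    return 42


def privileged_call_denied():
    raise NoPermissionError("insufficient permissions")


def test_privileged_call(call=privileged_call_ok):
    succeeded, value = call_allowing_no_permission(call)
    if succeeded:
        # Only validate the result when the call was actually permitted.
        assert value == 42
```

The test makes no attempt to predict the outcome from the uid or gid, which is the point: it passes whether the environment grants or denies the call.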

@kkraus14
Collaborator

@mdboom Since NVML does some pretty low-level system interactions, calls can be blocked off somewhat arbitrarily by things like seccomp policies, even if we can detect that we're in a container or make any other reasonable check.

I think your approach of treating nvml.NoPermissionError as either a skip or a success sounds reasonable to me. I don't know if it makes sense for us to chase down and mark individual tests for it versus just having a blanket policy.
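
As a rough sketch of the "skip" variant, one option might be a decorator that converts the permission error into a skip. This is an assumption about how it could be done, not something taken from this PR: `NoPermissionError` is again a local stand-in for `nvml.NoPermissionError`, and `skip_on_no_permission` is a hypothetical name.

```python
import functools
import unittest


class NoPermissionError(Exception):
    """Stand-in for nvml.NoPermissionError (assumption, so the sketch runs anywhere)."""


def skip_on_no_permission(test_func):
    """Turn a NoPermissionError raised by a test into a skip rather than a failure."""

    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except NoPermissionError as exc:
            # Chain the original error so the skip reason stays traceable.
            raise unittest.SkipTest(f"NVML call not permitted: {exc}") from exc

    return wrapper


@skip_on_no_permission
def test_denied():
    # Simulates a call that the environment blocks.
    raise NoPermissionError("blocked by the environment")


@skip_on_no_permission
def test_allowed():
    return "ok"
```

Applying the decorator per test corresponds to marking individual tests. A blanket policy would apply the same conversion in one shared place instead, at the cost of possibly hiding a permission failure that a particular test should have surfaced.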

@mdboom
Contributor Author

mdboom commented Jan 22, 2026

/ok to test
