Skip to content

Conversation

@thomas0903
Copy link
Collaborator

This base docker file makes the program work inside of docker. Might need to change some things in there depending on the host system. Please note that vmlinux.h should be created by the host system (for now at least) because docker shares the kernel with the host (I think).

To build run:
docker build -t gpuprobe:latest .

To run:
docker run --rm -it --privileged --cap-add=SYS_ADMIN --gpus all -p 9000:9000 gpuprobe:latest

This will start the memleak checker.

Copy link
Contributor

@ethangraham2001 ethangraham2001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but a few concerns to address before merging.

Also, I suggest changing the base of the PR. This PR contains commits from your other PR (the one for cross-platform compatibility). These commits shouldn't be present in a PR that is unrelated.

I think we should also discuss whether we want the docker files within this repo or another one dedicated to docker files and environment setup. It's probably worth separating code and deployment moving forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I have a few questions relating to this file.

  • Firstly, we are able to run the daemon within the docker container, which is fantastic. What does the workflow look like so that a user can then run their code inside of the container as well?
  • Have you confirmed that prometheus metrics can be accessed from outside of the docker container through port 9000? I'm not sure if docker works over localhost in Linux or if it has special internal addresses. Need to check
  • Have you checked how we can build this into a container image? That would be good for releases or other, but not necessarily top-priority right now.

EXPOSE 9000

# Run the GPU probe binary with some default arguments
CMD ["/usr/local/bin/gpu_probe", "--memleak", "--metrics-addr", "0.0.0.0:9000"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flags are hard-coded. Is there any way to allow the user to configure this on the fly?

Not necessarily a problem if we are sharing pre-rolled docker files to make deployment easier with recommended configs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any rust files in this PR. I suggest making a new branch from main, adding the docker file, and then only adding that file to the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants