Conversation

@yoursanonymous

  • Please check if the PR fulfills these requirements
  • The commit message follows our guidelines
  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

/kind documentation

  • What this PR does / why we need it:

This pull request introduces a new, comprehensive tutorial series focused on real-world production scenarios. It addresses the gap between basic examples and the complex end-to-end configurations required for production environments.

  • Which issue(s) this PR fixes:

@volcano-sh-bot volcano-sh-bot added the kind/documentation Categorizes issue or PR as related to documentation. label Feb 11, 2026
@volcano-sh-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign thor-wl for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yoursanonymous yoursanonymous changed the title docs: add humanized tutorial series docs: add tutorial series Feb 11, 2026
@gemini-code-assist

Summary of Changes

Hello @yoursanonymous, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's documentation by introducing a new, comprehensive tutorial series. These tutorials are designed to guide users through real-world production scenarios for batch workloads on Kubernetes using Volcano, bridging the gap between basic examples and complex end-to-end configurations. The series covers critical aspects like distributed machine learning, big data processing, GPU optimization, multi-tenancy, and workflow orchestration.

Highlights

  • New Tutorial Series Index: Introduced a new top-level index page for the Volcano Tutorial Series, providing an overview of the series' purpose and structure.
  • Distributed TensorFlow Tutorial: Added a comprehensive tutorial demonstrating how to run distributed TensorFlow training jobs using Volcano's gang scheduling and service discovery features.
  • Apache Spark Tutorial: Included a tutorial on optimizing Apache Spark batch processing on Kubernetes with Volcano, focusing on preventing resource starvation through gang scheduling.
  • GPU Resource Management Tutorial: Provided a tutorial on efficient GPU scheduling and resource management using Volcano, covering fractional GPU sharing (vGPU) and hardware isolation (MIG).
  • Multi-tenancy Configuration Tutorial: Added a tutorial explaining how to set up multi-tenant environments with Volcano's hierarchical queues for fair resource sharing and dynamic allocation.
  • Argo Workflows Integration Tutorial: Created a tutorial detailing the integration of Volcano as a scheduler for Argo Workflows, enabling advanced batch scheduling capabilities for CI/CD and data pipelines.
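As a taste of what the multi-tenancy tutorial configures, a minimal Volcano Queue might look like the following. This is only a sketch using Volcano's `scheduling.volcano.sh/v1beta1` Queue API; the queue name, weight, and capability values are placeholders, not taken from the tutorial itself:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: team-a            # placeholder name for one tenant's queue
spec:
  weight: 2               # relative share of cluster resources under contention
  capability:
    cpu: "20"             # hard cap on CPU for this tenant (placeholder value)
    memory: 64Gi          # hard cap on memory for this tenant (placeholder value)
```

Jobs submitted with `spec.queue: team-a` are then scheduled against this tenant's share.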


Changelog
  • content/en/docs/tutorials/_index.md
    • Added new tutorial series index page.
  • content/en/docs/tutorials/argo-workflows.md
    • Added new tutorial for integrating Volcano with Argo Workflows.
  • content/en/docs/tutorials/gpu-scheduling.md
    • Added new tutorial for GPU scheduling and resource management.
  • content/en/docs/tutorials/multi-tenancy.md
    • Added new tutorial for multi-tenancy queue configuration.
  • content/en/docs/tutorials/spark.md
    • Added new tutorial for batch processing with Apache Spark.
  • content/en/docs/tutorials/tensorflow.md
    • Added new tutorial for running distributed TensorFlow training.
Activity
  • No specific activity (comments, reviews) has been recorded for this pull request yet.

@volcano-sh-bot volcano-sh-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 11, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable and comprehensive tutorial series covering several real-world production scenarios with Volcano. The documentation is well-structured and addresses important use cases. I've identified a few issues, including a critical error in one of the YAML manifests that would prevent it from working, an incorrect field in another example, and a broken link on the main tutorial page. My review includes specific suggestions to correct these points. With these fixes, this will be an excellent addition to the project's documentation.

Comment on lines +48 to +66
```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: gpu-sharing-tutorial
spec:
  minAvailable: 1
  schedulerName: volcano
  template:
    spec:
      containers:
        - name: gpu-container
          image: nvidia/cuda:11.0-base
          command: ["sh", "-c", "nvidia-smi && sleep 3600"]
          resources:
            limits:
              volcano.sh/vgpu-number: 1    # Request 1 virtual GPU
              volcano.sh/vgpu-memory: 2000 # Limit to 2000MiB of GPU memory
              volcano.sh/vgpu-cores: 20    # Limit to 20% of GPU compute
      restartPolicy: Never
```


critical

The provided Volcano Job manifest is invalid. The spec.template field is not a valid field for a batch.volcano.sh/v1alpha1 Job. You should use spec.tasks, which is a list of tasks, where each task contains a pod template. The current manifest will fail to apply. I've suggested a corrected manifest that wraps the pod template inside a task.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: gpu-sharing-tutorial
spec:
  minAvailable: 1
  schedulerName: volcano
  tasks:
    - replicas: 1
      name: gpu-task
      template:
        spec:
          containers:
            - name: gpu-container
              image: nvidia/cuda:11.0-base
              command: ["sh", "-c", "nvidia-smi && sleep 3600"]
              resources:
                limits:
                  volcano.sh/vgpu-number: 1    # Request 1 virtual GPU
                  volcano.sh/vgpu-memory: 2000 # Limit to 2000MiB of GPU memory
                  volcano.sh/vgpu-cores: 20    # Limit to 20% of GPU compute
          restartPolicy: Never
```

---

> [!TIP]
> Each tutorial is designed to be self-contained. However, if you are new to Volcano, we recommend starting with the [Quick Start: Deploy a Volcano Job](https://volcano.sh/en/docs/tutorials/#quick-start-deploy-a-volcano-job) guide.


medium

The link provided in the tip points to an anchor (#quick-start-deploy-a-volcano-job) on the current page, but this anchor does not exist. This will result in a broken link for users. Please update the URL to point to the correct Quick Start guide, or add the relevant content to this page.

```yaml
- name: volcano-job-step
  resource:
    action: create
    successCondition: status.state == Completed # Wait for the Job to finish
```


medium

The successCondition for the resource template is incorrect. The status.state field of a Volcano Job is an object, so checking for equality with a string will fail. To check for completion, you should evaluate the phase subfield. The condition should be status.state.phase == 'Completed'. Note that Completed must be a string literal.

Suggested change:

```diff
- successCondition: status.state == Completed # Wait for the Job to finish
+ successCondition: status.state.phase == 'Completed' # Wait for the Job to finish
```
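For context, the corrected condition would sit inside an Argo Workflows resource template like the one below. This is only a sketch: the Workflow and step names are hypothetical, and the embedded Volcano Job manifest is elided; the `action`, `successCondition`, and `failureCondition` fields follow Argo's resource template schema.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volcano-job-    # hypothetical Workflow name prefix
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: run-volcano-job
            template: volcano-job-step
    - name: volcano-job-step
      resource:
        action: create
        # Evaluate the phase subfield, quoting the expected value:
        successCondition: status.state.phase == 'Completed'
        failureCondition: status.state.phase == 'Failed'
        manifest: |
          apiVersion: batch.volcano.sh/v1alpha1
          kind: Job
          # ... Volcano Job spec elided ...
```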

Check the logs to verify that the container correctly detects the GPU environment via `nvidia-smi`:

```bash
kubectl logs gpu-sharing-tutorial-completion-task-0
```


medium

Following the correction to the Job manifest, the pod name used in this kubectl logs command will be incorrect. A Volcano pod's name is constructed as <job-name>-<task-name>-<index>. With the suggested manifest fix (which names the task gpu-task), the correct pod name would be gpu-sharing-tutorial-gpu-task-0.
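The naming rule the reviewer describes can be sketched as follows. This is illustrative only; in reality the name is assigned by the Volcano job controller, not computed by the user:

```python
def volcano_pod_name(job_name: str, task_name: str, index: int) -> str:
    """Build the pod name Volcano assigns to replica `index` of a task,
    following the <job-name>-<task-name>-<index> convention."""
    return f"{job_name}-{task_name}-{index}"

# With the suggested fix, the job is named gpu-sharing-tutorial
# and its task is named gpu-task, so the first pod is:
print(volcano_pod_name("gpu-sharing-tutorial", "gpu-task", 0))
# gpu-sharing-tutorial-gpu-task-0
```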

Suggested change:

```diff
- kubectl logs gpu-sharing-tutorial-completion-task-0
+ kubectl logs gpu-sharing-tutorial-gpu-task-0
```

Signed-off-by: vinayak sharma <vinayaks0111@gmail.com>
@JesseStutler
Copy link
Member

Please fix the CI failures, thanks.
BTW, did you try the tutorial in your cluster, and did it work properly? We need to make sure all the examples work fine.
