From 7d4f5904c0d404c326d4449c220c9150da988fa6 Mon Sep 17 00:00:00 2001 From: Shive Date: Sun, 18 Jan 2026 12:23:34 +0530 Subject: [PATCH 1/4] Add missing plugin documentation for resource-strategy-fit, capacity, deviceshare, extender, nodegroup, usage, and resourcequota plugins Signed-off-by: Shive --- content/en/docs/plugins/capacity.md | 181 ++++++++++++ content/en/docs/plugins/deviceshare.md | 194 +++++++++++++ content/en/docs/plugins/extender.md | 257 ++++++++++++++++++ content/en/docs/plugins/nodegroup.md | 241 ++++++++++++++++ .../en/docs/plugins/resource-strategy-fit.md | 175 ++++++++++++ content/en/docs/plugins/resourcequota.md | 213 +++++++++++++++ content/en/docs/plugins/usage.md | 178 ++++++++++++ 7 files changed, 1439 insertions(+) create mode 100644 content/en/docs/plugins/capacity.md create mode 100644 content/en/docs/plugins/deviceshare.md create mode 100644 content/en/docs/plugins/extender.md create mode 100644 content/en/docs/plugins/nodegroup.md create mode 100644 content/en/docs/plugins/resource-strategy-fit.md create mode 100644 content/en/docs/plugins/resourcequota.md create mode 100644 content/en/docs/plugins/usage.md diff --git a/content/en/docs/plugins/capacity.md b/content/en/docs/plugins/capacity.md new file mode 100644 index 00000000..d35a91d3 --- /dev/null +++ b/content/en/docs/plugins/capacity.md @@ -0,0 +1,181 @@ ++++ +title = "Capacity Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Capacity" +[menu.docs] + parent = "plugins" + weight = 2 ++++ + +### Capacity + +#### Overview + +The Capacity plugin manages queue resource allocation using a capacity-based model. It enforces queue capacity limits, guarantees minimum resource allocations, and supports hierarchical queue structures. The plugin calculates each queue's deserved resources based on its capacity, guarantee, and the cluster's total available resources. + +#### Features + +- **Queue Capacity Management**: Enforces queue capacity limits based on configured capability +- **Resource Guarantees**: Supports minimum resource guarantees for queues +- **Hierarchical Queues**: Supports hierarchical queue structures with parent-child relationships +- **Dynamic Resource Allocation**: Calculates deserved resources dynamically based on queue configuration +- **Resource Reclamation**: Supports resource reclamation from queues exceeding their capacity +- **Job Enqueue Control**: Validates resource availability before allowing jobs to be enqueued + +#### Configuration + +The Capacity plugin is configured through Queue resources. 
Here's an example: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: queue-capacity-example +spec: + weight: 1 + capability: + cpu: "100" + memory: "100Gi" + guarantee: + resource: + cpu: "20" + memory: "20Gi" + deserved: + cpu: "50" + memory: "50Gi" +``` + +##### Queue Configuration Fields + +- **capability**: Maximum resources the queue can consume +- **guarantee**: Minimum resources guaranteed to the queue +- **deserved**: Desired resource allocation for the queue (calculated automatically if not specified) +- **parent**: Parent queue name for hierarchical queue structures + +##### Hierarchical Queue Configuration + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: root-queue +spec: + weight: 1 + capability: + cpu: "1000" + memory: "1000Gi" +--- +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: child-queue +spec: + parent: root-queue + weight: 1 + capability: + cpu: "500" + memory: "500Gi" + guarantee: + resource: + cpu: "100" + memory: "100Gi" +``` + +#### How It Works + +1. **Capacity Calculation**: The plugin calculates each queue's real capacity by considering the total cluster resources, total guarantees, and the queue's own guarantee and capability. +2. **Deserved Resources**: Deserved resources are calculated based on the queue's real capacity and configured deserved values. +3. **Job Enqueue**: Before a job is enqueued, the plugin validates that the queue has sufficient capacity to accommodate the job's minimum resource requirements. +4. **Resource Allocation**: During scheduling, the plugin ensures that queues don't exceed their allocated capacity. +5. **Reclamation**: Queues that exceed their deserved resources can have tasks reclaimed to make room for other queues. 
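For intuition, the clamping arithmetic in steps 1-2 can be sketched for a single resource dimension as follows. This is illustrative Go, not the plugin's actual code: the function names and scalar simplification are ours, and the real implementation applies the same min/max logic per resource name (cpu, memory, ...).

```go
// Minimal sketch (not Volcano's code) of the capacity clamping
// arithmetic from steps 1-2, reduced to a single scalar resource.
// Requires Go 1.21+ for the builtin min/max.
package main

import "fmt"

// realCapability: a queue may use at most its configured capability, and
// at most what remains of the cluster after every queue's guarantee is
// set aside (with its own guarantee added back).
func realCapability(total, totalGuarantee, guarantee, capability float64) float64 {
	return min(capability, total-totalGuarantee+guarantee)
}

// deservedResource clamps the configured deserved value into the range
// [guarantee, realCapability].
func deservedResource(configured, guarantee, realCap float64) float64 {
	return min(max(configured, guarantee), realCap)
}

func main() {
	total, totalGuarantee := 1000.0, 300.0 // cluster CPU; sum of all guarantees
	guarantee, capability := 100.0, 500.0  // this queue's settings
	configuredDeserved := 450.0

	rc := realCapability(total, totalGuarantee, guarantee, capability)
	fmt.Println("real capability:", rc)                                          // 500
	fmt.Println("deserved:", deservedResource(configuredDeserved, guarantee, rc)) // 450
}
```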
+ +#### Scenario + +The Capacity plugin is suitable for: + +- **Resource Quota Management**: Enforcing resource limits per queue or department +- **Multi-tenant Clusters**: Isolating resources between different tenants or teams +- **Resource Reservations**: Guaranteeing minimum resources for critical workloads +- **Hierarchical Organizations**: Organizations with nested resource allocation structures + +#### Examples + +##### Example 1: Basic Capacity Management + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: team-a +spec: + weight: 1 + capability: + cpu: "200" + memory: "200Gi" + nvidia.com/gpu: "8" + guarantee: + resource: + cpu: "50" + memory: "50Gi" + nvidia.com/gpu: "2" +``` + +##### Example 2: Hierarchical Capacity + +```yaml +# Root queue +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: root +spec: + weight: 1 + capability: + cpu: "1000" + memory: "1000Gi" + +--- +# Development queue +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: dev +spec: + parent: root + weight: 1 + capability: + cpu: "300" + memory: "300Gi" + +--- +# Production queue +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: prod +spec: + parent: root + weight: 1 + capability: + cpu: "500" + memory: "500Gi" + guarantee: + resource: + cpu: "200" + memory: "200Gi" +``` + +#### Notes + +- When hierarchical queues are enabled, only leaf queues can allocate tasks +- Queues without a capacity configuration are treated as best-effort queues +- The plugin automatically calculates real capacity considering parent queue constraints +- Resource guarantees cannot exceed queue capabilities diff --git a/content/en/docs/plugins/deviceshare.md b/content/en/docs/plugins/deviceshare.md new file mode 100644 index 00000000..6e24dd50 --- /dev/null +++ b/content/en/docs/plugins/deviceshare.md @@ -0,0 +1,194 @@ ++++ +title = "Device Share Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Device Share" +[menu.docs] + parent = "plugins" + weight = 3 ++++ + +### Device Share + +#### Overview + +The Device Share plugin manages the sharing and allocation of device resources such as GPUs, NPUs, and other accelerators. It supports multiple device types including NVIDIA GPUs (both GPU sharing and vGPU), Ascend NPUs, and provides flexible scheduling policies for device allocation. The plugin enables efficient utilization of expensive accelerator resources through sharing capabilities. 
+ +#### Features + +- **GPU Sharing**: Enable sharing of GPU resources among multiple pods +- **GPU Number**: Schedule based on the number of GPUs requested +- **vGPU Support**: Support for virtual GPU (vGPU) allocation +- **Ascend NPU Support**: Support for Ascend NPU devices including MindCluster VNPU and HAMi VNPU +- **Node Locking**: Optional node-level locking to prevent concurrent device allocations +- **Flexible Scheduling Policies**: Configurable scoring policies for device allocation +- **Batch Node Scoring**: Support for batch scoring of nodes for NPU devices + +#### Configuration + +The Device Share plugin can be configured with the following arguments: + +```yaml +actions: "allocate, backfill" +tiers: +- plugins: + - name: deviceshare + arguments: + deviceshare.GPUSharingEnable: true + deviceshare.GPUNumberEnable: false + deviceshare.VGPUEnable: false + deviceshare.NodeLockEnable: false + deviceshare.SchedulePolicy: "binpack" + deviceshare.ScheduleWeight: 10 + deviceshare.AscendMindClusterVNPUEnable: false + deviceshare.AscendHAMiVNPUEnable: false + deviceshare.KnownGeometriesCMName: "volcano-vgpu-device-config" + deviceshare.KnownGeometriesCMNamespace: "kube-system" +``` + +##### Configuration Parameters + +- **deviceshare.GPUSharingEnable** (bool): Enable GPU sharing mode +- **deviceshare.GPUNumberEnable** (bool): Enable GPU number-based scheduling (mutually exclusive with GPUSharingEnable) +- **deviceshare.VGPUEnable** (bool): Enable vGPU support (mutually exclusive with GPU sharing) +- **deviceshare.NodeLockEnable** (bool): Enable node-level locking for device allocation +- **deviceshare.SchedulePolicy** (string): Scheduling policy for device scoring (e.g., "binpack", "spread") +- **deviceshare.ScheduleWeight** (int): Weight for device scoring in node ordering +- **deviceshare.AscendMindClusterVNPUEnable** (bool): Enable Ascend MindCluster VNPU support +- **deviceshare.AscendHAMiVNPUEnable** (bool): Enable Ascend HAMi VNPU support +- **deviceshare.KnownGeometriesCMName** (string): ConfigMap name for vGPU geometries +- **deviceshare.KnownGeometriesCMNamespace** (string): Namespace for vGPU geometries ConfigMap + +#### Device Types + +##### NVIDIA GPU Sharing + +Enable GPU sharing to allow multiple pods to share a single GPU: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUSharingEnable: true + deviceshare.ScheduleWeight: 10 +``` + +Pods request GPU resources using: + +```yaml +resources: + requests: + nvidia.com/gpu: 2 # Request 2 GPU units (out of 100 per GPU) + limits: + nvidia.com/gpu: 2 +``` + +##### NVIDIA GPU Number + +Schedule based on the number of physical GPUs: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUNumberEnable: true + deviceshare.ScheduleWeight: 10 +``` + +Pods request whole GPUs: + +```yaml +resources: + requests: + nvidia.com/gpu: 1 # Request 1 whole GPU + limits: + nvidia.com/gpu: 1 +``` + +##### vGPU + +Enable virtual GPU support: + +```yaml +- name: deviceshare + arguments: + deviceshare.VGPUEnable: true + deviceshare.ScheduleWeight: 10 + deviceshare.KnownGeometriesCMName: "volcano-vgpu-device-config" + deviceshare.KnownGeometriesCMNamespace: "kube-system" +``` + +##### Ascend NPU + +Enable Ascend NPU support: + +```yaml +- name: deviceshare + arguments: + deviceshare.AscendMindClusterVNPUEnable: true + # or + deviceshare.AscendHAMiVNPUEnable: true + deviceshare.ScheduleWeight: 10 +``` + +#### Scenario + +The Device Share plugin is suitable for: + +- **GPU Clusters**: Clusters with NVIDIA GPU resources requiring efficient 
sharing +- **AI Training**: Machine learning training workloads requiring GPU acceleration +- **Multi-tenant GPU Sharing**: Environments where multiple users need access to GPU resources +- **NPU Workloads**: Workloads running on Ascend NPU devices +- **Cost Optimization**: Maximizing utilization of expensive accelerator hardware + +#### Examples + +##### Example 1: GPU Sharing for Small Workloads + +Configure GPU sharing for workloads that don't require full GPU resources: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUSharingEnable: true + deviceshare.SchedulePolicy: "binpack" + deviceshare.ScheduleWeight: 10 +``` + +##### Example 2: Whole GPU Allocation + +Configure for workloads requiring full GPU resources: + +```yaml +- name: deviceshare + arguments: + deviceshare.GPUNumberEnable: true + deviceshare.SchedulePolicy: "spread" + deviceshare.ScheduleWeight: 10 +``` + +##### Example 3: vGPU with Custom ConfigMap + +Configure vGPU with custom geometry configuration: + +```yaml +- name: deviceshare + arguments: + deviceshare.VGPUEnable: true + deviceshare.ScheduleWeight: 10 + deviceshare.KnownGeometriesCMName: "custom-vgpu-config" + deviceshare.KnownGeometriesCMNamespace: "gpu-system" +``` + +#### Notes + +- GPU sharing and GPU number modes are mutually exclusive +- GPU sharing and vGPU cannot be enabled simultaneously +- Node locking prevents race conditions in device allocation +- The plugin automatically registers supported devices based on configuration +- Batch scoring is used for NPU devices to optimize allocation decisions diff --git a/content/en/docs/plugins/extender.md b/content/en/docs/plugins/extender.md new file mode 100644 index 00000000..9e2bf0c4 --- /dev/null +++ b/content/en/docs/plugins/extender.md @@ -0,0 +1,257 @@ ++++ +title = "Extender Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Extender" +[menu.docs] + parent = "plugins" + weight = 4 ++++ + +### Extender + +#### Overview + +The Extender plugin enables Volcano scheduler to delegate scheduling decisions to external HTTP services. It allows users to extend Volcano's scheduling capabilities by implementing custom logic in external services. The plugin supports various scheduling hooks including predicate, prioritize, preemptable, reclaimable, and event handlers. 
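For orientation, a minimal extender service might look like the sketch below. This is illustrative Go only, not a reference implementation: it answers just the predicate hook, the JSON shapes mirror the API contract examples later on this page, and the `/predicate` path assumes `extender.urlPrefix: http://127.0.0.1:8080` with `extender.predicateVerb: predicate`.

```go
// Minimal sketch of an external extender service (illustrative only).
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// predicateRequest mirrors the documented request body; the task and
// node payloads are kept opaque here.
type predicateRequest struct {
	Task json.RawMessage `json:"task"`
	Node json.RawMessage `json:"node"`
}

type predicateResponse struct {
	Code         int    `json:"code"`
	ErrorMessage string `json:"errorMessage"`
}

func predicate(w http.ResponseWriter, r *http.Request) {
	var req predicateRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Real filtering logic would inspect req.Task and req.Node here;
	// code 0 means the node passes the predicate.
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(predicateResponse{Code: 0})
}

func main() {
	// With extender.urlPrefix: http://127.0.0.1:8080 and
	// extender.predicateVerb: predicate, the scheduler POSTs to /predicate.
	http.HandleFunc("/predicate", predicate)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```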
+ +#### Features + +- **External Service Integration**: Delegate scheduling decisions to external HTTP services +- **Multiple Scheduling Hooks**: Support for predicate, prioritize, preemptable, reclaimable, and other scheduling functions +- **Managed Resources**: Optionally filter tasks based on managed resources +- **Error Handling**: Configurable error handling with ignorable option +- **Event Handlers**: Support for allocate and deallocate event handlers +- **HTTP Timeout Configuration**: Configurable HTTP request timeout + +#### Configuration + +The Extender plugin can be configured with the following arguments: + +```yaml +actions: "reclaim, allocate, backfill, preempt" +tiers: +- plugins: + - name: extender + arguments: + extender.urlPrefix: http://127.0.0.1:8080 + extender.httpTimeout: 100ms + extender.onSessionOpenVerb: onSessionOpen + extender.onSessionCloseVerb: onSessionClose + extender.predicateVerb: predicate + extender.prioritizeVerb: prioritize + extender.preemptableVerb: preemptable + extender.reclaimableVerb: reclaimable + extender.queueOverusedVerb: queueOverused + extender.jobEnqueueableVerb: jobEnqueueable + extender.jobReadyVerb: jobReady + extender.allocateFuncVerb: allocateFunc + extender.deallocateFuncVerb: deallocateFunc + extender.ignorable: true + extender.managedResources: + - nvidia.com/gpu + - nvidia.com/gpumem +``` + +##### Configuration Parameters + +- **extender.urlPrefix** (string): Base URL prefix for the extender service +- **extender.httpTimeout** (string): HTTP request timeout (e.g., "100ms", "1s", "1m") +- **extender.onSessionOpenVerb** (string): Verb for OnSessionOpen method +- **extender.onSessionCloseVerb** (string): Verb for OnSessionClose method +- **extender.predicateVerb** (string): Verb for Predicate method +- **extender.prioritizeVerb** (string): Verb for Prioritize method +- **extender.preemptableVerb** (string): Verb for Preemptable method +- **extender.reclaimableVerb** (string): Verb for Reclaimable method +- **extender.queueOverusedVerb** (string): Verb for QueueOverused method +- **extender.jobEnqueueableVerb** (string): Verb for JobEnqueueable method +- **extender.jobReadyVerb** (string): Verb for JobReady method +- **extender.allocateFuncVerb** (string): Verb for AllocateFunc event handler +- **extender.deallocateFuncVerb** (string): Verb for DeallocateFunc event handler +- **extender.ignorable** (bool): Whether the extender can ignore unexpected errors +- **extender.managedResources** (list): List of resources managed by the extender (comma-separated or list format) + +#### How It Works + +1. **Session Lifecycle**: The extender can hook into session open and close events to initialize and cleanup resources. +2. **Predicate**: The extender can filter nodes based on custom criteria during the predicate phase. +3. **Prioritize**: The extender can score nodes based on custom logic during the prioritize phase. +4. **Preemptable/Reclaimable**: The extender can determine which tasks can be preempted or reclaimed. +5. **Queue Management**: The extender can participate in queue overused and job enqueueable decisions. +6. **Event Handlers**: The extender can receive notifications when tasks are allocated or deallocated. + +#### Managed Resources + +The extender can optionally manage specific resources. 
When managed resources are configured, the extender is only invoked for tasks that request at least one of the managed resources: + +```yaml +extender.managedResources: +- nvidia.com/gpu +- nvidia.com/gpumem +``` + +If no managed resources are specified, the extender is invoked for all tasks. + +#### Error Handling + +The extender can be configured to ignore errors: + +```yaml +extender.ignorable: true +``` + +When ignorable is set to true, errors from the extender are logged but don't prevent scheduling from continuing. When set to false, errors cause scheduling decisions to fail. + +#### API Contract + +The extender service must implement HTTP POST endpoints for each configured verb. The request body contains JSON-encoded scheduling information, and the response should contain the appropriate scheduling decision. + +##### Example Predicate Request/Response + +**Request:** +```json +{ + "task": { + "namespace": "default", + "name": "task-1", + "resources": { + "cpu": 2, + "memory": 4096 + } + }, + "node": { + "name": "node-1", + "allocatable": { + "cpu": 8, + "memory": 16384 + } + } +} +``` + +**Response:** +```json +{ + "code": 0, + "errorMessage": "" +} +``` + +##### Example Prioritize Request/Response + +**Request:** +```json +{ + "task": { + "namespace": "default", + "name": "task-1", + "resources": { + "cpu": 2, + "memory": 4096 + } + }, + "nodes": [ + { + "name": "node-1", + "allocatable": { + "cpu": 8, + "memory": 16384 + } + }, + { + "name": "node-2", + "allocatable": { + "cpu": 8, + "memory": 16384 + } + } + ] +} +``` + +**Response:** +```json +{ + "nodeScore": { + "node-1": 80.5, + "node-2": 75.2 + }, + "errorMessage": "" +} +``` + +#### Scenario + +The Extender plugin is suitable for: + +- **Custom Scheduling Logic**: Implementing domain-specific scheduling requirements +- **Third-party Integration**: Integrating with external resource management systems +- **Advanced Filtering**: Complex node filtering based on external data sources +- **Custom Scoring**: Custom node scoring algorithms not available in standard plugins +- **Resource-specific Logic**: Handling special resources with custom allocation logic + +#### Examples + +##### Example 1: GPU Extender + +Configure an extender for GPU-specific scheduling: + +```yaml +- name: extender + arguments: + extender.urlPrefix: http://gpu-scheduler:8080 + extender.httpTimeout: 1s + extender.predicateVerb: predicate + extender.prioritizeVerb: prioritize + extender.managedResources: + - nvidia.com/gpu + - nvidia.com/gpumem + extender.ignorable: false +``` + +##### Example 2: Custom Node Filtering + +Configure an extender for custom node filtering: + +```yaml +- name: extender + arguments: + extender.urlPrefix: http://custom-filter:8080 + extender.httpTimeout: 500ms + extender.predicateVerb: customFilter + extender.ignorable: true +``` + +##### Example 3: Full Lifecycle Hooks + +Configure an extender with all lifecycle hooks: + +```yaml +- name: extender + arguments: + extender.urlPrefix: http://full-extender:8080 + extender.httpTimeout: 2s + extender.onSessionOpenVerb: onSessionOpen + extender.onSessionCloseVerb: onSessionClose + extender.predicateVerb: predicate + extender.prioritizeVerb: prioritize + extender.preemptableVerb: preemptable + extender.reclaimableVerb: reclaimable + extender.allocateFuncVerb: allocateFunc + extender.deallocateFuncVerb: deallocateFunc + extender.ignorable: true +``` + +#### Notes + +- The extender service must be accessible from the Volcano scheduler +- HTTP requests use POST method with JSON content type 
+- Maximum response body size is 10MB +- The extender should return HTTP 200 status code for successful operations +- Error responses should include appropriate error messages in the response body diff --git a/content/en/docs/plugins/nodegroup.md b/content/en/docs/plugins/nodegroup.md new file mode 100644 index 00000000..8d24e4e6 --- /dev/null +++ b/content/en/docs/plugins/nodegroup.md @@ -0,0 +1,241 @@ ++++ +title = "Node Group Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Node Group" +[menu.docs] + parent = "plugins" + weight = 5 ++++ + +### Node Group + +#### Overview + +The Node Group plugin provides queue-level node group affinity and anti-affinity capabilities. It allows queues to specify which node groups their jobs should run on, enabling better resource isolation and workload distribution. The plugin supports both required and preferred node group affinity/anti-affinity rules, and can inherit affinity rules from parent queues in hierarchical queue structures. + +#### Features + +- **Queue-level Affinity**: Define node group affinity rules at the queue level +- **Required and Preferred Rules**: Support for both required (hard) and preferred (soft) affinity constraints +- **Anti-affinity Support**: Support for both affinity and anti-affinity rules +- **Hierarchical Inheritance**: Inherit affinity rules from parent queues when hierarchical queues are enabled +- **Node Group Labeling**: Uses node labels to identify node groups +- **Strict Mode**: Configurable strict mode for affinity enforcement + +#### Configuration + +The Node Group plugin can be configured with the following arguments: + +```yaml +actions: "reclaim, allocate, backfill, preempt" +tiers: +- plugins: + - name: nodegroup + arguments: + strict: true +``` + +##### Configuration Parameters + +- **strict** (bool): Enable strict mode. In strict mode, nodes without node group labels are rejected if the queue has affinity rules, and nodes with node group labels are rejected if the queue has no affinity rules. Default is `true`. + +#### Node Group Labeling + +Nodes must be labeled with the node group name using the `volcano.sh/nodegroup-name` label: + +```yaml +apiVersion: v1 +kind: Node +metadata: + name: node-1 + labels: + volcano.sh/nodegroup-name: "group-a" +spec: + # node spec +``` + +#### Queue Configuration + +Queues can specify node group affinity rules in their spec: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: queue-example +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-a" + - "group-b" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-c" + nodeGroupAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-d" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-e" +``` + +##### Queue Affinity Fields + +- **nodeGroupAffinity.requiredDuringSchedulingIgnoredDuringExecution**: Required node groups. Tasks must be scheduled on nodes in one of these groups. +- **nodeGroupAffinity.preferredDuringSchedulingIgnoredDuringExecution**: Preferred node groups. Tasks prefer to be scheduled on nodes in these groups. +- **nodeGroupAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution**: Required anti-affinity groups. Tasks must not be scheduled on nodes in these groups. 
+- **nodeGroupAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution**: Preferred anti-affinity groups. Tasks prefer not to be scheduled on nodes in these groups. + +#### Hierarchical Queue Support + +When hierarchical queues are enabled, queues without explicit affinity rules inherit affinity rules from their nearest ancestor queue that has affinity rules defined: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: parent-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-a" +--- +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: child-queue +spec: + parent: parent-queue + weight: 1 + # Child queue inherits affinity rules from parent-queue +``` + +#### Scoring + +The plugin provides node scoring based on affinity rules: + +- **Required Affinity**: +100 points +- **Preferred Affinity**: +50 points +- **Preferred Anti-affinity**: -1 points + +#### Scenario + +The Node Group plugin is suitable for: + +- **Resource Isolation**: Isolating workloads to specific node groups for security or compliance reasons +- **Workload Distribution**: Distributing workloads across different node groups +- **Hardware-specific Scheduling**: Scheduling workloads on nodes with specific hardware characteristics +- **Multi-tenant Isolation**: Ensuring tenant workloads run on designated node groups +- **Geographic Distribution**: Scheduling workloads based on geographic location of node groups + +#### Examples + +##### Example 1: Required Affinity + +Configure a queue to require nodes from specific groups: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: gpu-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "gpu-group-1" + - "gpu-group-2" +``` + +##### Example 2: Preferred Affinity + +Configure a queue to prefer nodes from specific groups: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: cpu-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - "cpu-group-1" + - "cpu-group-2" +``` + +##### Example 3: Anti-affinity + +Configure a queue to avoid certain node groups: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: production-queue +spec: + weight: 1 + affinity: + nodeGroupAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "development-group" + preferredDuringSchedulingIgnoredDuringExecution: + - "test-group" +``` + +##### Example 4: Combined Affinity and Anti-affinity + +Configure a queue with both affinity and anti-affinity rules: + +```yaml +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: + name: mixed-queue +spec: + weight: 1 + affinity: + nodeGroupAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-a" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-b" + nodeGroupAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - "group-c" + preferredDuringSchedulingIgnoredDuringExecution: + - "group-d" +``` + +##### Example 5: Non-strict Mode + +Configure the plugin in non-strict mode: + +```yaml +- name: nodegroup + arguments: + strict: false +``` + +In non-strict mode, nodes without node group labels are allowed if the queue has no affinity rules. 
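For intuition, the scoring rules above reduce to a small lookup. The sketch below is illustrative Go, not the plugin's source; the constants mirror the point values from the Scoring section, and required anti-affinity is assumed to be enforced earlier as a hard filter rather than scored.

```go
// Minimal sketch of the node scoring described in the Scoring section;
// all identifiers are illustrative.
package main

import (
	"fmt"
	"slices"
)

const (
	requiredAffinityScore  = 100.0 // node group in required affinity list
	preferredAffinityScore = 50.0  // node group in preferred affinity list
	preferredAntiPenalty   = -1.0  // node group in preferred anti-affinity list
)

type queueAffinity struct {
	requiredAffinity, preferredAffinity, preferredAntiAffinity []string
}

// scoreNode scores a node by the value of its volcano.sh/nodegroup-name
// label against the queue's rules. Required anti-affinity groups are
// assumed to be filtered out before scoring.
func scoreNode(q queueAffinity, nodeGroup string) float64 {
	score := 0.0
	if slices.Contains(q.requiredAffinity, nodeGroup) {
		score += requiredAffinityScore
	}
	if slices.Contains(q.preferredAffinity, nodeGroup) {
		score += preferredAffinityScore
	}
	if slices.Contains(q.preferredAntiAffinity, nodeGroup) {
		score += preferredAntiPenalty
	}
	return score
}

func main() {
	q := queueAffinity{
		requiredAffinity:      []string{"group-a"},
		preferredAffinity:     []string{"group-b"},
		preferredAntiAffinity: []string{"group-d"},
	}
	for _, g := range []string{"group-a", "group-b", "group-d", "group-x"} {
		fmt.Printf("%-8s -> %v\n", g, scoreNode(q, g)) // 100, 50, -1, 0
	}
}
```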
+ +#### Notes + +- Nodes must be labeled with `volcano.sh/nodegroup-name` to participate in node group scheduling +- Required affinity rules are hard constraints and must be satisfied for scheduling +- Preferred affinity rules are soft constraints and affect scoring +- Anti-affinity rules take precedence over affinity rules +- In hierarchical queues, child queues inherit affinity rules from their nearest ancestor with affinity rules +- When hierarchical queues are enabled, set `enableHierarchy: true` in the plugin configuration diff --git a/content/en/docs/plugins/resource-strategy-fit.md b/content/en/docs/plugins/resource-strategy-fit.md new file mode 100644 index 00000000..ca4d9251 --- /dev/null +++ b/content/en/docs/plugins/resource-strategy-fit.md @@ -0,0 +1,175 @@ ++++ +title = "Resource Strategy Fit Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Resource Strategy Fit" +[menu.docs] + parent = "plugins" + weight = 1 ++++ + +### Resource Strategy Fit + +#### Overview + +The Resource Strategy Fit plugin provides flexible resource allocation strategies for scheduling tasks onto nodes. It supports multiple scoring strategies including MostAllocated and LeastAllocated for different resource types, enabling administrators to configure custom resource allocation policies. The plugin also supports additional features like SRA (Smart Resource Allocation) and Proportional resource allocation. + +#### Features + +- **Flexible Resource Scoring**: Supports `MostAllocated` and `LeastAllocated` scoring strategies for different resource types +- **Customizable Weights**: Configure weights for each resource type to control their impact on scoring +- **Pod-level Scoring**: Supports pod-level scoring strategy configuration through annotations +- **Wildcard Pattern Matching**: Supports wildcard patterns for resource matching (e.g., `nvidia.com/gpu/*`) +- **SRA Support**: Optional Smart Resource Allocation (SRA) for enhanced resource allocation +- **Proportional Allocation**: Optional proportional resource allocation policy + +#### Configuration + +The Resource Strategy Fit plugin can be configured with the following arguments: + +```yaml +actions: "enqueue, allocate, backfill, reclaim, preempt" +tiers: +- plugins: + - name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + nvidia.com/gpu: + type: MostAllocated + weight: 2 + cpu: + type: LeastAllocated + weight: 1 + memory: + type: LeastAllocated + weight: 1 + sra: + enable: true + resources: nvidia.com/gpu + weight: 10 + resourceWeight: + nvidia.com/gpu: 1 + proportional: + enable: false + resources: nvidia.com/gpu + resourceProportion: + nvidia.com/gpu.cpu: 4 + nvidia.com/gpu.memory: 8 +``` + +##### Configuration Parameters + +- **resourceStrategyFitWeight** (int): Global weight for the resource strategy fit plugin. Default is 10. 
+- **resources** (map): Resource-specific configuration with the following fields: + - **type**: Scoring strategy type (`MostAllocated` or `LeastAllocated`) + - **weight**: Weight for this resource in scoring calculation +- **sra** (optional): SRA configuration: + - **enable**: Enable/disable SRA + - **resources**: Comma-separated list of resources for SRA + - **weight**: Weight for SRA scoring + - **resourceWeight**: Per-resource weights for SRA +- **proportional** (optional): Proportional allocation configuration: + - **enable**: Enable/disable proportional allocation + - **resources**: Comma-separated list of resources + - **resourceProportion**: Proportional ratios for resource combinations + +##### Scoring Strategies + +- **MostAllocated**: Prefers nodes with higher resource utilization. Useful for binpacking scenarios where you want to fill nodes before using new ones. +- **LeastAllocated**: Prefers nodes with lower resource utilization. Useful for spreading workloads across nodes to improve availability. + +##### Pod-level Configuration + +Pods can specify their own scoring strategy using annotations: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + annotations: + volcano.sh/resource-strategy-scoring-type: MostAllocated + volcano.sh/resource-strategy-weight: '{"nvidia.com/gpu": 2, "cpu": 1}' +spec: + containers: + - name: container + resources: + requests: + nvidia.com/gpu: 1 + cpu: "2" +``` + +#### Scenario + +The Resource Strategy Fit plugin is suitable for: + +- **Mixed Workloads**: Clusters with diverse workload types requiring different resource allocation strategies +- **GPU Clusters**: GPU-intensive workloads where GPUs should be allocated using MostAllocated strategy +- **High Availability**: Workloads requiring distribution across nodes using LeastAllocated strategy +- **Custom Allocation Policies**: Organizations with specific resource allocation requirements + +#### Examples + +##### Example 1: GPU Binpacking + +Configure the plugin to use MostAllocated for GPUs to pack GPU workloads on fewer nodes: + +```yaml +- name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + nvidia.com/gpu: + type: MostAllocated + weight: 5 + cpu: + type: LeastAllocated + weight: 1 + memory: + type: LeastAllocated + weight: 1 +``` + +##### Example 2: Workload Distribution + +Configure the plugin to distribute workloads evenly across nodes: + +```yaml +- name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + cpu: + type: LeastAllocated + weight: 3 + memory: + type: LeastAllocated + weight: 2 +``` + +##### Example 3: With SRA + +Enable SRA for enhanced GPU allocation: + +```yaml +- name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + nvidia.com/gpu: + type: MostAllocated + weight: 2 + sra: + enable: true + resources: nvidia.com/gpu + weight: 10 + resourceWeight: + nvidia.com/gpu: 1 +``` diff --git a/content/en/docs/plugins/resourcequota.md b/content/en/docs/plugins/resourcequota.md new file mode 100644 index 00000000..1bf8e4db --- /dev/null +++ b/content/en/docs/plugins/resourcequota.md @@ -0,0 +1,213 @@ ++++ +title = "Resource Quota Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. 
linktitle = "Resource Quota"
[menu.docs]
  parent = "plugins"
  weight = 7
+++

### Resource Quota

#### Overview

The Resource Quota plugin enforces Kubernetes ResourceQuota constraints during job enqueue. It ensures that jobs can only be enqueued if their minimum resource requirements do not exceed the available quota in their namespace. The plugin integrates with Kubernetes ResourceQuota objects to provide namespace-level resource limits and isolation.

#### Features

- **ResourceQuota Enforcement**: Enforces Kubernetes ResourceQuota constraints during job enqueue
- **Namespace-level Isolation**: Provides resource isolation at the namespace level
- **Pending Resource Tracking**: Tracks pending resources to prevent over-allocation
- **Event Recording**: Records PodGroup events when quota limits are exceeded
- **MinResources Validation**: Validates jobs against their minimum resource requirements

#### Configuration

The Resource Quota plugin requires no special configuration. It automatically works with existing Kubernetes ResourceQuota objects:

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: resourcequota
```

#### How It Works

1. **Job Enqueue**: When a job is enqueued, the plugin checks whether the job's minimum resource requirements fit within the namespace's ResourceQuota.
2. **Quota Validation**: For each ResourceQuota in the namespace, the plugin:
   - Checks whether the sum of the job's minimum resources, the resources already in use, and the pending resources exceeds the quota's hard limits
   - If the quota would be exceeded, the job is rejected and not enqueued
3. **Pending Resource Tracking**: The plugin tracks pending resources (jobs that have been accepted for enqueue but not yet allocated) to prevent over-allocation.
4. **Event Recording**: When a job is rejected due to quota limits, the plugin records a PodGroup event with details about the insufficient resources.
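The check in steps 2-3 amounts to simple addition against each hard limit. Below is a minimal sketch under the assumption of scalar quantities; the real plugin compares Kubernetes `resource.Quantity` values per ResourceQuota, and `fitsQuota` and its types are illustrative names only.

```go
// Minimal sketch of the enqueue-time quota check (illustrative only).
// Quantities are simplified to float64 keyed by resource name, e.g.
// "requests.cpu"; the real plugin works with resource.Quantity values.
package main

import "fmt"

type resourceList map[string]float64

// fitsQuota reports whether a job's minResources, on top of what is
// already used and what is pending enqueue, stays within every hard
// limit of one ResourceQuota.
func fitsQuota(hard, used, pending, jobMin resourceList) (bool, string) {
	for name, limit := range hard {
		// Missing keys read as zero, so resources the quota does not
		// constrain are ignored and absent requests cost nothing.
		need := used[name] + pending[name] + jobMin[name]
		if need > limit {
			return false, fmt.Sprintf("insufficient quota for %s: need %v, hard limit %v", name, need, limit)
		}
	}
	return true, ""
}

func main() {
	hard := resourceList{"requests.cpu": 100, "requests.memory": 200}
	used := resourceList{"requests.cpu": 80, "requests.memory": 100}
	pending := resourceList{"requests.cpu": 10}
	jobMin := resourceList{"requests.cpu": 6, "requests.memory": 12}

	if ok, msg := fitsQuota(hard, used, pending, jobMin); ok {
		fmt.Println("job may be enqueued") // cpu 96 <= 100, memory 112 <= 200
	} else {
		fmt.Println("reject enqueue:", msg)
	}
}
```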
+ +#### ResourceQuota Configuration + +ResourceQuota objects must be created in the target namespace: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: compute-quota + namespace: default +spec: + hard: + requests.cpu: "100" + requests.memory: 200Gi + requests.nvidia.com/gpu: "8" + limits.cpu: "200" + limits.memory: 400Gi + limits.nvidia.com/gpu: "16" + pods: "50" +``` + +#### Job Configuration + +Jobs must specify minimum resources for the quota check to work: + +```yaml +apiVersion: batch.volcano.sh/v1alpha1 +kind: Job +metadata: + name: example-job + namespace: default +spec: + minAvailable: 3 + schedulerName: volcano + queue: default + minResources: + requests: + cpu: "6" + memory: 12Gi + nvidia.com/gpu: "1" + tasks: + - replicas: 3 + name: "task" + template: + spec: + containers: + - name: container + resources: + requests: + cpu: "2" + memory: 4Gi + nvidia.com/gpu: "1" + limits: + cpu: "4" + memory: 8Gi + nvidia.com/gpu: "1" +``` + +#### Scenario + +The Resource Quota plugin is suitable for: + +- **Multi-tenant Clusters**: Enforcing resource limits per namespace/tenant +- **Resource Isolation**: Preventing one namespace from consuming all cluster resources +- **Cost Control**: Limiting resource consumption to control costs +- **Capacity Planning**: Ensuring resource allocation stays within planned capacity +- **Fair Resource Sharing**: Ensuring fair distribution of resources across namespaces + +#### Examples + +##### Example 1: Basic ResourceQuota + +Create a ResourceQuota to limit CPU and memory: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: team-a-quota + namespace: team-a +spec: + hard: + requests.cpu: "50" + requests.memory: 100Gi + limits.cpu: "100" + limits.memory: 200Gi + pods: "20" +``` + +##### Example 2: GPU ResourceQuota + +Create a ResourceQuota with GPU limits: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: gpu-quota + namespace: ml-team +spec: + hard: + requests.cpu: "100" + requests.memory: 200Gi + requests.nvidia.com/gpu: "16" + limits.cpu: "200" + limits.memory: 400Gi + limits.nvidia.com/gpu: "32" +``` + +##### Example 3: Multiple ResourceQuotas + +A namespace can have multiple ResourceQuotas: + +```yaml +# CPU and memory quota +apiVersion: v1 +kind: ResourceQuota +metadata: + name: compute-quota + namespace: default +spec: + hard: + requests.cpu: "100" + requests.memory: 200Gi + +--- +# GPU quota +apiVersion: v1 +kind: ResourceQuota +metadata: + name: gpu-quota + namespace: default +spec: + hard: + requests.nvidia.com/gpu: "8" +``` + +##### Example 4: Pod Limits + +Create a ResourceQuota that limits the number of pods: + +```yaml +apiVersion: v1 +kind: ResourceQuota +metadata: + name: pod-limit-quota + namespace: default +spec: + hard: + pods: "100" +``` + +#### Notes + +- ResourceQuota objects must exist in the namespace before jobs are enqueued +- Jobs must specify `minResources` for the quota check to work +- The plugin checks quota during job enqueue, not during task allocation +- Pending resources are tracked to prevent over-allocation +- If a namespace has no ResourceQuota, jobs can be enqueued without quota checks +- The plugin supports all resource types supported by Kubernetes ResourceQuota +- ResourceQuota scope constraints are not currently supported +- The plugin integrates with Volcano's job enqueue mechanism to provide early quota validation diff --git a/content/en/docs/plugins/usage.md b/content/en/docs/plugins/usage.md new file mode 100644 index 00000000..5a4bdb5d --- /dev/null +++ 
b/content/en/docs/plugins/usage.md @@ -0,0 +1,178 @@ ++++ +title = "Usage Plugin" + +date = 2025-01-21 +lastmod = 2025-01-21 + +draft = false # Is this a draft? true/false +toc = true # Show table of contents? true/false +type = "docs" # Do not modify. + +# Add menu entry to sidebar. +linktitle = "Usage" +[menu.docs] + parent = "plugins" + weight = 6 ++++ + +### Usage + +#### Overview + +The Usage plugin provides CPU and memory usage-based scheduling. It filters nodes based on resource usage thresholds and scores nodes based on their current resource utilization. The plugin helps prevent scheduling on overloaded nodes and prefers nodes with lower resource usage, improving overall cluster utilization and workload performance. + +#### Features + +- **Usage-based Filtering**: Filter nodes based on CPU and memory usage thresholds +- **Usage-based Scoring**: Score nodes based on current resource utilization +- **Configurable Thresholds**: Set custom thresholds for CPU and memory usage +- **Weighted Scoring**: Configurable weights for usage, CPU, and memory in scoring +- **Predicate Control**: Optional enable/disable predicate filtering +- **Metrics Integration**: Uses node resource usage metrics for decision making + +#### Configuration + +The Usage plugin can be configured with the following arguments: + +```yaml +actions: "enqueue, allocate, backfill" +tiers: +- plugins: + - name: usage + enablePredicate: true + arguments: + usage.weight: 5 + cpu.weight: 1 + memory.weight: 1 + thresholds: + cpu: 80 + mem: 80 +``` + +##### Configuration Parameters + +- **enablePredicate** (bool): Enable/disable predicate filtering. When set to `false`, new pod scheduling is not disabled when the node load reaches the threshold. Default is `true`. +- **usage.weight** (int): Global weight for the usage plugin scoring. Default is `5`. +- **cpu.weight** (int): Weight for CPU usage in scoring calculation. Default is `1`. +- **memory.weight** (int): Weight for memory usage in scoring calculation. Default is `1`. +- **thresholds.cpu** (float): CPU usage threshold percentage. Nodes exceeding this threshold will be filtered out (if predicate is enabled). Default is `80`. +- **thresholds.mem** (float): Memory usage threshold percentage. Nodes exceeding this threshold will be filtered out (if predicate is enabled). Default is `80`. + +#### How It Works + +1. **Metrics Collection**: The plugin uses node resource usage metrics provided by the metrics collector. +2. **Predicate Phase**: If enabled, nodes with CPU or memory usage exceeding the configured thresholds are filtered out. +3. **Scoring Phase**: Nodes are scored based on their current resource utilization. Lower usage results in higher scores. +4. **Scoring Formula**: The score is calculated as: + - CPU score: `(100 - cpuUsage) / 100 * cpuWeight` + - Memory score: `(100 - memoryUsage) / 100 * memoryWeight` + - Combined score: `(cpuScore + memoryScore) / (cpuWeight + memoryWeight) * usageWeight * MaxNodeScore` + +#### Metrics Requirements + +The Usage plugin requires node resource usage metrics to be available. Metrics must be updated within the last 5 minutes to be considered valid. 
If metrics are not available or are stale, the plugin will: + +- **Predicate**: Allow scheduling (pass the filter) +- **Scoring**: Return a score of 0 + +#### Scenario + +The Usage plugin is suitable for: + +- **Load Balancing**: Distributing workloads across nodes to balance resource utilization +- **Overload Prevention**: Preventing scheduling on overloaded nodes +- **Performance Optimization**: Preferring nodes with lower resource usage for better performance +- **Cost Optimization**: Improving resource utilization across the cluster +- **Workload Distribution**: Ensuring even distribution of workloads based on resource consumption + +#### Examples + +##### Example 1: Basic Usage Configuration + +Configure the plugin with default thresholds: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 5 + cpu.weight: 1 + memory.weight: 1 + thresholds: + cpu: 80 + mem: 80 +``` + +##### Example 2: Conservative Thresholds + +Configure stricter thresholds to prevent overload: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 10 + cpu.weight: 2 + memory.weight: 2 + thresholds: + cpu: 70 + mem: 70 +``` + +##### Example 3: Scoring Only (No Predicate) + +Disable predicate filtering and use only scoring: + +```yaml +- name: usage + enablePredicate: false + arguments: + usage.weight: 5 + cpu.weight: 1 + memory.weight: 1 + thresholds: + cpu: 80 + mem: 80 +``` + +##### Example 4: CPU-focused Configuration + +Prioritize CPU usage over memory: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 10 + cpu.weight: 3 + memory.weight: 1 + thresholds: + cpu: 75 + mem: 85 +``` + +##### Example 5: Memory-focused Configuration + +Prioritize memory usage over CPU: + +```yaml +- name: usage + enablePredicate: true + arguments: + usage.weight: 10 + cpu.weight: 1 + memory.weight: 3 + thresholds: + cpu: 85 + mem: 75 +``` + +#### Notes + +- The plugin requires node resource usage metrics to be available +- Metrics must be updated within the last 5 minutes to be considered valid +- Threshold values are percentages (0-100) +- Weights determine the relative importance of different resources in scoring +- When predicate is disabled, the plugin only affects node scoring, not filtering +- The plugin uses average usage metrics over a configured period +- If metrics are not available, the plugin allows scheduling to proceed From 80ceac3c85a7b8f43c5089a336c037f8c63979c1 Mon Sep 17 00:00:00 2001 From: Shive Date: Sun, 18 Jan 2026 12:36:09 +0530 Subject: [PATCH 2/4] fix the errors Signed-off-by: Shive --- content/en/docs/plugins/capacity.md | 3 +-- content/en/docs/plugins/deviceshare.md | 3 +-- content/en/docs/plugins/extender.md | 3 +-- content/en/docs/plugins/nodegroup.md | 3 +-- content/en/docs/plugins/resource-strategy-fit.md | 3 +-- content/en/docs/plugins/resourcequota.md | 3 +-- content/en/docs/plugins/usage.md | 3 +-- 7 files changed, 7 insertions(+), 14 deletions(-) diff --git a/content/en/docs/plugins/capacity.md b/content/en/docs/plugins/capacity.md index d35a91d3..ba801617 100644 --- a/content/en/docs/plugins/capacity.md +++ b/content/en/docs/plugins/capacity.md @@ -10,8 +10,7 @@ type = "docs" # Do not modify. # Add menu entry to sidebar. 
linktitle = "Capacity" -[menu.docs] - parent = "plugins" +[menu.plugins] weight = 2 +++ diff --git a/content/en/docs/plugins/deviceshare.md b/content/en/docs/plugins/deviceshare.md index 6e24dd50..b216fd94 100644 --- a/content/en/docs/plugins/deviceshare.md +++ b/content/en/docs/plugins/deviceshare.md @@ -10,8 +10,7 @@ type = "docs" # Do not modify. # Add menu entry to sidebar. linktitle = "Device Share" -[menu.docs] - parent = "plugins" +[menu.plugins] weight = 3 +++ diff --git a/content/en/docs/plugins/extender.md b/content/en/docs/plugins/extender.md index 9e2bf0c4..382f5637 100644 --- a/content/en/docs/plugins/extender.md +++ b/content/en/docs/plugins/extender.md @@ -10,8 +10,7 @@ type = "docs" # Do not modify. # Add menu entry to sidebar. linktitle = "Extender" -[menu.docs] - parent = "plugins" +[menu.plugins] weight = 4 +++ diff --git a/content/en/docs/plugins/nodegroup.md b/content/en/docs/plugins/nodegroup.md index 8d24e4e6..b3eba1d4 100644 --- a/content/en/docs/plugins/nodegroup.md +++ b/content/en/docs/plugins/nodegroup.md @@ -10,8 +10,7 @@ type = "docs" # Do not modify. # Add menu entry to sidebar. linktitle = "Node Group" -[menu.docs] - parent = "plugins" +[menu.plugins] weight = 5 +++ diff --git a/content/en/docs/plugins/resource-strategy-fit.md b/content/en/docs/plugins/resource-strategy-fit.md index ca4d9251..bc2f51e4 100644 --- a/content/en/docs/plugins/resource-strategy-fit.md +++ b/content/en/docs/plugins/resource-strategy-fit.md @@ -10,8 +10,7 @@ type = "docs" # Do not modify. # Add menu entry to sidebar. linktitle = "Resource Strategy Fit" -[menu.docs] - parent = "plugins" +[menu.plugins] weight = 1 +++ diff --git a/content/en/docs/plugins/resourcequota.md b/content/en/docs/plugins/resourcequota.md index 1bf8e4db..07706a78 100644 --- a/content/en/docs/plugins/resourcequota.md +++ b/content/en/docs/plugins/resourcequota.md @@ -10,8 +10,7 @@ type = "docs" # Do not modify. # Add menu entry to sidebar. linktitle = "Resource Quota" -[menu.docs] - parent = "plugins" +[menu.plugins] weight = 7 +++ diff --git a/content/en/docs/plugins/usage.md b/content/en/docs/plugins/usage.md index 5a4bdb5d..0877bf13 100644 --- a/content/en/docs/plugins/usage.md +++ b/content/en/docs/plugins/usage.md @@ -10,8 +10,7 @@ type = "docs" # Do not modify. # Add menu entry to sidebar. 
linktitle = "Usage" -[menu.docs] - parent = "plugins" +[menu.plugins] weight = 6 +++ From 69de112158f6473bc26cd5a9297d2ff0db727421 Mon Sep 17 00:00:00 2001 From: Shive Date: Wed, 11 Feb 2026 22:11:51 +0530 Subject: [PATCH 3/4] feat: add a starter code for docusaurus Signed-off-by: Shive --- website-docusaurus/.gitignore | 20 + website-docusaurus/README.md | 41 + .../blog/2019-01-28-kube-batch-customers.md | 14 + .../blog/2019-01-28-kube-batch-startup.md | 167 + .../blog/2019-03-28-quick-start-volcano.md | 56 + ...-llm-inference-for-the-cloud-native-era.md | 170 + .../blog/2020-01-01-paddlepaddle-en.md | 548 + .../blog/2020-09-30-aiqiyi-en.md | 224 + website-docusaurus/blog/2020-10-27-hpc-en.md | 254 + .../blog/2020-12-24-leinao-en.md | 137 + .../blog/2021-01-05-ruitian-en.md | 298 + .../blog/2021-05-27-xiaohongshu-en.md | 155 + .../blog/2021-06-01-pengcheng-en.md | 212 + .../blog/2021-06-15-ruitian2-en.md | 254 + .../blog/2021-08-31-1.4-release-en.md | 32 + .../blog/2022-12-28-ing_case-en.md | 106 + .../2023-01-12-volcano-1.7.0-release-en.md | 140 + ...lcano-community-co-construction-program.md | 83 + .../blog/2024-01-31-volcano-1.8.2-release.md | 241 + ...with-volcano-in-ai-&-big-data-scenarios.md | 51 + .../blog/2024-05-21-volcano-1.9.0-release.md | 194 + .../blog/2024-09-29-volcano-1.10.0-release.md | 216 + .../blog/2025-02-07-volcano-1.11.0-release.md | 455 + ...uted-training-and-inference-performance.md | 58 + .../2025-05-30-volcano-2025-security-audit.md | 45 + .../blog/2025-06-12-volcano-1.12.0-release.md | 293 + .../blog/2025-06-13-iflytek_case_study.md | 38 + .../blog/2025-09-29-volcano-1.13.0-release.md | 342 + website-docusaurus/blog/authors.yml | 31 + website-docusaurus/blog/tags.yml | 19 + website-docusaurus/docs/_category_.json | 8 + website-docusaurus/docs/actions.md | 67 + website-docusaurus/docs/architecture.md | 30 + website-docusaurus/docs/cli.md | 61 + website-docusaurus/docs/colocation.md | 604 + website-docusaurus/docs/contribution.md | 151 + website-docusaurus/docs/cron_volcanoJob.md | 133 + website-docusaurus/docs/descheduler.md | 139 + website-docusaurus/docs/flink_on_volcano.md | 264 + website-docusaurus/docs/gpu_virtualization.md | 191 + website-docusaurus/docs/hierarchical_queue.md | 138 + website-docusaurus/docs/installation.md | 106 + website-docusaurus/docs/intro.md | 109 + .../docs/kubeflow_on_volcano.md | 229 + website-docusaurus/docs/membership.md | 123 + .../docs/mindspore_on_volcano.md | 59 + website-docusaurus/docs/mpi_on_volcano.md | 97 + .../docs/multi_cluster_scheduling.md | 41 + .../docs/network_topology_aware_scheduling.md | 448 + website-docusaurus/docs/plugins.md | 160 + website-docusaurus/docs/plugins/capacity.md | 169 + .../docs/plugins/deviceshare.md | 182 + website-docusaurus/docs/plugins/extender.md | 245 + website-docusaurus/docs/plugins/nodegroup.md | 229 + .../docs/plugins/resource-strategy-fit.md | 163 + .../docs/plugins/resourcequota.md | 201 + website-docusaurus/docs/plugins/usage.md | 166 + website-docusaurus/docs/podgroup.md | 105 + website-docusaurus/docs/pp_on_volcano.md | 227 + website-docusaurus/docs/queue.md | 99 + .../docs/queue_resource_management.md | 239 + website-docusaurus/docs/ray_on_volcano.md | 204 + website-docusaurus/docs/referrals.md | 72 + .../docs/schduler_introduction.md | 89 + website-docusaurus/docs/spark_on_volcano.md | 86 + website-docusaurus/docs/tf_on_volcano.md | 96 + website-docusaurus/docs/tutorials.md | 353 + website-docusaurus/docs/unified_scheduling.md | 219 + website-docusaurus/docs/vcjob.md | 312 + 
website-docusaurus/docusaurus.config.js | 174 + .../2019-01-28-kube-batch-customers.md | 14 + .../2019-01-28-kube-batch-startup.md | 175 + .../2019-03-28-quick-start-volcano.md | 57 + .../2019-11-06-paddlepaddle.md | 589 + .../2020-09-30-aiqiyi.md | 262 + .../2020-10-27-hpc.md | 266 + .../2020-12-24-leinao-cloud-os.md | 133 + .../2021-01-05-ruitian.md | 360 + .../2021-05-27-xiaohongshu.md | 151 + .../2021-06-01-pengcheng.md | 252 + .../2021-06-15-ruitian2.md | 269 + .../2021-08-31-1.4-release.md | 33 + .../2022-12-28-ing_case.md | 104 + .../2023-01-12-volcano-1.7.0-release.md | 141 + ...lcano-community-co-construction-program.md | 90 + .../2024-01-31-volcano-1.8.2-release.md | 242 + .../2024-05-21-volcano-1.9.0-release.md | 196 + .../2024-09-29-volcano-1.10.0-release.md | 213 + .../2025-02-07-volcano-1.11.0-release.md | 442 + .../2025-06-12-volcano-1.12.0-release.md | 292 + .../2025-06-13-iflytek_case_study.md | 39 + .../2025-09-29-volcano-1.13.0-release.md | 342 + .../2025-12-29-introducing_kthena.md | 123 + .../current/actions.md | 68 + .../current/architecture.md | 29 + .../current/cli.md | 56 + .../current/colocation.md | 612 + .../current/contribution.md | 153 + .../current/cron_volcanoJob.md | 133 + .../current/descheduler.md | 139 + .../current/flink_on_volcano.md | 264 + .../current/gpu_virtualization.md | 190 + .../current/hierarchical_queue.md | 134 + .../current/installation.md | 111 + .../current/intro.md | 110 + .../current/kubeflow_on_volcano.md | 236 + .../current/membership.md | 117 + .../current/mindspore_on_volcano.md | 59 + .../current/mpi_on_volcano.md | 97 + .../current/multi_cluster_scheduling.md | 37 + .../network_topology_aware_scheduling.md | 446 + .../current/plugins.md | 161 + .../current/podgroup.md | 135 + .../current/pp_on_volcano.md | 232 + .../current/queue.md | 111 + .../current/queue_resource_management.md | 229 + .../current/ray_on_volcano.md | 204 + .../current/referrals.md | 73 + .../current/schduler_introduction.md | 96 + .../current/spark_on_volcano.md | 86 + .../current/tf_on_volcano.md | 96 + .../current/tutorials.md | 347 + .../current/unified_scheduling.md | 245 + .../current/vcjob.md | 340 + website-docusaurus/netlify.toml | 9 + website-docusaurus/package-lock.json | 18482 ++++++++++++++++ website-docusaurus/package.json | 47 + .../plugins/recent-blog-posts.js | 35 + .../scripts/migrate-hugo-blog.js | 86 + .../scripts/migrate-hugo-docs.js | 71 + website-docusaurus/scripts/migrate-zh-docs.js | 51 + website-docusaurus/sidebars.js | 9 + .../src/components/AboutSection/index.tsx | 35 + .../components/AboutSection/styles.module.css | 75 + .../src/components/FrameworkSupport/index.tsx | 41 + .../FrameworkSupport/styles.module.css | 89 + .../src/components/HeroCarousel/index.tsx | 45 + .../components/HeroCarousel/styles.module.css | 75 + .../src/components/HomepageFeatures/index.tsx | 61 + .../HomepageFeatures/styles.module.css | 11 + .../src/components/RecentPosts/index.tsx | 94 + .../components/RecentPosts/styles.module.css | 151 + .../components/SupportersSection/index.tsx | 47 + .../SupportersSection/styles.module.css | 71 + website-docusaurus/src/css/custom.css | 59 + website-docusaurus/src/pages/index.module.css | 23 + website-docusaurus/src/pages/index.tsx | 23 + website-docusaurus/src/pages/markdown-page.md | 7 + website-docusaurus/static/.nojekyll | 0 website-docusaurus/static/_redirects | 14 + website-docusaurus/static/golang/api.html | 4 + website-docusaurus/static/golang/volcano.html | 4 + website-docusaurus/static/img/.gitkeep | 0 
.../co-construction-1.jpg | Bin 0 -> 108713 bytes .../co-construction-2.jpg | Bin 0 -> 38534 bytes .../co-construction-3.jpg | Bin 0 -> 25201 bytes website-docusaurus/static/img/ai1.png | Bin 0 -> 131890 bytes website-docusaurus/static/img/aiqiyi-1.png | Bin 0 -> 94076 bytes website-docusaurus/static/img/aiqiyi-10.png | Bin 0 -> 111369 bytes website-docusaurus/static/img/aiqiyi-11.png | Bin 0 -> 143587 bytes website-docusaurus/static/img/aiqiyi-12.png | Bin 0 -> 69505 bytes website-docusaurus/static/img/aiqiyi-13.png | Bin 0 -> 88975 bytes website-docusaurus/static/img/aiqiyi-2.png | Bin 0 -> 40278 bytes website-docusaurus/static/img/aiqiyi-3.png | Bin 0 -> 42783 bytes website-docusaurus/static/img/aiqiyi-4.png | Bin 0 -> 77001 bytes website-docusaurus/static/img/aiqiyi-5.png | Bin 0 -> 96667 bytes website-docusaurus/static/img/aiqiyi-6.png | Bin 0 -> 128540 bytes website-docusaurus/static/img/aiqiyi-7.png | Bin 0 -> 57361 bytes website-docusaurus/static/img/aiqiyi-8.png | Bin 0 -> 143579 bytes website-docusaurus/static/img/aiqiyi-9.png | Bin 0 -> 84582 bytes website-docusaurus/static/img/aiqiyi-en1.png | Bin 0 -> 57277 bytes website-docusaurus/static/img/aiqiyi-en10.png | Bin 0 -> 61826 bytes website-docusaurus/static/img/aiqiyi-en2.png | Bin 0 -> 17410 bytes website-docusaurus/static/img/aiqiyi-en6.png | Bin 0 -> 58329 bytes website-docusaurus/static/img/arch_1.png | Bin 0 -> 91467 bytes website-docusaurus/static/img/arch_2.PNG | Bin 0 -> 49352 bytes website-docusaurus/static/img/bg_1.png | Bin 0 -> 859082 bytes website-docusaurus/static/img/bg_2.png | Bin 0 -> 368137 bytes website-docusaurus/static/img/cncf-color.svg | 1 + .../static/img/colocation/architecture.png | Bin 0 -> 359648 bytes .../static/img/colocation/cpu-burst1-EN.png | Bin 0 -> 66149 bytes .../static/img/colocation/cpu-burst1.png | Bin 0 -> 99032 bytes .../static/img/colocation/cpu-burst2-EN.png | Bin 0 -> 82649 bytes .../static/img/colocation/cpu-burst2.png | Bin 0 -> 111110 bytes .../static/img/colocation/network.png | Bin 0 -> 19150 bytes .../img/colocation/oversubscription.png | Bin 0 -> 24859 bytes .../img/colocation/oversubscription_EN.png | Bin 0 -> 39577 bytes .../static/img/colocation/watermark.png | Bin 0 -> 497700 bytes website-docusaurus/static/img/deployment.png | Bin 0 -> 47481 bytes .../static/img/descheduler/descheduler-CN.svg | 4 + .../static/img/descheduler/descheduler_EN.svg | 4 + website-docusaurus/static/img/docker-200.png | Bin 0 -> 9244 bytes .../static/img/docusaurus-social-card.jpg | Bin 0 -> 55746 bytes website-docusaurus/static/img/docusaurus.png | Bin 0 -> 5142 bytes website-docusaurus/static/img/drfjob.png | Bin 0 -> 35085 bytes website-docusaurus/static/img/fair-share.png | Bin 0 -> 71755 bytes website-docusaurus/static/img/favicon.ico | Bin 0 -> 15086 bytes .../img/favicons/android-chrome-192x192.png | Bin 0 -> 4947 bytes .../img/favicons/android-chrome-512x512.png | Bin 0 -> 10586 bytes .../static/img/favicons/apple-touch-icon.png | Bin 0 -> 4839 bytes .../static/img/favicons/browserconfig.xml | 13 + .../static/img/favicons/favicon-16x16.png | Bin 0 -> 391 bytes .../static/img/favicons/favicon-32x32.png | Bin 0 -> 920 bytes .../static/img/favicons/favicon.ico | Bin 0 -> 15086 bytes .../static/img/favicons/favicon.png | Bin 0 -> 864 bytes .../static/img/favicons/favicon.svg | 1 + .../static/img/favicons/mstile-144x144.png | Bin 0 -> 3945 bytes .../static/img/favicons/mstile-150x150.png | Bin 0 -> 4368 bytes .../static/img/favicons/mstile-310x150.png | Bin 0 -> 4829 bytes 
.../static/img/favicons/mstile-310x310.png | Bin 0 -> 7504 bytes .../static/img/favicons/mstile-70x70.png | Bin 0 -> 2904 bytes .../static/img/favicons/safari-pinned-tab.svg | 1 + .../static/img/favicons/site.webmanifest | 19 + website-docusaurus/static/img/gang.png | Bin 0 -> 12212 bytes .../img/gpu-virtualization/hard_limit.jpg | Bin 0 -> 65871 bytes .../vgpu_device_plugin_metrics.png | Bin 0 -> 322404 bytes .../static/img/headers/banner_02.png | Bin 0 -> 1251378 bytes .../static/img/headers/bubbles-wide.jpg | Bin 0 -> 309600 bytes .../static/img/headers/header-apps-2.jpg | Bin 0 -> 73900 bytes .../static/img/headers/header-code.jpg | Bin 0 -> 338646 bytes .../static/img/headers/header-edge-2.jpg | Bin 0 -> 275518 bytes .../static/img/headers/header-k8s.jpg | Bin 0 -> 142573 bytes .../static/img/headers/volcano-slide-1.png | Bin 0 -> 1326 bytes .../static/img/headers/volcano-slide-2.png | Bin 0 -> 3665 bytes .../static/img/hierarchical-queue-example.png | Bin 0 -> 92034 bytes website-docusaurus/static/img/hpc-1.png | Bin 0 -> 21677 bytes website-docusaurus/static/img/hpc-10.png | Bin 0 -> 27328 bytes website-docusaurus/static/img/hpc-2.png | Bin 0 -> 117286 bytes website-docusaurus/static/img/hpc-3.png | Bin 0 -> 63756 bytes website-docusaurus/static/img/hpc-4.png | Bin 0 -> 54371 bytes website-docusaurus/static/img/hpc-5.png | Bin 0 -> 352526 bytes website-docusaurus/static/img/hpc-6.png | Bin 0 -> 14701 bytes website-docusaurus/static/img/hpc-7.png | Bin 0 -> 37812 bytes website-docusaurus/static/img/hpc-8.png | Bin 0 -> 3221 bytes website-docusaurus/static/img/hpc-9.png | Bin 0 -> 34115 bytes website-docusaurus/static/img/hpc-en3.png | Bin 0 -> 25724 bytes website-docusaurus/static/img/hpc-en4.png | Bin 0 -> 46923 bytes website-docusaurus/static/img/hpc-en5.png | Bin 0 -> 136629 bytes website-docusaurus/static/img/hpc-en6.png | Bin 0 -> 6415 bytes website-docusaurus/static/img/hpc-en7.png | Bin 0 -> 8164 bytes website-docusaurus/static/img/icon-192.png | Bin 0 -> 5661 bytes website-docusaurus/static/img/icon.png | Bin 0 -> 5661 bytes website-docusaurus/static/img/icon_data.png | Bin 0 -> 215 bytes website-docusaurus/static/img/icon_data.svg | 1 + website-docusaurus/static/img/icon_email.svg | 1 + website-docusaurus/static/img/icon_emil.png | Bin 0 -> 380 bytes website-docusaurus/static/img/icon_git.png | Bin 0 -> 415 bytes website-docusaurus/static/img/icon_github.svg | 1 + .../static/img/icon_location.png | Bin 0 -> 357 bytes .../static/img/icon_location.svg | 1 + website-docusaurus/static/img/icon_read.png | Bin 0 -> 258 bytes website-docusaurus/static/img/icon_slack.png | Bin 0 -> 462 bytes website-docusaurus/static/img/icon_slack.svg | 1 + website-docusaurus/static/img/icon_time.png | Bin 0 -> 246 bytes website-docusaurus/static/img/icon_time.svg | 1 + .../static/img/icon_twitter.svg | 1 + website-docusaurus/static/img/icon_twtter.png | Bin 0 -> 531 bytes website-docusaurus/static/img/icon_up.png | Bin 0 -> 610 bytes website-docusaurus/static/img/icon_up.svg | 1 + website-docusaurus/static/img/icon_up1.png | Bin 0 -> 648 bytes website-docusaurus/static/img/icon_user.png | Bin 0 -> 204 bytes website-docusaurus/static/img/icon_user.svg | 1 + website-docusaurus/static/img/ing-1.png | Bin 0 -> 52825 bytes website-docusaurus/static/img/ing-10.png | Bin 0 -> 45322 bytes website-docusaurus/static/img/ing-11.png | Bin 0 -> 48962 bytes website-docusaurus/static/img/ing-2.png | Bin 0 -> 72253 bytes website-docusaurus/static/img/ing-3.png | Bin 0 -> 44382 bytes 
website-docusaurus/static/img/ing-4.png | Bin 0 -> 31360 bytes website-docusaurus/static/img/ing-5.png | Bin 0 -> 34039 bytes website-docusaurus/static/img/ing-6.png | Bin 0 -> 71456 bytes website-docusaurus/static/img/ing-7.png | Bin 0 -> 25639 bytes website-docusaurus/static/img/ing-8.png | Bin 0 -> 40570 bytes website-docusaurus/static/img/ing-9.png | Bin 0 -> 31396 bytes .../static/img/kthena/kthena-arch.svg | 3 + .../static/img/kthena/model-serving.svg | 3 + website-docusaurus/static/img/kube-batch.png | Bin 0 -> 119509 bytes .../static/img/kubecon/2024-paris.png | Bin 0 -> 72876 bytes .../static/img/kubecon/iflytek.jpeg | Bin 0 -> 92353 bytes website-docusaurus/static/img/kubeflow.png | Bin 0 -> 5734 bytes website-docusaurus/static/img/kubeflow.svg | 1 + website-docusaurus/static/img/kubeflow1.png | Bin 0 -> 4743 bytes .../static/img/kubegene_logo.png | Bin 0 -> 5806 bytes .../static/img/kubernetes-200.png | Bin 0 -> 8238 bytes website-docusaurus/static/img/leinao-1.png | Bin 0 -> 244350 bytes website-docusaurus/static/img/leinao-10.png | Bin 0 -> 11484 bytes website-docusaurus/static/img/leinao-11.png | Bin 0 -> 10021 bytes website-docusaurus/static/img/leinao-12.png | Bin 0 -> 15731 bytes website-docusaurus/static/img/leinao-13.png | Bin 0 -> 103550 bytes website-docusaurus/static/img/leinao-2.png | Bin 0 -> 193116 bytes website-docusaurus/static/img/leinao-3.png | Bin 0 -> 267765 bytes website-docusaurus/static/img/leinao-4.png | Bin 0 -> 165427 bytes website-docusaurus/static/img/leinao-5.png | Bin 0 -> 55088 bytes website-docusaurus/static/img/leinao-6.png | Bin 0 -> 19003 bytes website-docusaurus/static/img/leinao-7.png | Bin 0 -> 100979 bytes website-docusaurus/static/img/leinao-8.png | Bin 0 -> 223395 bytes website-docusaurus/static/img/leinao-9.png | Bin 0 -> 10736 bytes website-docusaurus/static/img/leinao-en1.png | Bin 0 -> 59144 bytes website-docusaurus/static/img/leinao-en10.png | Bin 0 -> 10736 bytes website-docusaurus/static/img/leinao-en11.png | Bin 0 -> 11484 bytes website-docusaurus/static/img/leinao-en12.png | Bin 0 -> 8029 bytes website-docusaurus/static/img/leinao-en13.png | Bin 0 -> 13437 bytes website-docusaurus/static/img/leinao-en14.png | Bin 0 -> 103550 bytes website-docusaurus/static/img/leinao-en2.png | Bin 0 -> 8822 bytes website-docusaurus/static/img/leinao-en3.png | Bin 0 -> 8521 bytes website-docusaurus/static/img/leinao-en4.png | Bin 0 -> 6306 bytes website-docusaurus/static/img/leinao-en5.png | Bin 0 -> 11024 bytes website-docusaurus/static/img/leinao-en6.png | Bin 0 -> 16930 bytes website-docusaurus/static/img/leinao-en7.png | Bin 0 -> 10432 bytes website-docusaurus/static/img/leinao-en8.png | Bin 0 -> 47216 bytes website-docusaurus/static/img/leinao-en9.png | Bin 0 -> 223334 bytes website-docusaurus/static/img/logo.svg | 1 + website-docusaurus/static/img/logo_360.png | Bin 0 -> 13089 bytes .../static/img/logo_4paradigm.png | Bin 0 -> 10945 bytes website-docusaurus/static/img/logo_baidu.png | Bin 0 -> 20712 bytes website-docusaurus/static/img/logo_bibdr.png | Bin 0 -> 9738 bytes .../static/img/logo_bilibili.png | Bin 0 -> 13621 bytes .../static/img/logo_bosszhipin.png | Bin 0 -> 5762 bytes website-docusaurus/static/img/logo_boyun.png | Bin 0 -> 17809 bytes .../static/img/logo_cloudnative.png | Bin 0 -> 8417 bytes website-docusaurus/static/img/logo_didi.png | Bin 0 -> 17466 bytes website-docusaurus/static/img/logo_huawei.png | Bin 0 -> 9585 bytes website-docusaurus/static/img/logo_iqiyi.png | Bin 0 -> 19862 bytes 
website-docusaurus/static/img/logo_jd.png | Bin 0 -> 17277 bytes .../static/img/logo_jianhang.png | Bin 0 -> 11176 bytes .../static/img/logo_jianxinjinke.png | Bin 0 -> 15594 bytes website-docusaurus/static/img/logo_ktnexr.png | Bin 0 -> 7752 bytes website-docusaurus/static/img/logo_middle.png | Bin 0 -> 5994 bytes .../static/img/logo_openinnovation.png | Bin 0 -> 17108 bytes website-docusaurus/static/img/logo_qiezi.png | Bin 0 -> 22701 bytes .../static/img/logo_qvtoutiao.png | Bin 0 -> 25450 bytes .../static/img/logo_redbook.png | Bin 0 -> 11182 bytes .../static/img/logo_replacement.png | Bin 0 -> 99 bytes .../static/img/logo_ruitian.png | Bin 0 -> 16733 bytes .../static/img/logo_tencent.png | Bin 0 -> 9047 bytes website-docusaurus/static/img/logo_vips.png | Bin 0 -> 12553 bytes website-docusaurus/static/img/logo_vivo.png | Bin 0 -> 14861 bytes .../static/img/logo_xiwangzu.png | Bin 0 -> 4514 bytes .../static/img/logo_yunzhisheng.png | Bin 0 -> 20281 bytes .../static/img/logo_zhongkeleinao.png | Bin 0 -> 10655 bytes website-docusaurus/static/img/mpi1.png | Bin 0 -> 26200 bytes .../multi-cluster/volcano_global_design.svg | 4 + .../network-topology/hypernode-example.png | Bin 0 -> 51661 bytes website-docusaurus/static/img/pengcheng-1.png | Bin 0 -> 206323 bytes website-docusaurus/static/img/pengcheng-2.png | Bin 0 -> 207979 bytes website-docusaurus/static/img/pengcheng-3.png | Bin 0 -> 35573 bytes website-docusaurus/static/img/pengcheng-4.png | Bin 0 -> 58837 bytes website-docusaurus/static/img/pengcheng-5.png | Bin 0 -> 84854 bytes website-docusaurus/static/img/pengcheng-6.png | Bin 0 -> 42340 bytes website-docusaurus/static/img/pengcheng-7.png | Bin 0 -> 32245 bytes website-docusaurus/static/img/pengcheng-8.png | Bin 0 -> 158008 bytes .../static/img/pengcheng-en1.png | Bin 0 -> 105425 bytes .../static/img/pengcheng-en2.png | Bin 0 -> 344083 bytes .../static/img/pengcheng-en3.png | Bin 0 -> 12264 bytes .../static/img/pengcheng-en4.png | Bin 0 -> 14286 bytes .../static/img/pengcheng-en5.png | Bin 0 -> 45485 bytes .../static/img/pengcheng-en6.png | Bin 0 -> 7682 bytes website-docusaurus/static/img/ps-worker.png | Bin 0 -> 273554 bytes website-docusaurus/static/img/ray_logo.png | Bin 0 -> 97933 bytes website-docusaurus/static/img/ruitian2-1.png | Bin 0 -> 121232 bytes website-docusaurus/static/img/ruitian2-10.png | Bin 0 -> 243326 bytes website-docusaurus/static/img/ruitian2-11.png | Bin 0 -> 56739 bytes website-docusaurus/static/img/ruitian2-12.png | Bin 0 -> 69719 bytes website-docusaurus/static/img/ruitian2-13.png | Bin 0 -> 83847 bytes website-docusaurus/static/img/ruitian2-14.png | Bin 0 -> 79634 bytes website-docusaurus/static/img/ruitian2-15.png | Bin 0 -> 87129 bytes website-docusaurus/static/img/ruitian2-16.png | Bin 0 -> 83422 bytes website-docusaurus/static/img/ruitian2-17.png | Bin 0 -> 135236 bytes website-docusaurus/static/img/ruitian2-18.png | Bin 0 -> 126192 bytes website-docusaurus/static/img/ruitian2-19.png | Bin 0 -> 91422 bytes website-docusaurus/static/img/ruitian2-2.png | Bin 0 -> 38797 bytes website-docusaurus/static/img/ruitian2-3.png | Bin 0 -> 77393 bytes website-docusaurus/static/img/ruitian2-4.png | Bin 0 -> 120205 bytes website-docusaurus/static/img/ruitian2-5.png | Bin 0 -> 206837 bytes website-docusaurus/static/img/ruitian2-6.png | Bin 0 -> 91819 bytes website-docusaurus/static/img/ruitian2-7.png | Bin 0 -> 70328 bytes website-docusaurus/static/img/ruitian2-8.png | Bin 0 -> 82389 bytes website-docusaurus/static/img/ruitian2-9.png | Bin 0 -> 219853 bytes 
.../static/img/ruitian2-en2.png | Bin 0 -> 26815 bytes website-docusaurus/static/img/ruitian2.png | Bin 0 -> 79851 bytes website-docusaurus/static/img/ruitian3.png | Bin 0 -> 64783 bytes website-docusaurus/static/img/scheduler.PNG | Bin 0 -> 31528 bytes .../static/img/spark-logo-hd.png | Bin 0 -> 10421 bytes website-docusaurus/static/img/status-DAG.png | Bin 0 -> 83269 bytes website-docusaurus/static/img/task_order.png | Bin 0 -> 10020 bytes .../static/img/undraw_docusaurus_mountain.svg | 171 + .../static/img/undraw_docusaurus_react.svg | 170 + .../static/img/undraw_docusaurus_tree.svg | 40 + .../static/img/v1.10.0/podSchedulingGates.svg | 1 + .../static/img/v1.8.2/jobflow.gif | Bin 0 -> 200997 bytes website-docusaurus/static/img/volcano-hpw.png | Bin 0 -> 80007 bytes .../img/volcano_argo-horizontal-color.png | Bin 0 -> 44849 bytes .../static/img/volcano_flink.PNG | Bin 0 -> 102765 bytes .../static/img/volcano_horovod.PNG | Bin 0 -> 12263 bytes .../static/img/volcano_logo.png | Bin 0 -> 13593 bytes .../static/img/volcano_logo.svg | 1 + .../static/img/volcano_mindspore.PNG | Bin 0 -> 85757 bytes .../static/img/volcano_mxnet.PNG | Bin 0 -> 21685 bytes .../static/img/volcano_openMPI.jpg | Bin 0 -> 10692 bytes .../static/img/volcano_paddle.PNG | Bin 0 -> 97380 bytes .../static/img/volcano_pytorch.PNG | Bin 0 -> 44257 bytes .../static/img/volcano_tensorflow.PNG | Bin 0 -> 70807 bytes .../static/img/xiaohongshu-1.png | Bin 0 -> 94904 bytes .../static/img/xiaohongshu-10.png | Bin 0 -> 50499 bytes .../static/img/xiaohongshu-11.png | Bin 0 -> 64467 bytes .../static/img/xiaohongshu-2.png | Bin 0 -> 143295 bytes .../static/img/xiaohongshu-3.png | Bin 0 -> 83631 bytes .../static/img/xiaohongshu-4.png | Bin 0 -> 48417 bytes .../static/img/xiaohongshu-5.png | Bin 0 -> 65171 bytes .../static/img/xiaohongshu-6.png | Bin 0 -> 87834 bytes .../static/img/xiaohongshu-7.png | Bin 0 -> 66253 bytes .../static/img/xiaohongshu-8.png | Bin 0 -> 81472 bytes .../static/img/xiaohongshu-9.png | Bin 0 -> 111544 bytes .../static/img/xiaohongshu-en1.png | Bin 0 -> 76679 bytes .../static/img/xiaohongshu-en10.png | Bin 0 -> 45631 bytes .../static/img/xiaohongshu-en11.png | Bin 0 -> 56665 bytes .../static/img/xiaohongshu-en2.png | Bin 0 -> 245285 bytes .../static/img/xiaohongshu-en3.png | Bin 0 -> 148290 bytes .../static/img/xiaohongshu-en4.png | Bin 0 -> 56563 bytes .../static/img/xiaohongshu-en5.png | Bin 0 -> 51565 bytes .../static/img/xiaohongshu-en6.png | Bin 0 -> 71841 bytes .../static/img/xiaohongshu-en7.png | Bin 0 -> 43162 bytes .../static/img/xiaohongshu-en8.png | Bin 0 -> 61047 bytes .../static/img/xiaohongshu-en9.png | Bin 0 -> 109427 bytes website-docusaurus/tsconfig.json | 7 + .../versioned_docs/version-v1.10.0/actions.md | 72 + .../version-v1.10.0/architecture.md | 30 + .../versioned_docs/version-v1.10.0/cli.md | 61 + .../version-v1.10.0/contribution.md | 152 + .../version-v1.10.0/flink_on_volcano.md | 264 + .../version-v1.10.0/installation.md | 103 + .../versioned_docs/version-v1.10.0/intro.md | 68 + .../version-v1.10.0/kubeflow_on_volcano.md | 229 + .../version-v1.10.0/membership.md | 123 + .../version-v1.10.0/mindspore_on_volcano.md | 59 + .../version-v1.10.0/mpi_on_volcano.md | 97 + .../versioned_docs/version-v1.10.0/plugins.md | 160 + .../version-v1.10.0/podgroup.md | 91 + .../version-v1.10.0/pp_on_volcano.md | 227 + .../versioned_docs/version-v1.10.0/queue.md | 110 + .../version-v1.10.0/referrals.md | 72 + .../version-v1.10.0/schduler_introduction.md | 89 + .../version-v1.10.0/spark_on_volcano.md | 86 + 
.../version-v1.10.0/tf_on_volcano.md | 96 + .../version-v1.10.0/tutorials.md | 264 + .../versioned_docs/version-v1.10.0/vcjob.md | 312 + .../versioned_docs/version-v1.11.0/actions.md | 67 + .../version-v1.11.0/architecture.md | 30 + .../versioned_docs/version-v1.11.0/cli.md | 61 + .../version-v1.11.0/colocation.md | 604 + .../version-v1.11.0/contribution.md | 152 + .../version-v1.11.0/descheduler.md | 139 + .../version-v1.11.0/flink_on_volcano.md | 264 + .../version-v1.11.0/gpu_virtualization.md | 143 + .../version-v1.11.0/hierarchical_queue.md | 138 + .../version-v1.11.0/installation.md | 106 + .../versioned_docs/version-v1.11.0/intro.md | 108 + .../version-v1.11.0/kubeflow_on_volcano.md | 229 + .../version-v1.11.0/membership.md | 123 + .../version-v1.11.0/mindspore_on_volcano.md | 59 + .../version-v1.11.0/mpi_on_volcano.md | 97 + .../multi_cluster_scheduling.md | 41 + .../network_topology_aware_scheduling.md | 290 + .../versioned_docs/version-v1.11.0/plugins.md | 160 + .../version-v1.11.0/podgroup.md | 105 + .../version-v1.11.0/pp_on_volcano.md | 227 + .../versioned_docs/version-v1.11.0/queue.md | 99 + .../queue_resource_management.md | 239 + .../version-v1.11.0/referrals.md | 72 + .../version-v1.11.0/schduler_introduction.md | 89 + .../version-v1.11.0/spark_on_volcano.md | 86 + .../version-v1.11.0/tf_on_volcano.md | 96 + .../version-v1.11.0/tutorials.md | 264 + .../version-v1.11.0/unified_scheduling.md | 138 + .../versioned_docs/version-v1.11.0/vcjob.md | 312 + .../versioned_docs/version-v1.12.0/actions.md | 67 + .../version-v1.12.0/architecture.md | 30 + .../versioned_docs/version-v1.12.0/cli.md | 61 + .../version-v1.12.0/colocation.md | 604 + .../version-v1.12.0/contribution.md | 151 + .../version-v1.12.0/descheduler.md | 139 + .../version-v1.12.0/flink_on_volcano.md | 264 + .../version-v1.12.0/gpu_virtualization.md | 206 + .../version-v1.12.0/hierarchical_queue.md | 138 + .../version-v1.12.0/installation.md | 106 + .../versioned_docs/version-v1.12.0/intro.md | 103 + .../version-v1.12.0/kubeflow_on_volcano.md | 229 + .../version-v1.12.0/membership.md | 123 + .../version-v1.12.0/mindspore_on_volcano.md | 59 + .../version-v1.12.0/mpi_on_volcano.md | 97 + .../multi_cluster_scheduling.md | 41 + .../network_topology_aware_scheduling.md | 436 + .../versioned_docs/version-v1.12.0/plugins.md | 160 + .../version-v1.12.0/podgroup.md | 105 + .../version-v1.12.0/pp_on_volcano.md | 227 + .../versioned_docs/version-v1.12.0/queue.md | 99 + .../queue_resource_management.md | 239 + .../version-v1.12.0/referrals.md | 72 + .../version-v1.12.0/schduler_introduction.md | 89 + .../version-v1.12.0/spark_on_volcano.md | 86 + .../version-v1.12.0/tf_on_volcano.md | 96 + .../version-v1.12.0/tutorials.md | 352 + .../version-v1.12.0/unified_scheduling.md | 219 + .../versioned_docs/version-v1.12.0/vcjob.md | 312 + .../versioned_docs/version-v1.7.0/actions.md | 72 + .../version-v1.7.0/architecture.md | 30 + .../versioned_docs/version-v1.7.0/cli.md | 61 + .../version-v1.7.0/contribution.md | 152 + .../version-v1.7.0/flink_on_volcano.md | 264 + .../version-v1.7.0/installation.md | 103 + .../versioned_docs/version-v1.7.0/intro.md | 68 + .../version-v1.7.0/kubeflow_on_volcano.md | 229 + .../version-v1.7.0/membership.md | 123 + .../version-v1.7.0/mindspore_on_volcano.md | 59 + .../version-v1.7.0/mpi_on_volcano.md | 97 + .../versioned_docs/version-v1.7.0/plugins.md | 160 + .../versioned_docs/version-v1.7.0/podgroup.md | 91 + .../version-v1.7.0/pp_on_volcano.md | 227 + .../versioned_docs/version-v1.7.0/queue.md | 110 + 
.../version-v1.7.0/referrals.md | 72 + .../version-v1.7.0/schduler_introduction.md | 89 + .../version-v1.7.0/spark_on_volcano.md | 86 + .../version-v1.7.0/tf_on_volcano.md | 96 + .../version-v1.7.0/tutorials.md | 264 + .../versioned_docs/version-v1.7.0/vcjob.md | 312 + .../versioned_docs/version-v1.8.2/actions.md | 72 + .../version-v1.8.2/architecture.md | 30 + .../versioned_docs/version-v1.8.2/cli.md | 61 + .../version-v1.8.2/contribution.md | 152 + .../version-v1.8.2/flink_on_volcano.md | 264 + .../version-v1.8.2/installation.md | 103 + .../versioned_docs/version-v1.8.2/intro.md | 68 + .../version-v1.8.2/kubeflow_on_volcano.md | 229 + .../version-v1.8.2/membership.md | 123 + .../version-v1.8.2/mindspore_on_volcano.md | 59 + .../version-v1.8.2/mpi_on_volcano.md | 97 + .../versioned_docs/version-v1.8.2/plugins.md | 160 + .../versioned_docs/version-v1.8.2/podgroup.md | 91 + .../version-v1.8.2/pp_on_volcano.md | 227 + .../versioned_docs/version-v1.8.2/queue.md | 110 + .../version-v1.8.2/referrals.md | 72 + .../version-v1.8.2/schduler_introduction.md | 89 + .../version-v1.8.2/spark_on_volcano.md | 86 + .../version-v1.8.2/tf_on_volcano.md | 96 + .../version-v1.8.2/tutorials.md | 264 + .../versioned_docs/version-v1.8.2/vcjob.md | 312 + .../versioned_docs/version-v1.9.0/actions.md | 72 + .../version-v1.9.0/architecture.md | 30 + .../versioned_docs/version-v1.9.0/cli.md | 61 + .../version-v1.9.0/contribution.md | 152 + .../version-v1.9.0/flink_on_volcano.md | 264 + .../version-v1.9.0/installation.md | 103 + .../versioned_docs/version-v1.9.0/intro.md | 68 + .../version-v1.9.0/kubeflow_on_volcano.md | 229 + .../version-v1.9.0/membership.md | 123 + .../version-v1.9.0/mindspore_on_volcano.md | 59 + .../version-v1.9.0/mpi_on_volcano.md | 97 + .../versioned_docs/version-v1.9.0/plugins.md | 160 + .../versioned_docs/version-v1.9.0/podgroup.md | 91 + .../version-v1.9.0/pp_on_volcano.md | 227 + .../versioned_docs/version-v1.9.0/queue.md | 110 + .../version-v1.9.0/referrals.md | 72 + .../version-v1.9.0/schduler_introduction.md | 89 + .../version-v1.9.0/spark_on_volcano.md | 86 + .../version-v1.9.0/tf_on_volcano.md | 96 + .../version-v1.9.0/tutorials.md | 264 + .../versioned_docs/version-v1.9.0/vcjob.md | 312 + .../version-v1.10.0-sidebars.json | 1 + .../version-v1.11.0-sidebars.json | 1 + .../version-v1.12.0-sidebars.json | 1 + .../version-v1.7.0-sidebars.json | 1 + .../version-v1.8.2-sidebars.json | 1 + .../version-v1.9.0-sidebars.json | 1 + website-docusaurus/versions.json | 1 + 573 files changed, 62501 insertions(+) create mode 100644 website-docusaurus/.gitignore create mode 100644 website-docusaurus/README.md create mode 100644 website-docusaurus/blog/2019-01-28-kube-batch-customers.md create mode 100644 website-docusaurus/blog/2019-01-28-kube-batch-startup.md create mode 100644 website-docusaurus/blog/2019-03-28-quick-start-volcano.md create mode 100644 website-docusaurus/blog/2020-01-01-introducing-kthena-redefining-llm-inference-for-the-cloud-native-era.md create mode 100644 website-docusaurus/blog/2020-01-01-paddlepaddle-en.md create mode 100644 website-docusaurus/blog/2020-09-30-aiqiyi-en.md create mode 100644 website-docusaurus/blog/2020-10-27-hpc-en.md create mode 100644 website-docusaurus/blog/2020-12-24-leinao-en.md create mode 100644 website-docusaurus/blog/2021-01-05-ruitian-en.md create mode 100644 website-docusaurus/blog/2021-05-27-xiaohongshu-en.md create mode 100644 website-docusaurus/blog/2021-06-01-pengcheng-en.md create mode 100644 website-docusaurus/blog/2021-06-15-ruitian2-en.md 
create mode 100644 website-docusaurus/blog/2021-08-31-1.4-release-en.md create mode 100644 website-docusaurus/blog/2022-12-28-ing_case-en.md create mode 100644 website-docusaurus/blog/2023-01-12-volcano-1.7.0-release-en.md create mode 100644 website-docusaurus/blog/2023-08-11-volcano-community-co-construction-program.md create mode 100644 website-docusaurus/blog/2024-01-31-volcano-1.8.2-release.md create mode 100644 website-docusaurus/blog/2024-03-08-meet-cloud-native-batch-computing-with-volcano-in-ai-&-big-data-scenarios.md create mode 100644 website-docusaurus/blog/2024-05-21-volcano-1.9.0-release.md create mode 100644 website-docusaurus/blog/2024-09-29-volcano-1.10.0-release.md create mode 100644 website-docusaurus/blog/2025-02-07-volcano-1.11.0-release.md create mode 100644 website-docusaurus/blog/2025-04-01-how-volcano-boosts-distributed-training-and-inference-performance.md create mode 100644 website-docusaurus/blog/2025-05-30-volcano-2025-security-audit.md create mode 100644 website-docusaurus/blog/2025-06-12-volcano-1.12.0-release.md create mode 100644 website-docusaurus/blog/2025-06-13-iflytek_case_study.md create mode 100644 website-docusaurus/blog/2025-09-29-volcano-1.13.0-release.md create mode 100644 website-docusaurus/blog/authors.yml create mode 100644 website-docusaurus/blog/tags.yml create mode 100644 website-docusaurus/docs/_category_.json create mode 100644 website-docusaurus/docs/actions.md create mode 100644 website-docusaurus/docs/architecture.md create mode 100644 website-docusaurus/docs/cli.md create mode 100644 website-docusaurus/docs/colocation.md create mode 100644 website-docusaurus/docs/contribution.md create mode 100644 website-docusaurus/docs/cron_volcanoJob.md create mode 100644 website-docusaurus/docs/descheduler.md create mode 100644 website-docusaurus/docs/flink_on_volcano.md create mode 100644 website-docusaurus/docs/gpu_virtualization.md create mode 100644 website-docusaurus/docs/hierarchical_queue.md create mode 100644 website-docusaurus/docs/installation.md create mode 100644 website-docusaurus/docs/intro.md create mode 100644 website-docusaurus/docs/kubeflow_on_volcano.md create mode 100644 website-docusaurus/docs/membership.md create mode 100644 website-docusaurus/docs/mindspore_on_volcano.md create mode 100644 website-docusaurus/docs/mpi_on_volcano.md create mode 100644 website-docusaurus/docs/multi_cluster_scheduling.md create mode 100644 website-docusaurus/docs/network_topology_aware_scheduling.md create mode 100644 website-docusaurus/docs/plugins.md create mode 100644 website-docusaurus/docs/plugins/capacity.md create mode 100644 website-docusaurus/docs/plugins/deviceshare.md create mode 100644 website-docusaurus/docs/plugins/extender.md create mode 100644 website-docusaurus/docs/plugins/nodegroup.md create mode 100644 website-docusaurus/docs/plugins/resource-strategy-fit.md create mode 100644 website-docusaurus/docs/plugins/resourcequota.md create mode 100644 website-docusaurus/docs/plugins/usage.md create mode 100644 website-docusaurus/docs/podgroup.md create mode 100644 website-docusaurus/docs/pp_on_volcano.md create mode 100644 website-docusaurus/docs/queue.md create mode 100644 website-docusaurus/docs/queue_resource_management.md create mode 100644 website-docusaurus/docs/ray_on_volcano.md create mode 100644 website-docusaurus/docs/referrals.md create mode 100644 website-docusaurus/docs/schduler_introduction.md create mode 100644 website-docusaurus/docs/spark_on_volcano.md create mode 100644 website-docusaurus/docs/tf_on_volcano.md create 
mode 100644 website-docusaurus/docs/tutorials.md create mode 100644 website-docusaurus/docs/unified_scheduling.md create mode 100644 website-docusaurus/docs/vcjob.md create mode 100644 website-docusaurus/docusaurus.config.js create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2019-01-28-kube-batch-customers.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2019-01-28-kube-batch-startup.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2019-03-28-quick-start-volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2019-11-06-paddlepaddle.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2020-09-30-aiqiyi.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2020-10-27-hpc.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2020-12-24-leinao-cloud-os.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2021-01-05-ruitian.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2021-05-27-xiaohongshu.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2021-06-01-pengcheng.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2021-06-15-ruitian2.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2021-08-31-1.4-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2022-12-28-ing_case.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2023-01-12-volcano-1.7.0-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2023-08-11-volcano-community-co-construction-program.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2024-01-31-volcano-1.8.2-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2024-05-21-volcano-1.9.0-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2024-09-29-volcano-1.10.0-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2025-02-07-volcano-1.11.0-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2025-06-12-volcano-1.12.0-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2025-06-13-iflytek_case_study.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2025-09-29-volcano-1.13.0-release.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-blog/2025-12-29-introducing_kthena.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/actions.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/architecture.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/cli.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/colocation.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/contribution.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/cron_volcanoJob.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/descheduler.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/flink_on_volcano.md create mode 100644 
website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/gpu_virtualization.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/hierarchical_queue.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/installation.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/intro.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/kubeflow_on_volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/membership.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/mindspore_on_volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/mpi_on_volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/multi_cluster_scheduling.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/network_topology_aware_scheduling.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/plugins.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/podgroup.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/pp_on_volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/queue.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/queue_resource_management.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/ray_on_volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/referrals.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/schduler_introduction.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/spark_on_volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/tf_on_volcano.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/tutorials.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/unified_scheduling.md create mode 100644 website-docusaurus/i18n/zh/docusaurus-plugin-content-docs/current/vcjob.md create mode 100644 website-docusaurus/netlify.toml create mode 100644 website-docusaurus/package-lock.json create mode 100644 website-docusaurus/package.json create mode 100644 website-docusaurus/plugins/recent-blog-posts.js create mode 100644 website-docusaurus/scripts/migrate-hugo-blog.js create mode 100644 website-docusaurus/scripts/migrate-hugo-docs.js create mode 100644 website-docusaurus/scripts/migrate-zh-docs.js create mode 100644 website-docusaurus/sidebars.js create mode 100644 website-docusaurus/src/components/AboutSection/index.tsx create mode 100644 website-docusaurus/src/components/AboutSection/styles.module.css create mode 100644 website-docusaurus/src/components/FrameworkSupport/index.tsx create mode 100644 website-docusaurus/src/components/FrameworkSupport/styles.module.css create mode 100644 website-docusaurus/src/components/HeroCarousel/index.tsx create mode 100644 website-docusaurus/src/components/HeroCarousel/styles.module.css create mode 100644 website-docusaurus/src/components/HomepageFeatures/index.tsx create mode 100644 website-docusaurus/src/components/HomepageFeatures/styles.module.css create mode 100644 
website-docusaurus/src/components/RecentPosts/index.tsx create mode 100644 website-docusaurus/src/components/RecentPosts/styles.module.css create mode 100644 website-docusaurus/src/components/SupportersSection/index.tsx create mode 100644 website-docusaurus/src/components/SupportersSection/styles.module.css create mode 100644 website-docusaurus/src/css/custom.css create mode 100644 website-docusaurus/src/pages/index.module.css create mode 100644 website-docusaurus/src/pages/index.tsx create mode 100644 website-docusaurus/src/pages/markdown-page.md create mode 100644 website-docusaurus/static/.nojekyll create mode 100644 website-docusaurus/static/_redirects create mode 100644 website-docusaurus/static/golang/api.html create mode 100644 website-docusaurus/static/golang/volcano.html create mode 100644 website-docusaurus/static/img/.gitkeep create mode 100644 website-docusaurus/static/img/Volcano community co-construction program/co-construction-1.jpg create mode 100644 website-docusaurus/static/img/Volcano community co-construction program/co-construction-2.jpg create mode 100644 website-docusaurus/static/img/Volcano community co-construction program/co-construction-3.jpg create mode 100644 website-docusaurus/static/img/ai1.png create mode 100644 website-docusaurus/static/img/aiqiyi-1.png create mode 100644 website-docusaurus/static/img/aiqiyi-10.png create mode 100644 website-docusaurus/static/img/aiqiyi-11.png create mode 100644 website-docusaurus/static/img/aiqiyi-12.png create mode 100644 website-docusaurus/static/img/aiqiyi-13.png create mode 100644 website-docusaurus/static/img/aiqiyi-2.png create mode 100644 website-docusaurus/static/img/aiqiyi-3.png create mode 100644 website-docusaurus/static/img/aiqiyi-4.png create mode 100644 website-docusaurus/static/img/aiqiyi-5.png create mode 100644 website-docusaurus/static/img/aiqiyi-6.png create mode 100644 website-docusaurus/static/img/aiqiyi-7.png create mode 100644 website-docusaurus/static/img/aiqiyi-8.png create mode 100644 website-docusaurus/static/img/aiqiyi-9.png create mode 100644 website-docusaurus/static/img/aiqiyi-en1.png create mode 100644 website-docusaurus/static/img/aiqiyi-en10.png create mode 100644 website-docusaurus/static/img/aiqiyi-en2.png create mode 100644 website-docusaurus/static/img/aiqiyi-en6.png create mode 100644 website-docusaurus/static/img/arch_1.png create mode 100644 website-docusaurus/static/img/arch_2.PNG create mode 100644 website-docusaurus/static/img/bg_1.png create mode 100644 website-docusaurus/static/img/bg_2.png create mode 100644 website-docusaurus/static/img/cncf-color.svg create mode 100644 website-docusaurus/static/img/colocation/architecture.png create mode 100644 website-docusaurus/static/img/colocation/cpu-burst1-EN.png create mode 100644 website-docusaurus/static/img/colocation/cpu-burst1.png create mode 100644 website-docusaurus/static/img/colocation/cpu-burst2-EN.png create mode 100644 website-docusaurus/static/img/colocation/cpu-burst2.png create mode 100644 website-docusaurus/static/img/colocation/network.png create mode 100644 website-docusaurus/static/img/colocation/oversubscription.png create mode 100644 website-docusaurus/static/img/colocation/oversubscription_EN.png create mode 100644 website-docusaurus/static/img/colocation/watermark.png create mode 100644 website-docusaurus/static/img/deployment.png create mode 100644 website-docusaurus/static/img/descheduler/descheduler-CN.svg create mode 100644 website-docusaurus/static/img/descheduler/descheduler_EN.svg create mode 100644 
website-docusaurus/static/img/docker-200.png create mode 100644 website-docusaurus/static/img/docusaurus-social-card.jpg create mode 100644 website-docusaurus/static/img/docusaurus.png create mode 100755 website-docusaurus/static/img/drfjob.png create mode 100644 website-docusaurus/static/img/fair-share.png create mode 100644 website-docusaurus/static/img/favicon.ico create mode 100644 website-docusaurus/static/img/favicons/android-chrome-192x192.png create mode 100644 website-docusaurus/static/img/favicons/android-chrome-512x512.png create mode 100644 website-docusaurus/static/img/favicons/apple-touch-icon.png create mode 100644 website-docusaurus/static/img/favicons/browserconfig.xml create mode 100644 website-docusaurus/static/img/favicons/favicon-16x16.png create mode 100644 website-docusaurus/static/img/favicons/favicon-32x32.png create mode 100644 website-docusaurus/static/img/favicons/favicon.ico create mode 100644 website-docusaurus/static/img/favicons/favicon.png create mode 100644 website-docusaurus/static/img/favicons/favicon.svg create mode 100644 website-docusaurus/static/img/favicons/mstile-144x144.png create mode 100644 website-docusaurus/static/img/favicons/mstile-150x150.png create mode 100644 website-docusaurus/static/img/favicons/mstile-310x150.png create mode 100644 website-docusaurus/static/img/favicons/mstile-310x310.png create mode 100644 website-docusaurus/static/img/favicons/mstile-70x70.png create mode 100644 website-docusaurus/static/img/favicons/safari-pinned-tab.svg create mode 100644 website-docusaurus/static/img/favicons/site.webmanifest create mode 100755 website-docusaurus/static/img/gang.png create mode 100644 website-docusaurus/static/img/gpu-virtualization/hard_limit.jpg create mode 100644 website-docusaurus/static/img/gpu-virtualization/vgpu_device_plugin_metrics.png create mode 100644 website-docusaurus/static/img/headers/banner_02.png create mode 100644 website-docusaurus/static/img/headers/bubbles-wide.jpg create mode 100644 website-docusaurus/static/img/headers/header-apps-2.jpg create mode 100644 website-docusaurus/static/img/headers/header-code.jpg create mode 100644 website-docusaurus/static/img/headers/header-edge-2.jpg create mode 100644 website-docusaurus/static/img/headers/header-k8s.jpg create mode 100644 website-docusaurus/static/img/headers/volcano-slide-1.png create mode 100644 website-docusaurus/static/img/headers/volcano-slide-2.png create mode 100644 website-docusaurus/static/img/hierarchical-queue-example.png create mode 100644 website-docusaurus/static/img/hpc-1.png create mode 100644 website-docusaurus/static/img/hpc-10.png create mode 100644 website-docusaurus/static/img/hpc-2.png create mode 100644 website-docusaurus/static/img/hpc-3.png create mode 100644 website-docusaurus/static/img/hpc-4.png create mode 100644 website-docusaurus/static/img/hpc-5.png create mode 100644 website-docusaurus/static/img/hpc-6.png create mode 100644 website-docusaurus/static/img/hpc-7.png create mode 100644 website-docusaurus/static/img/hpc-8.png create mode 100644 website-docusaurus/static/img/hpc-9.png create mode 100644 website-docusaurus/static/img/hpc-en3.png create mode 100644 website-docusaurus/static/img/hpc-en4.png create mode 100644 website-docusaurus/static/img/hpc-en5.png create mode 100644 website-docusaurus/static/img/hpc-en6.png create mode 100644 website-docusaurus/static/img/hpc-en7.png create mode 100644 website-docusaurus/static/img/icon-192.png create mode 100644 website-docusaurus/static/img/icon.png create mode 100644 
website-docusaurus/static/img/icon_data.png create mode 100644 website-docusaurus/static/img/icon_data.svg create mode 100644 website-docusaurus/static/img/icon_email.svg create mode 100644 website-docusaurus/static/img/icon_emil.png create mode 100644 website-docusaurus/static/img/icon_git.png create mode 100644 website-docusaurus/static/img/icon_github.svg create mode 100644 website-docusaurus/static/img/icon_location.png create mode 100644 website-docusaurus/static/img/icon_location.svg create mode 100644 website-docusaurus/static/img/icon_read.png create mode 100644 website-docusaurus/static/img/icon_slack.png create mode 100644 website-docusaurus/static/img/icon_slack.svg create mode 100644 website-docusaurus/static/img/icon_time.png create mode 100644 website-docusaurus/static/img/icon_time.svg create mode 100644 website-docusaurus/static/img/icon_twitter.svg create mode 100644 website-docusaurus/static/img/icon_twtter.png create mode 100644 website-docusaurus/static/img/icon_up.png create mode 100644 website-docusaurus/static/img/icon_up.svg create mode 100644 website-docusaurus/static/img/icon_up1.png create mode 100644 website-docusaurus/static/img/icon_user.png create mode 100644 website-docusaurus/static/img/icon_user.svg create mode 100644 website-docusaurus/static/img/ing-1.png create mode 100644 website-docusaurus/static/img/ing-10.png create mode 100644 website-docusaurus/static/img/ing-11.png create mode 100644 website-docusaurus/static/img/ing-2.png create mode 100644 website-docusaurus/static/img/ing-3.png create mode 100644 website-docusaurus/static/img/ing-4.png create mode 100644 website-docusaurus/static/img/ing-5.png create mode 100644 website-docusaurus/static/img/ing-6.png create mode 100644 website-docusaurus/static/img/ing-7.png create mode 100644 website-docusaurus/static/img/ing-8.png create mode 100644 website-docusaurus/static/img/ing-9.png create mode 100644 website-docusaurus/static/img/kthena/kthena-arch.svg create mode 100644 website-docusaurus/static/img/kthena/model-serving.svg create mode 100644 website-docusaurus/static/img/kube-batch.png create mode 100644 website-docusaurus/static/img/kubecon/2024-paris.png create mode 100644 website-docusaurus/static/img/kubecon/iflytek.jpeg create mode 100644 website-docusaurus/static/img/kubeflow.png create mode 100644 website-docusaurus/static/img/kubeflow.svg create mode 100644 website-docusaurus/static/img/kubeflow1.png create mode 100644 website-docusaurus/static/img/kubegene_logo.png create mode 100644 website-docusaurus/static/img/kubernetes-200.png create mode 100644 website-docusaurus/static/img/leinao-1.png create mode 100644 website-docusaurus/static/img/leinao-10.png create mode 100644 website-docusaurus/static/img/leinao-11.png create mode 100644 website-docusaurus/static/img/leinao-12.png create mode 100644 website-docusaurus/static/img/leinao-13.png create mode 100644 website-docusaurus/static/img/leinao-2.png create mode 100644 website-docusaurus/static/img/leinao-3.png create mode 100644 website-docusaurus/static/img/leinao-4.png create mode 100644 website-docusaurus/static/img/leinao-5.png create mode 100644 website-docusaurus/static/img/leinao-6.png create mode 100644 website-docusaurus/static/img/leinao-7.png create mode 100644 website-docusaurus/static/img/leinao-8.png create mode 100644 website-docusaurus/static/img/leinao-9.png create mode 100644 website-docusaurus/static/img/leinao-en1.png create mode 100644 website-docusaurus/static/img/leinao-en10.png create mode 100644 
website-docusaurus/static/img/leinao-en11.png create mode 100644 website-docusaurus/static/img/leinao-en12.png create mode 100644 website-docusaurus/static/img/leinao-en13.png create mode 100644 website-docusaurus/static/img/leinao-en14.png create mode 100644 website-docusaurus/static/img/leinao-en2.png create mode 100644 website-docusaurus/static/img/leinao-en3.png create mode 100644 website-docusaurus/static/img/leinao-en4.png create mode 100644 website-docusaurus/static/img/leinao-en5.png create mode 100644 website-docusaurus/static/img/leinao-en6.png create mode 100644 website-docusaurus/static/img/leinao-en7.png create mode 100644 website-docusaurus/static/img/leinao-en8.png create mode 100644 website-docusaurus/static/img/leinao-en9.png create mode 100644 website-docusaurus/static/img/logo.svg create mode 100644 website-docusaurus/static/img/logo_360.png create mode 100644 website-docusaurus/static/img/logo_4paradigm.png create mode 100644 website-docusaurus/static/img/logo_baidu.png create mode 100644 website-docusaurus/static/img/logo_bibdr.png create mode 100644 website-docusaurus/static/img/logo_bilibili.png create mode 100644 website-docusaurus/static/img/logo_bosszhipin.png create mode 100644 website-docusaurus/static/img/logo_boyun.png create mode 100644 website-docusaurus/static/img/logo_cloudnative.png create mode 100644 website-docusaurus/static/img/logo_didi.png create mode 100644 website-docusaurus/static/img/logo_huawei.png create mode 100644 website-docusaurus/static/img/logo_iqiyi.png create mode 100644 website-docusaurus/static/img/logo_jd.png create mode 100644 website-docusaurus/static/img/logo_jianhang.png create mode 100644 website-docusaurus/static/img/logo_jianxinjinke.png create mode 100644 website-docusaurus/static/img/logo_ktnexr.png create mode 100644 website-docusaurus/static/img/logo_middle.png create mode 100644 website-docusaurus/static/img/logo_openinnovation.png create mode 100644 website-docusaurus/static/img/logo_qiezi.png create mode 100644 website-docusaurus/static/img/logo_qvtoutiao.png create mode 100644 website-docusaurus/static/img/logo_redbook.png create mode 100644 website-docusaurus/static/img/logo_replacement.png create mode 100644 website-docusaurus/static/img/logo_ruitian.png create mode 100644 website-docusaurus/static/img/logo_tencent.png create mode 100644 website-docusaurus/static/img/logo_vips.png create mode 100644 website-docusaurus/static/img/logo_vivo.png create mode 100644 website-docusaurus/static/img/logo_xiwangzu.png create mode 100644 website-docusaurus/static/img/logo_yunzhisheng.png create mode 100644 website-docusaurus/static/img/logo_zhongkeleinao.png create mode 100644 website-docusaurus/static/img/mpi1.png create mode 100644 website-docusaurus/static/img/multi-cluster/volcano_global_design.svg create mode 100644 website-docusaurus/static/img/network-topology/hypernode-example.png create mode 100644 website-docusaurus/static/img/pengcheng-1.png create mode 100644 website-docusaurus/static/img/pengcheng-2.png create mode 100644 website-docusaurus/static/img/pengcheng-3.png create mode 100644 website-docusaurus/static/img/pengcheng-4.png create mode 100644 website-docusaurus/static/img/pengcheng-5.png create mode 100644 website-docusaurus/static/img/pengcheng-6.png create mode 100644 website-docusaurus/static/img/pengcheng-7.png create mode 100644 website-docusaurus/static/img/pengcheng-8.png create mode 100644 website-docusaurus/static/img/pengcheng-en1.png create mode 100644 
website-docusaurus/static/img/pengcheng-en2.png create mode 100644 website-docusaurus/static/img/pengcheng-en3.png create mode 100644 website-docusaurus/static/img/pengcheng-en4.png create mode 100644 website-docusaurus/static/img/pengcheng-en5.png create mode 100644 website-docusaurus/static/img/pengcheng-en6.png create mode 100644 website-docusaurus/static/img/ps-worker.png create mode 100644 website-docusaurus/static/img/ray_logo.png create mode 100644 website-docusaurus/static/img/ruitian2-1.png create mode 100644 website-docusaurus/static/img/ruitian2-10.png create mode 100644 website-docusaurus/static/img/ruitian2-11.png create mode 100644 website-docusaurus/static/img/ruitian2-12.png create mode 100644 website-docusaurus/static/img/ruitian2-13.png create mode 100644 website-docusaurus/static/img/ruitian2-14.png create mode 100644 website-docusaurus/static/img/ruitian2-15.png create mode 100644 website-docusaurus/static/img/ruitian2-16.png create mode 100644 website-docusaurus/static/img/ruitian2-17.png create mode 100644 website-docusaurus/static/img/ruitian2-18.png create mode 100644 website-docusaurus/static/img/ruitian2-19.png create mode 100644 website-docusaurus/static/img/ruitian2-2.png create mode 100644 website-docusaurus/static/img/ruitian2-3.png create mode 100644 website-docusaurus/static/img/ruitian2-4.png create mode 100644 website-docusaurus/static/img/ruitian2-5.png create mode 100644 website-docusaurus/static/img/ruitian2-6.png create mode 100644 website-docusaurus/static/img/ruitian2-7.png create mode 100644 website-docusaurus/static/img/ruitian2-8.png create mode 100644 website-docusaurus/static/img/ruitian2-9.png create mode 100644 website-docusaurus/static/img/ruitian2-en2.png create mode 100644 website-docusaurus/static/img/ruitian2.png create mode 100644 website-docusaurus/static/img/ruitian3.png create mode 100644 website-docusaurus/static/img/scheduler.PNG create mode 100644 website-docusaurus/static/img/spark-logo-hd.png create mode 100644 website-docusaurus/static/img/status-DAG.png create mode 100644 website-docusaurus/static/img/task_order.png create mode 100644 website-docusaurus/static/img/undraw_docusaurus_mountain.svg create mode 100644 website-docusaurus/static/img/undraw_docusaurus_react.svg create mode 100644 website-docusaurus/static/img/undraw_docusaurus_tree.svg create mode 100644 website-docusaurus/static/img/v1.10.0/podSchedulingGates.svg create mode 100644 website-docusaurus/static/img/v1.8.2/jobflow.gif create mode 100644 website-docusaurus/static/img/volcano-hpw.png create mode 100644 website-docusaurus/static/img/volcano_argo-horizontal-color.png create mode 100644 website-docusaurus/static/img/volcano_flink.PNG create mode 100644 website-docusaurus/static/img/volcano_horovod.PNG create mode 100644 website-docusaurus/static/img/volcano_logo.png create mode 100644 website-docusaurus/static/img/volcano_logo.svg create mode 100644 website-docusaurus/static/img/volcano_mindspore.PNG create mode 100644 website-docusaurus/static/img/volcano_mxnet.PNG create mode 100644 website-docusaurus/static/img/volcano_openMPI.jpg create mode 100644 website-docusaurus/static/img/volcano_paddle.PNG create mode 100644 website-docusaurus/static/img/volcano_pytorch.PNG create mode 100644 website-docusaurus/static/img/volcano_tensorflow.PNG create mode 100644 website-docusaurus/static/img/xiaohongshu-1.png create mode 100644 website-docusaurus/static/img/xiaohongshu-10.png create mode 100644 website-docusaurus/static/img/xiaohongshu-11.png create mode 100644 
website-docusaurus/static/img/xiaohongshu-2.png create mode 100644 website-docusaurus/static/img/xiaohongshu-3.png create mode 100644 website-docusaurus/static/img/xiaohongshu-4.png create mode 100644 website-docusaurus/static/img/xiaohongshu-5.png create mode 100644 website-docusaurus/static/img/xiaohongshu-6.png create mode 100644 website-docusaurus/static/img/xiaohongshu-7.png create mode 100644 website-docusaurus/static/img/xiaohongshu-8.png create mode 100644 website-docusaurus/static/img/xiaohongshu-9.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en1.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en10.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en11.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en2.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en3.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en4.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en5.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en6.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en7.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en8.png create mode 100644 website-docusaurus/static/img/xiaohongshu-en9.png create mode 100644 website-docusaurus/tsconfig.json create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/actions.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/architecture.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/cli.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/contribution.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/flink_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/installation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/intro.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/kubeflow_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/membership.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/mindspore_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/mpi_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/plugins.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/podgroup.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/pp_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/referrals.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/schduler_introduction.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/spark_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/tf_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/tutorials.md create mode 100644 website-docusaurus/versioned_docs/version-v1.10.0/vcjob.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/actions.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/architecture.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/cli.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/colocation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/contribution.md create mode 100644 
website-docusaurus/versioned_docs/version-v1.11.0/descheduler.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/flink_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/gpu_virtualization.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/hierarchical_queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/installation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/intro.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/kubeflow_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/membership.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/mindspore_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/mpi_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/multi_cluster_scheduling.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/network_topology_aware_scheduling.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/plugins.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/podgroup.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/pp_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/queue_resource_management.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/referrals.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/schduler_introduction.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/spark_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/tf_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/tutorials.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/unified_scheduling.md create mode 100644 website-docusaurus/versioned_docs/version-v1.11.0/vcjob.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/actions.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/architecture.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/cli.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/colocation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/contribution.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/descheduler.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/flink_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/gpu_virtualization.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/hierarchical_queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/installation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/intro.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/kubeflow_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/membership.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/mindspore_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/mpi_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/multi_cluster_scheduling.md create mode 100644 
website-docusaurus/versioned_docs/version-v1.12.0/network_topology_aware_scheduling.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/plugins.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/podgroup.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/pp_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/queue_resource_management.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/referrals.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/schduler_introduction.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/spark_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/tf_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/tutorials.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/unified_scheduling.md create mode 100644 website-docusaurus/versioned_docs/version-v1.12.0/vcjob.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/actions.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/architecture.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/cli.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/contribution.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/flink_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/installation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/intro.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/kubeflow_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/membership.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/mindspore_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/mpi_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/plugins.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/podgroup.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/pp_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/referrals.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/schduler_introduction.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/spark_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/tf_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/tutorials.md create mode 100644 website-docusaurus/versioned_docs/version-v1.7.0/vcjob.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/actions.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/architecture.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/cli.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/contribution.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/flink_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/installation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/intro.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/kubeflow_on_volcano.md create mode 100644 
website-docusaurus/versioned_docs/version-v1.8.2/membership.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/mindspore_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/mpi_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/plugins.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/podgroup.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/pp_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/referrals.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/schduler_introduction.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/spark_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/tf_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/tutorials.md create mode 100644 website-docusaurus/versioned_docs/version-v1.8.2/vcjob.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/actions.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/architecture.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/cli.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/contribution.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/flink_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/installation.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/intro.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/kubeflow_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/membership.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/mindspore_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/mpi_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/plugins.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/podgroup.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/pp_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/queue.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/referrals.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/schduler_introduction.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/spark_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/tf_on_volcano.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/tutorials.md create mode 100644 website-docusaurus/versioned_docs/version-v1.9.0/vcjob.md create mode 100644 website-docusaurus/versioned_sidebars/version-v1.10.0-sidebars.json create mode 100644 website-docusaurus/versioned_sidebars/version-v1.11.0-sidebars.json create mode 100644 website-docusaurus/versioned_sidebars/version-v1.12.0-sidebars.json create mode 100644 website-docusaurus/versioned_sidebars/version-v1.7.0-sidebars.json create mode 100644 website-docusaurus/versioned_sidebars/version-v1.8.2-sidebars.json create mode 100644 website-docusaurus/versioned_sidebars/version-v1.9.0-sidebars.json create mode 100644 website-docusaurus/versions.json diff --git a/website-docusaurus/.gitignore b/website-docusaurus/.gitignore new file mode 100644 index 00000000..b2d6de30 --- /dev/null +++ 
b/website-docusaurus/.gitignore
@@ -0,0 +1,20 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
diff --git a/website-docusaurus/README.md b/website-docusaurus/README.md
new file mode 100644
index 00000000..b28211a9
--- /dev/null
+++ b/website-docusaurus/README.md
@@ -0,0 +1,41 @@
# Website

This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.

## Installation

```bash
yarn
```

## Local Development

```bash
yarn start
```

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

## Build

```bash
yarn build
```

This command generates static content into the `build` directory, which can be served by any static content hosting service.

## Deployment

Using SSH:

```bash
USE_SSH=true yarn deploy
```

Not using SSH:

```bash
GIT_USER=<Your GitHub username> yarn deploy
```

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push it to the `gh-pages` branch.
diff --git a/website-docusaurus/blog/2019-01-28-kube-batch-customers.md b/website-docusaurus/blog/2019-01-28-kube-batch-customers.md
new file mode 100644
index 00000000..f3f79646
--- /dev/null
+++ b/website-docusaurus/blog/2019-01-28-kube-batch-customers.md
@@ -0,0 +1,14 @@
---
title: "Customers for Kube-Batch"
description: "A Batch Scheduler for Kubernetes"
date: 2019-01-28
authors: [volcano]
---
## Who is Using kube-batch?

| Organization | Contact (GitHub Username) | Environment | Description |
|---|---|---|---|
| [Baidu Inc](https://www.baidu.com) | [@tizhou86](https://github.com/tizhou86) | Production | The scheduler for offline training of the PaddlePaddle deep learning framework |
| [Tusimple](https://www.tusimple.com) | [@suleisl2000](https://github.com/suleisl2000) | | The scheduler for offline training of MXNet |
| [FfDL](https://github.com/IBM/FfDL) | [@animeshsingh](https://github.com/animeshsingh) | | |
| [MOGU Inc](https://www.mogujie.com) | [@jiaxuanzhou](https://github.com/jiaxuanzhou) | Production | The scheduler for offline training of tiny+ |
\ No newline at end of file
diff --git a/website-docusaurus/blog/2019-01-28-kube-batch-startup.md b/website-docusaurus/blog/2019-01-28-kube-batch-startup.md
new file mode 100644
index 00000000..c2cfd454
--- /dev/null
+++ b/website-docusaurus/blog/2019-01-28-kube-batch-startup.md
@@ -0,0 +1,167 @@
---
title: "Bringing Up Kube-Batch"
description: "A Batch Scheduler for Kubernetes"
date: 2019-01-28
authors: [volcano]
---
# Tutorial of kube-batch

This document describes how to run `kube-batch` as a batch scheduler for Kubernetes. To get the complete code, go to [master](https://github.com/kubernetes-sigs/kube-batch/tree/master).

## 1. Prerequisites
Before running `kube-batch`, you must start up a Kubernetes cluster (see [Creating a Cluster with Kubeadm](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/)). For local testing and deployment, you can use Minikube (see [Running Kubernetes Locally with Minikube](https://kubernetes.io/docs/getting-started-guides/minikube/)).
You can also use [kind](https://github.com/kubernetes-sigs/kind) to run local Kubernetes clusters with Docker container "nodes".

`kube-batch` needs to run as a Kubernetes scheduler. The following sections describe how to quickly set up `kube-batch` as a Kubernetes scheduler. For details, see [Configure Multiple Schedulers](https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/).

## 2. Configuring kube-batch for Kubernetes

### (1) kube-batch image

You can download the official kube-batch image from [DockerHub](https://hub.docker.com/r/kubesigs/kube-batch/). The latest version is `v0.4`.

```bash
# docker pull kubesigs/kube-batch:v0.4
```

### (2) Creating a Kubernetes Deployment for kube-batch

#### Downloading kube-batch

```bash
# mkdir -p $GOPATH/src/github.com/kubernetes-sigs/
# cd $GOPATH/src/github.com/kubernetes-sigs/
# git clone https://github.com/kubernetes-sigs/kube-batch
```

#### Deploying `kube-batch` with Helm

Run `kube-batch` as a Kubernetes scheduler.

```bash
# helm install $GOPATH/src/github.com/kubernetes-sigs/kube-batch/deployment/kube-batch --namespace kube-system
```

Check the version of `kube-batch`.

```bash
# helm list
NAME           REVISION   UPDATED                    STATUS     CHART              NAMESPACE
dozing-otter   1          Thu Jun 14 18:52:15 2018   DEPLOYED   kube-batch-0.4.0   kube-system
```

NOTE: `kube-batch` needs to collect cluster information (such as pods, nodes, and CRDs) for scheduling, so the service account used by the Deployment must have permission to access those cluster resources. Otherwise, `kube-batch` cannot start up. If you are not familiar with Kubernetes RBAC, copy example/role.yaml to `$GOPATH/src/github.com/kubernetes-sigs/kube-batch/deployment/kube-batch/templates/` and reinstall `kube-batch`.

### (3) Creating a Job

Create a file named `job-01.yaml` with the following content:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: qj-1
spec:
  backoffLimit: 6
  completions: 6
  parallelism: 6
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: qj-1
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        resources:
          requests:
            cpu: "1"
      restartPolicy: Never
      schedulerName: kube-batch
---
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: qj-1
spec:
  minMember: 6
```

The YAML file defines a Job named `qj-1` that creates 6 pods (specified by `parallelism`). These pods are scheduled by the `kube-batch` scheduler (specified by `schedulerName`). `kube-batch` watches the `PodGroup`, and the annotation `scheduling.k8s.io/group-name` identifies the group to which a pod belongs. `kube-batch` starts `.spec.minMember` pods of a job at the same time; if resources are insufficient, it will not start any pods for the job.

Create the job.

```bash
# kubectl create -f job-01.yaml
```

Check the job status.

```bash
# kubectl get jobs
NAME   DESIRED   SUCCESSFUL   AGE
qj-1   6         6            2h
```

Check the pod statuses.

```bash
# kubectl get pod --all-namespaces
```


## 3. Creating a PriorityClass for Pods

The `kube-batch` scheduler starts pods by their priority within the same QueueJob; pods with a higher priority are started first.
The following example demonstrates how to use `PriorityClass`:

Create a file named `priority_1000.yaml` with the following content (note that `PriorityClass` is a cluster-scoped resource, so it takes no namespace):

```yaml
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
```

Create a PriorityClass with priority 1000.

```
# kubectl create -f priority_1000.yaml
```

Create a pod configuration file named `pod-config-ns01-r01.yaml`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-ns01-r01
spec:
  containers:
  - name: key-value-store
    image: redis
    resources:
      limits:
        memory: "1Gi"
        cpu: "1"
      requests:
        memory: "1Gi"
        cpu: "1"
    ports:
    - containerPort: 6379
  priorityClassName: high-priority
```

Create a pod with priority 1000.

```
# kubectl create -f pod-config-ns01-r01.yaml
```


NOTE:

* `PriorityClass` is supported only by Kubernetes 1.9 or later.
* Pods in the same Deployment, ReplicaSet, or Job share the same pod template, so they have the same `PriorityClass`. To specify a different `PriorityClass` for each pod in the same QueueJob, please create a new controller.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2019-03-28-quick-start-volcano.md b/website-docusaurus/blog/2019-03-28-quick-start-volcano.md
new file mode 100644
index 00000000..fb6c3345
--- /dev/null
+++ b/website-docusaurus/blog/2019-03-28-quick-start-volcano.md
@@ -0,0 +1,56 @@
---
title: "Quick Start Guide for Volcano"
description: "Bring up Volcano in any K8s cluster within a few minutes"
date: 2019-03-28
authors: [volcano]
---
# Quick Start Guide
The easiest way to deploy Volcano is using Helm charts.
### Preparation
Clone the repository to a local path:
```
# mkdir -p $GOPATH/src/volcano.sh/
# cd $GOPATH/src/volcano.sh/
# git clone https://github.com/volcano-sh/volcano.git
```
### 1. Volcano Images
Official images are available on [DockerHub](https://hub.docker.com/u/volcanosh). You can also build local images by running the following commands:
```
cd $GOPATH/src/volcano.sh/volcano
make images

## Verify your images.
# docker images
REPOSITORY                TAG      IMAGE ID       CREATED          SIZE
volcanosh/vk-admission    latest   a83338506638   8 seconds ago    41.4MB
volcanosh/vk-scheduler    latest   faa3c2a25ac3   9 seconds ago    49.6MB
volcanosh/vk-controllers  latest   7b11606ebfb8   10 seconds ago   44.2MB
```
**NOTE**: Ensure that the images are correctly loaded into your Kubernetes cluster. For example, if you are using [kind](https://github.com/kubernetes-sigs/kind), run the `kind load docker-image <image-name>:<tag>` command for each image.
### 2. Helm Charts

Install the Helm chart.
```
helm install installer/chart --namespace <namespace> --name <name>

For example:
helm install installer/chart --namespace volcano-trial --name volcano-trial
```

Run the following commands to verify the installation:
```
#1. Verify whether pods are running normally.

# kubectl get pods --namespace <namespace>
NAME                                  READY   STATUS      RESTARTS   AGE
<name>-admission-84fd9b9dd8-9trxn     1/1     Running     0          43s
<name>-controllers-75dcc8ff89-42v6r   1/1     Running     0          43s
<name>-scheduler-b94cdb867-89pm2      1/1     Running     0          43s
<name>-admission-init-qbtmb           0/1     Completed   0          43s

#2. Verify the Services.
# kubectl get services --namespace <namespace>
NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
<name>-admission-service   ClusterIP   10.105.78.53   <none>        443/TCP   91s
```
#### Watch the following video to learn how to deploy Volcano:

\ No newline at end of file
diff --git a/website-docusaurus/blog/2020-01-01-introducing-kthena-redefining-llm-inference-for-the-cloud-native-era.md b/website-docusaurus/blog/2020-01-01-introducing-kthena-redefining-llm-inference-for-the-cloud-native-era.md
new file mode 100644
index 00000000..45c45dad
--- /dev/null
+++ b/website-docusaurus/blog/2020-01-01-introducing-kthena-redefining-llm-inference-for-the-cloud-native-era.md
@@ -0,0 +1,170 @@
+++
title = "Introducing Kthena: Redefining LLM Inference for the Cloud-Native Era"
description = "Kthena is a Kubernetes-native, high-performance LLM inference routing and orchestration system. It improves GPU/NPU utilization and reduces latency with topology-aware scheduling, KV Cache-aware routing, and Prefill-Decode disaggregation."
subtitle = ""

date = 2026-01-06
lastmod = 2026-01-06
datemonth = "Jan"
dateyear = "2026"
dateday = 6

draft = false
toc = true
type = "posts"
authors = ["volcano"]

tags = ["Practice"]
summary = "Kthena is a Kubernetes-native LLM inference routing and orchestration layer for production-scale serving."

linktitle = "Introducing Kthena: Redefining LLM Inference for the Cloud-Native Era"
[menu.posts]
parent = "tutorials"
weight = 5
+++

Today, the [Volcano](https://volcano.sh/) community is proud to announce the launch of [Kthena](https://github.com/volcano-sh/kthena), a new sub-project designed for global developers and MLOps engineers.

Kthena is a cloud-native, high-performance system for LLM inference routing, orchestration, and scheduling, tailored specifically for Kubernetes. Engineered to address the complexity of serving LLMs at production scale, Kthena delivers granular control and enhanced flexibility. Through features like topology-aware scheduling, KV Cache-aware routing, and Prefill-Decode (PD) disaggregation, it significantly improves GPU/NPU utilization and throughput while minimizing latency.

As a sub-project of Volcano, Kthena extends Volcano's capabilities beyond AI training, creating a unified, end-to-end solution for the entire AI lifecycle.

## The "Last Mile" Challenge of LLM Serving

While Large Language Models (LLMs) are reshaping industries, deploying them efficiently on Kubernetes remains a complex systems engineering challenge. Developers face four critical hurdles:

1. **Low Resource Utilization:** The dynamic memory footprint of LLM inference, especially the KV Cache, creates massive pressure on GPU/NPU resources. Traditional round-robin load balancers fail to perceive these characteristics, leading to a mix of idle resources and queued requests that drives up costs.
2. **The Latency vs. Throughput Trade-off:** Inference consists of two distinct phases: Prefill (compute-intensive) and Decode (memory-bound). Coupled scheduling limits optimization. While PD disaggregation is the industry-standard solution, efficient routing and scheduling for it remain difficult.
3. **Complex Multi-Model Management:** Enterprises often serve multiple models, versions, and LoRA adapters simultaneously. Implementing fair scheduling, priority management, and dynamic routing is difficult, leading some to resort to rigid 1:1 mappings between AI gateways and models.
4. **Lack of Native K8s Integration:** Many existing solutions are either fragmented from the Kubernetes ecosystem or too complex for standard platform operations.

## Kthena: The Intelligent Brain for Cloud-Native Inference

Kthena was built to conquer these challenges. Rather than replacing existing inference engines (like vLLM or SGLang), Kthena acts as an intelligent orchestration layer atop them, deeply integrated into Kubernetes.
+ +Kthena consists of two core components: + +* **Kthena Router:** A high-performance, multi-model router that acts as the entry point for all inference requests. It intelligently distributes traffic to backend ModelServers based on ModelRoute rules. +* **Kthena Controller Manager:** The control plane responsible for workload orchestration and lifecycle management. It reconciles Custom Resource Definitions (CRDs) like ModelBooster, ModelServing, and AutoScalingPolicy to convert declarative intent into runtime resources. + * It orchestrates ServingGroups and roles (Prefill/Decode). + * It handles topology-aware affinity, Gang scheduling, rolling updates, and failure recovery. + * It drives elastic scaling based on defined policies. + +## Core Features and Advantages + +### 1. Production-Grade Inference Orchestration (ModelServing) + +
+ +Kthena introduces a Hierarchical Workload Architecture (ModelServing -> ServingGroup -> Role). + +* **Unified API:** A single API supports diverse patterns, from standalone deployments to complex PD Disaggregation and Expert Parallelism (EP). +* **Simplified Management:** For example, a massive PD deployment is managed as a single ModelServing resource containing multiple ServingGroups. +* **Native PD Disaggregation:** Kthena optimizes hardware usage by routing compute-intensive Prefill tasks to high-compute nodes and memory-bound Decode tasks to High Bandwidth Memory (HBM) nodes. It supports independent scaling to dynamically adjust the Prefill/Decode ratio. +* **Topology Awareness & Gang Scheduling:** Gang scheduling guarantees that pods in a ServingGroup are scheduled as an atomic unit, preventing deadlocks. Topology awareness minimizes data transmission latency by placing related pods closer together in the network fabric. + +### 2. Out-of-the-Box Deployment (ModelBooster) + +* **Templates:** Provides built-in templates for mainstream models (including PD separation), automatically generating necessary routing and lifecycle resources. +* **Flexibility:** Covers general scenarios while allowing granular control via ModelServing for complex needs. + +### 3. Intelligent, Model-Aware Routing + +* **Multi-Model Routing:** OpenAI API compatible. Routes traffic based on headers or body content. +* **Pluggable Algorithms:** Includes Least Request, Least Latency, KV Cache Awareness, Prefix Cache Awareness, LoRA Affinity, and Fairness Scheduling. +* **LoRA Hot-Swapping:** Detects loaded LoRA adapters for non-disruptive hot-swapping and routing. +* **Traffic Governance:** Supports canary releases, token-level rate limiting, and failover. +* **All-in-One Architecture:** Eliminates the need for a separate Envoy Gateway by natively handling routing logic. + +### 4. Cost-Driven Autoscaling + +* **Homogeneous Scaling:** Scales precisely based on business metrics (CPU/GPU/Memory/Custom). +* **Heterogeneous Optimization:** Optimizes resource allocation across different accelerators based on a "Cost-Performance" ratio. + +### 5. Broad Hardware & Engine Support + +* **Inference Engines:** Supports vLLM, SGLang, Triton/TGI, and more via a unified API abstraction. +* **Heterogeneous Compute:** Enables co-location of GPU and NPU resources to balance cost and Service Level Objectives (SLOs). + +### 6. Built-in Flow Control & Fairness + +* **Fairness Scheduling:** Prioritizes traffic based on usage history to prevent "starvation" of low-priority users. +* **Flow Control:** Granular limits based on user, model, and token length. 
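To make the ModelServing -> ServingGroup -> Role hierarchy from section 1 concrete, here is a deliberately hypothetical sketch of how a PD-disaggregated deployment could be expressed as a single resource. The apiVersion and field names below are illustrative assumptions, not the authoritative Kthena schema; see the GitHub repository for the real CRD definitions.

```yaml
# Hypothetical sketch only: apiVersion and field names are assumptions,
# not the official Kthena CRD schema.
apiVersion: serving.volcano.sh/v1alpha1   # assumed group/version
kind: ModelServing
metadata:
  name: chat-model-pd
spec:
  servingGroups:
  - name: group-a
    roles:
    - name: prefill      # compute-intensive phase, placed on high-compute nodes
      replicas: 2
    - name: decode       # memory-bound phase on HBM-rich nodes, scaled independently
      replicas: 4
```

The point of the hierarchy is that the whole PD deployment, however many groups and roles it contains, is managed as one ModelServing object.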
## Performance Benchmarks

In scenarios with long system prompts (e.g., 4096 tokens), Kthena's "KV Cache Awareness + Least Request" strategy delivers significant gains compared to a random baseline:

* **Throughput:** Increased by **~2.73x**
* **TTFT (Time To First Token):** Reduced by **~73.5%**
* **End-to-End Latency:** Reduced by **>60%**

| Plugin Configuration | Throughput (req/s) | E2E Latency (s) | TTFT (s) |
| :---- | :---- | :---- | :---- |
| **Least Request + KVCacheAware** | **32.22** | **9.22** | **0.57** |
| Least Request + Prefix Cache | 23.87 | 12.47 | 0.83 |
| Random | 11.81 | 25.23 | 2.15 |

*Note: While the gaps narrow with short prompts, KV Cache awareness offers decisive advantages for multi-turn conversations and template-heavy workloads.*

## Community & Industry Support

Kthena has attracted widespread attention and support from industry leaders from the very beginning.

"Open source is the lifeblood of technical innovation and the primary driver of industry standardization. As the initiator of Volcano, Huawei Cloud is proud to launch Kthena alongside our community partners.

This release marks not only a significant milestone in Volcano's technical evolution but also underscores Huawei Cloud's enduring commitment to Cloud Native AI. By deeply integrating with infrastructure like Huawei Cloud CCE and CCI, Kthena unlocks the full potential of diverse computing power, including Ascend, delivering superior cost-efficiency to our customers.

Through Kthena, we look forward to collaborating with global developers to build an open, thriving ecosystem that lays a robust foundation for the intelligent transformation of industries worldwide."
—— Xiaobo Qi, Director of General Computing Services, Huawei Cloud
+ +"Kthena further solidifies Volcano's leadership in intelligent workload scheduling. By leveraging Volcano's unified scheduling and resource pooling capabilities, our platform addresses diverse compute requirements—spanning general-purpose computing, AI training, and inference—within a single, unified framework. + +This enables dynamic resource allocation across different scenarios, effectively eliminating resource silos. Looking ahead, we are excited to combine Kthena with Volcano’s elastic scaling and Volcano Global’s cross-cluster scheduling to drive resource utilization to new heights." + +
—— Lei Yang, PaaS R&D Director, China Telecom AI
+ +"Since its inception, Volcano has evolved in lockstep with the community to address diverse AI scenarios, establishing a comprehensive ecosystem for AI batch processing. + +The launch of Kthena marks a major milestone, extending Volcano's capabilities into the critical realm of Large Model inference. It crystallizes years of Volcano’s best practices in scheduling, elasticity, and multi-architecture support into a powerful engine for unified orchestration and intelligent routing. + +By leveraging the existing Kubernetes and Volcano ecosystems, teams can achieve smarter scheduling decisions and higher compute efficiency at a lower cost. For DaoCloud, Kthena not only solves tangible inference challenges but also embodies the future of Cloud Native AI—an open, intelligent ecosystem worthy of our long-term investment and deep engagement." + +
—— Paco Xu, Open Source Team Lead at DaoCloud, Member of Kubernetes Steering Committee
+ +"Deploying and managing self-hosted LLM inference services at production scale is a complex systems engineering challenge. It encompasses the entire lifecycle—deployment, operations, elasticity, and recovery—alongside critical requirements like GPU stability, scheduling efficiency, and AI observability. Kthena is engineered specifically to address these complexities. + +During Kthena’s planning phase, the Xiaohongshu Cloud Native team engaged deeply with contributors to co-design various intelligent traffic scheduling strategies. Moving forward, we will continue our collaboration on the AI Gateway front. By leveraging Xiaohongshu’s production insights, we aim to provide the community with production-ready capabilities, including granular traffic scheduling, model API management, and MCP protocol support." + +
—— Kong Gu (Huachang Chen), Cloud Native Business Gateway Lead, Xiaohongshu
+ +"After an in-depth evaluation of Kthena, China Unicom Cloud is impressed by its forward-looking design. We are particularly excited about its joint scheduling capabilities with Volcano. + +Features like topology awareness and Gang Scheduling directly address the critical efficiency and reliability challenges inherent in large-scale distributed inference, offering a promising solution to complex scheduling bottlenecks. + +We believe Kthena’s superior low latency, high throughput, and intelligent routing will provide the open-source community with a truly production-ready solution, empowering developers to build and manage cloud-native AI applications with greater efficiency." + +
—— Zhaoxu Lu, Team Lead, Intelligent Computing Center, China Unicom Cloud
+ +"Openness and collaboration fuel innovation. Within the CNCF ecosystem, we are dedicated to driving infrastructure towards an 'AI Native' future. + +By launching the Kthena sub-project, the Volcano community applies its proven expertise in batch computing—like topology awareness and Gang scheduling—to online LLM inference. Kthena introduces essential cloud-native scheduling primitives, enabling complex LLM workloads to run efficiently as first-class citizens in Kubernetes. + +We invite developers worldwide to join us in refining this critical infrastructure and accelerating the AI Native era." + +
—— Kevin Wang, Volcano Maintainer, CNCF TOC Vice Chair
## Start Exploring Kthena Today

This is just the beginning. We plan to support more efficient scheduling algorithms and broader best practices for large model deployment.

* **GitHub Repository:** [https://github.com/volcano-sh/kthena](https://github.com/volcano-sh/kthena)
* **Official Website:** [https://kthena.volcano.sh/](https://kthena.volcano.sh/)
* **Community:** [Join our Slack](https://cloud-native.slack.com/archives/C011GJDQS0N)

**Join us to unlock the full potential of Cloud Native LLMs!**
\ No newline at end of file
diff --git a/website-docusaurus/blog/2020-01-01-paddlepaddle-en.md b/website-docusaurus/blog/2020-01-01-paddlepaddle-en.md
new file mode 100644
index 00000000..b8572131
--- /dev/null
+++ b/website-docusaurus/blog/2020-01-01-paddlepaddle-en.md
@@ -0,0 +1,548 @@
+++
title = "PaddlePaddle Distributed Training on Volcano"
description = "Best practice about PaddlePaddle distributed training on Volcano"
subtitle = ""

date = 2019-11-06
lastmod = 2021-08-23
datemonth = "Dec"
dateyear = "2020"
dateday = 23

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "posts" # Do not modify.
authors = ["PaddlePaddle Team", "Volcano Team"]

tags = ["Practice"]
summary = "Best practice about PaddlePaddle distributed training on Volcano"

# Add menu entry to sidebar.
linktitle = "PaddlePaddle Distributed Training on Volcano"
[menu.posts]
parent = "tutorials"
weight = 4
+++

>This article was first published on `Container Cube` on November 6th, 2019; see [百度飞桨(PaddlePaddle)分布式训练在Volcano系统上的实践](https://mp.weixin.qq.com/s/SnUUEEy9OfNghzoel7FtUg).

PaddlePaddle is a deep learning framework open-sourced by Baidu in September 2016. It aims to provide a secure, easy-to-use, and scalable deep learning platform.

In October 2018, the PaddlePaddle team released Paddle Fluid 1.0, which enhanced neural network description, large-scale distributed training, and the high-performance inference engine. Take distributed training, indispensable to industrial applications, as an example: in Paddle Fluid 1.5.2, PaddlePaddle supports data parallelism, model parallelism, and pipeline parallelism. The parameter server architecture and point-to-point synchronous training architecture enable large-scale training on hardware resources such as CPUs and GPUs. The following paragraphs show how to perform distributed training of PaddlePaddle on Volcano in the Kubernetes community.

Kubernetes is the most popular open-source system for automatic deployment, scaling, and resource management of containerized applications. With the development of Kubernetes, more and more companies are willing to deploy their service applications on Kubernetes. In addition to web and database services, deep learning frameworks are also deployed on Kubernetes for distributed training.

However, creating deep learning training jobs on Kubernetes is not as intuitive as on a traditional high-performance computing MPI platform. In 2017, an article titled "Run Deep Learning with PaddlePaddle on Kubernetes" was published in the Kubernetes community. The article proposed that running PaddlePaddle on Kubernetes is a best practice, based on its computing fault tolerance, elastic scaling, and resource isolation.

Since the release of Paddle Fluid 1.0, PaddlePaddle has advanced greatly in terms of platform deployment and job scheduling.
With Kubernetes, PaddlePaddle can properly schedule CPU and GPU resources and elastically scale training jobs, significantly improving the utilization of computing resources. Even so, there is still room for improvement in parallel job creation and scheduling, lifecycle management of training jobs, affinity scheduling of computing resources, and scheduling policy optimization. To improve the computing efficiency of PaddlePaddle, the PaddlePaddle team joined with the Volcano team to release the "PaddlePaddle on Volcano" solution.

__Volcano is an enhanced batch scheduling system for high-performance computing workloads running on Kubernetes.__

Volcano complements Kubernetes in machine learning, deep learning, HPC, and big data computing scenarios, providing capabilities such as gang scheduling, computing job queue management, and GPU affinity scheduling. In addition, Volcano enhances batch job creation and lifecycle management, fair-share, and other Kubernetes-native capabilities.

__Volcano meets the basic requirements of PaddlePaddle for resource creation and scheduling.__ Specifically, it provides automatic lifecycle management of computing jobs for PaddlePaddle. The gang scheduling policy meets the "all or nothing" scheduling requirements of pservers and trainers. The queue and priority logic controls the execution sequence of computing jobs in a cluster. Fair-share and GPU affinity scheduling align job scheduling with the requirements of pservers and trainers for node resources and network topology, improving computing efficiency.

With the custom resource definition (CRD) capability of Kubernetes, Volcano provides a resource object whose apiVersion is batch.volcano.sh/v1alpha1 and kind is Job to define computing tasks. You can create, manage, and schedule computing jobs on the Volcano platform. To use the Volcano platform, install Volcano in your Kubernetes clusters by following the instructions on the Volcano official website.

In this example, we will use Kubernetes-native resources and Volcano jobs to execute the same PaddlePaddle computing jobs, and compare Kubernetes with Volcano in execution methods, job management, and job scheduling. We choose the click-through rate (CTR) demo for distributed training from the PaddlePaddle official website. The CTR demo runs two pserver tasks and two trainer tasks.

According to the recommended method on the PaddlePaddle website, first create a Kubernetes ReplicaSet with two replicas to run the pserver tasks, and then create a Kubernetes Job with `parallelism` set to 2 to run the trainer tasks.

Create a pserver task.

```
root@volcano-paddlepaddle:~# kubectl apply -f pserver.yaml

replicaset.extensions/fluid-ctr-pserver created
```

View the pserver ReplicaSet component.
```
root@volcano-paddlepaddle:~# kubectl get rs

NAME                DESIRED   CURRENT   READY   AGE
fluid-ctr-pserver   2         2         2       5m
```

View the pserver pods.
```
root@volcano-paddlepaddle:~# kubectl get pods | grep fluid

fluid-ctr-pserver-b9w99   1/1   Running   0   9m18s
fluid-ctr-pserver-pb9vd   1/1   Running   0   9m18s
```

View the pserver logs. The pserver has started to provide services.
```
root@volcano-paddlepaddle:~# kubectl logs fluid-ctr-pserver-b9w99

+ case "$1" in
+ start_fluid_process
+ pserver_label=paddle-job-pserver=fluid-ctr
+ trainer_label=paddle-job=fluid-ctr
+ hostname=c-rlnrdybm-muamumvq-1
+ task_index=
+ '[' PSERVER == TRAINER ']'
+ '[' PSERVER == PSERVER ']'
+ stdbuf -oL python /root/k8s_tools.py wait_pods_running paddle-job-pserver=fluid-ctr 2
label selector: paddle-job-pserver=fluid-ctr, desired: 2
current cnt: 0 sleep for 5 seconds...
+ '[' PSERVER == TRAINER ']'
+ '[' PSERVER == WORKER ']'
++ python /root/k8s_tools.py fetch_endpoints paddle-job-pserver=fluid-ctr 30236
+ export PADDLE_PSERVERS=192.168.48.24:30236,192.168.48.25:30237
+ PADDLE_PSERVERS=192.168.48.24:30236,192.168.48.25:30237
++ python /root/k8s_tools.py fetch_ips paddle-job=fluid-ctr
+ export PADDLE_TRAINER_IPS=
+ PADDLE_TRAINER_IPS=
+ '[' PSERVER == TRAINER ']'
+ '[' PSERVER == WORKER ']'
++ python /root/k8s_tools.py fetch_id paddle-job-pserver=fluid-ctr
+ task_index=0
+ export PADDLE_TRAINER_ID=0
+ PADDLE_TRAINER_ID=0
+ export PADDLE_PSERVER_ID=0
+ PADDLE_PSERVER_ID=0
+ stdbuf -oL sh -c 'cd /workspace/ctr && python train.py --is_local 0 --cloud_train 1'
2019-09-03 06:43:10,661 - INFO - run dist training
2019-09-03 06:43:10,715 - INFO - run pserver
get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
I0903 06:43:10.826609    41 grpc_server.cc:435] Server listening on 192.168.48.24:30236 selected port:
```

Create a trainer task.
```
root@volcano-paddlepaddle:~# kubectl apply -f trainer.yaml

job.batch/fluid-ctr-trainer created
```

View the trainer pods.
```
root@volcano-paddlepaddle:~# kubectl get pod | grep fluid

fluid-ctr-pserver-b9w99   1/1   Running   0   87m
fluid-ctr-pserver-pb9vd   1/1   Running   0   87m
fluid-ctr-trainer-lg9n5   1/1   Running   0   12s
fluid-ctr-trainer-tvr99   1/1   Running   0   12s
```
View the trainer logs. The trainer task is being executed.
```
root@volcano-paddlepaddle:~# kubectl logs fluid-ctr-trainer-lg9n5

+ case "$1" in
+ start_fluid_process
+ pserver_label=paddle-job-pserver=fluid-ctr
+ trainer_label=paddle-job=fluid-ctr
+ hostname=c-rlnrdybm-muamumvq-2
+ task_index=
+ '[' TRAINER == TRAINER ']'
+ stdbuf -oL python /root/k8s_tools.py wait_pods_running paddle-job-pserver=fluid-ctr 2
label selector: paddle-job-pserver=fluid-ctr, desired: 2
+ '[' TRAINER == TRAINER ']'
+ stdbuf -oL python /root/k8s_tools.py wait_pods_running paddle-job=fluid-ctr 2
label selector: paddle-job=fluid-ctr, desired: 2
++ python /root/k8s_tools.py fetch_endpoints paddle-job-pserver=fluid-ctr 30236
+ export PADDLE_PSERVERS=192.168.48.24:30236,192.168.48.25:30237
+ PADDLE_PSERVERS=192.168.48.24:30236,192.168.48.25:30237
++ python /root/k8s_tools.py fetch_ips paddle-job=fluid-ctr
+ export PADDLE_TRAINER_IPS=192.168.48.24,192.168.48.25
+ PADDLE_TRAINER_IPS=192.168.48.24,192.168.48.25
+ '[' TRAINER == TRAINER ']'
+ check_failed_cnt 1
+ max_failed=1
++ python /root/k8s_tools.py count_pods_by_phase paddle-job=fluid-ctr Failed
+ failed_count=0
+ '[' 0 -gt 1 ']'
++ python /root/k8s_tools.py fetch_id paddle-job=fluid-ctr
+ task_index=0
+ export PADDLE_TRAINER_ID=0
+ PADDLE_TRAINER_ID=0
+ export PADDLE_PSERVER_ID=0
+ PADDLE_PSERVER_ID=0
+ stdbuf -oL sh -c 'cd /workspace/ctr && python train.py --is_local 0 --cloud_train 1'
2019-09-03 08:10:20,888 - INFO - run dist training
2019-09-03 08:10:20,951 - INFO - download the training materials
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  433M  100  433M    0     0  70.9M      0  0:00:06  0:00:06 --:--:-- 97.0M
2019-09-03 08:11:04,522 - INFO - run trainer
2019-09-03 08:11:04,591 - WARNING -
I0903 08:11:04.594007    25 parallel_executor.cc:329] The number of CPUPlace, which is used in ParallelExecutor, is 2. And the Program will be copied 2 copies
I0903 08:11:04.875757    25 rpc_client.h:101] init rpc client with trainer_id 0
2019-09-03 08:11:38,625 - INFO - TRAIN --> pass: 0 batch: 0 loss: 0.697331115723 auc: 0.500826068453, batch_auc: 0.500826068453
2019-09-03 08:11:38,967 - INFO - TRAIN --> pass: 0 batch: 1 loss: 0.652093688965 auc: 0.48451329672, batch_auc: 0.48451329672
2019-09-03 08:11:39,242 - INFO - TRAIN --> pass: 0 batch: 2 loss: 0.629092956543 auc: 0.485173881519, batch_auc: 0.485173881519
2019-09-03 08:11:39,577 - INFO - TRAIN --> pass: 0 batch: 3 loss: 0.603850708008 auc: 0.482131778494, batch_auc: 0.482131778494
2019-09-03 08:11:39,874 - INFO - TRAIN --> pass: 0 batch: 4 loss: 0.591485412598 auc: 0.479737304993, batch_auc: 0.479737304993
2019-09-03 08:11:40,133 - INFO - TRAIN --> pass: 0 batch: 5 loss: 0.58376159668 auc: 0.478554220739, batch_auc: 0.478554220739
2019-09-03 08:11:40,385 - INFO - TRAIN --> pass: 0 batch: 6 loss: 0.561969116211 auc: 0.481465857424, batch_auc: 0.481465857424
2019-09-03 08:11:40,637 - INFO - TRAIN --> pass: 0 batch: 7 loss: 0.557065185547 auc: 0.486014931119, batch_auc: 0.486014931119
2019-09-03 08:11:40,890 - INFO - TRAIN --> pass: 0 batch: 8 loss: 0.562498413086 auc: 0.489651573333, batch_auc: 0.489651573333
2019-09-03 08:11:41,158 - INFO - TRAIN --> pass: 0 batch: 9 loss: 0.566428283691 auc: 0.489853260221, batch_auc: 0.49137884426
2019-09-03 08:11:41,452 - INFO - TRAIN --> pass: 0 batch: 10 loss: 0.564840087891 auc: 0.492880386228, batch_auc: 0.494013763938
2019-09-03 08:11:41,742 - INFO - TRAIN --> pass: 0 batch: 11 loss: 0.564809204102 auc: 0.493201528907, batch_auc: 0.498872381582
2019-09-03 08:11:42,056 - INFO - TRAIN --> pass: 0 batch: 12 loss: 0.584479736328 auc: 0.494151972036, batch_auc: 0.503926628391
2019-09-03 08:11:42,329 - INFO - TRAIN --> pass: 0 batch: 13 loss: 0.615677246094 auc: 0.49252557362, batch_auc: 0.5028352489
```

After the trainer task is completed, check the status of the pserver and trainer pods. The pserver pods are still running.
```
root@volcano-paddlepaddle:~# kubectl get pods | grep fluid

fluid-ctr-pserver-b9w99   1/1   Running     0   177m
fluid-ctr-pserver-pb9vd   1/1   Running     0   177m
fluid-ctr-trainer-lg9n5   0/1   Completed   0   90m
fluid-ctr-trainer-tvr99   0/1   Completed   0   90m
```

Now run the same computing tasks on the Volcano platform.

Volcano supports multi-pod jobs: you define multiple pod templates in the tasks field, where replicas indicates the number of pods a task generates and name indicates the task name, from which the pod names are derived. The template field has the same format as a Kubernetes podTemplate. The CTR demo contains two types of tasks, pserver and trainer, and each task has two replicas, so the demo creates two pserver pods and two trainer pods.

To use the Volcano scheduler, set schedulerName to volcano in the job configuration. If schedulerName is not set to volcano, the default scheduler of Kubernetes is used.

Volcano uses the minAvailable field to configure the gang scheduling policy. minAvailable indicates the minimum number of pods required for the job to run, and its value cannot exceed the total number of pods in the job. In the PaddlePaddle framework, computing starts only when all pserver and trainer tasks are running. Therefore, for PaddlePaddle, the value of minAvailable must be equal to the total number of computing pods in the job.
For an application that uses PaddlePaddle for distributed training, if a pserver task or a trainer task is evicted or fails, the computing cluster formed by the pservers and trainers becomes invalid, and all pserver and trainer tasks must be restarted to form a new computing cluster. In Volcano, this can be achieved through the policies field: map both the PodEvicted event and the PodFailed event to the RestartJob action. With these two policies set, if any computing task is evicted or fails, all computing tasks of the job are restarted.

The following is the configuration file ctr-volcano.yaml used for executing the CTR tasks on the Volcano platform. You can obtain the configuration file from the Volcano code repository:

https://github.com/volcano-sh/volcano/blob/master/example/integrations/paddlepaddle/ctr-paddlepaddle-on-volcano.yaml

```
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: ctr-volcano
spec:
  minAvailable: 4
  schedulerName: volcano
  policies:
  - event: PodEvicted
    action: RestartJob
  - event: PodFailed
    action: RestartJob
  tasks:
  - replicas: 2
    name: pserver
    template:
      metadata:
        labels:
          paddle-job-pserver: fluid-ctr
      spec:
        imagePullSecrets:
        - name: default-secret
        volumes:
        - hostPath:
            path: /home/work/
            type: ""
          name: seqdata
        containers:
        - image: volcanosh/edlctr:v1
          command:
          - paddle_k8s
          - start_fluid
          imagePullPolicy: IfNotPresent
          name: pserver
          volumeMounts:
          - mountPath: /mnt/seqdata
            name: seqdata
          resources:
            limits:
              cpu: 10
              memory: 30Gi
              ephemeral-storage: 10Gi
            requests:
              cpu: 1
              memory: 100M
              ephemeral-storage: 1Gi
          env:
          - name: GLOG_v
            value: "0"
          - name: GLOG_logtostderr
            value: "1"
          - name: TOPOLOGY
            value: ""
          - name: TRAINER_PACKAGE
            value: /workspace
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: POD_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          - name: POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: PADDLE_CURRENT_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          - name: PADDLE_JOB_NAME
            value: fluid-ctr
          - name: PADDLE_IS_LOCAL
            value: "0"
          - name: PADDLE_TRAINERS_NUM
            value: "2"
          - name: PADDLE_PSERVERS_NUM
            value: "2"
          - name: FLAGS_rpc_deadline
            value: "36000"
          - name: ENTRY
            value: cd /workspace/ctr && python train.py --is_local 0 --cloud_train 1
          - name: PADDLE_PORT
            value: "30236"
          - name: LD_LIBRARY_PATH
            value: /usr/local/lib:/usr/local/nvidia/lib64:/usr/local/rdma/lib64:/usr/lib64/mlnx_ofed/valgrind
          - name: PADDLE_TRAINING_ROLE
            value: PSERVER
          - name: TRAINING_ROLE
            value: PSERVER
        restartPolicy: OnFailure
  - replicas: 2
    policies:
    - event: TaskCompleted
      action: CompleteJob
    name: trainer
    template:
      metadata:
        labels:
          paddle-job: fluid-ctr
      spec:
        imagePullSecrets:
        - name: default-secret
        volumes:
        - hostPath:
            path: /home/work/
            type: ""
          name: seqdata
        containers:
        - image: volcanosh/edlctr:v1
          command:
          - paddle_k8s
          - start_fluid
          imagePullPolicy: IfNotPresent
          name: trainer
          volumeMounts:
          - mountPath: /mnt/seqdata
            name: seqdata
          resources:
            limits:
              cpu: 10
              memory: 30Gi
              ephemeral-storage: 10Gi
            requests:
              cpu: 1
              memory: 100M
              ephemeral-storage: 10Gi
          env:
          - name: GLOG_v
            value: "0"
          - name: GLOG_logtostderr
            value: "1"
          - name: TOPOLOGY
            value: ""
          - name: TRAINER_PACKAGE
            value: /workspace
          - name: CPU_NUM
            value: "2"
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: POD_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          - name: POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: PADDLE_CURRENT_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          - name: PADDLE_JOB_NAME
            value: fluid-ctr
          - name: PADDLE_IS_LOCAL
            value: "0"
          - name: FLAGS_rpc_deadline
            value: "36000"
          - name: PADDLE_PORT
            value: "30236"
          - name: PADDLE_PSERVERS_NUM
            value: "2"
          - name: PADDLE_TRAINERS_NUM
            value: "2"
          - name: PADDLE_TRAINING_ROLE
            value: TRAINER
          - name: TRAINING_ROLE
            value: TRAINER
          - name: LD_LIBRARY_PATH
            value: /usr/local/lib:/usr/local/nvidia/lib64:/usr/local/rdma/lib64:/usr/lib64/mlnx_ofed/valgrind
          - name: ENTRY
            value: cd /workspace/ctr && python train.py --is_local 0 --cloud_train 1
        restartPolicy: OnFailure
```

Run the following command in the cluster to create a Volcano job in the default namespace:
```
root@volcano-paddlepaddle:~# kubectl apply -f ctr-volcano.yaml

job.batch.volcano.sh/ctr-volcano created
```

Check the pod status. Both the pserver tasks and the trainer tasks have been delivered to the cluster and should be running properly. If the idle resources in the cluster cannot meet the demands of the pserver and trainer tasks, none of the pods will be started.

```
root@volcano-paddlepaddle:~# kubectl get pods | grep ctr-volcano

ctr-volcano-pserver-0   1/1   Running   0   16s
ctr-volcano-pserver-1   1/1   Running   0   16s
ctr-volcano-trainer-0   1/1   Running   0   16s
ctr-volcano-trainer-1   1/1   Running   0   16s
```

Select a pserver pod and view its logs. The pserver is listening on the corresponding port and providing services.

```
root@volcano-paddlepaddle:~# kubectl logs ctr-volcano-pserver-0

+ case "$1" in
+ start_fluid_process
+ pserver_label=paddle-job-pserver=fluid-ctr
+ trainer_label=paddle-job=fluid-ctr
+ hostname=ctr-volcano-pserver-0
+ task_index=
+ '[' PSERVER == TRAINER ']'
+ '[' PSERVER == PSERVER ']'
+ stdbuf -oL python /root/k8s_tools.py wait_pods_running paddle-job-pserver=fluid-ctr 2
label selector: paddle-job-pserver=fluid-ctr, desired: 2
current cnt: 0 sleep for 5 seconds...
+ '[' PSERVER == TRAINER ']'
+ '[' PSERVER == WORKER ']'
++ python /root/k8s_tools.py fetch_endpoints paddle-job-pserver=fluid-ctr 30236
+ export PADDLE_PSERVERS=172.20.0.148:30236,172.20.1.134:30237
+ PADDLE_PSERVERS=172.20.0.148:30236,172.20.1.134:30237
++ python /root/k8s_tools.py fetch_ips paddle-job=fluid-ctr
+ export PADDLE_TRAINER_IPS=172.20.0.147,172.20.1.133
+ PADDLE_TRAINER_IPS=172.20.0.147,172.20.1.133
+ '[' PSERVER == TRAINER ']'
+ '[' PSERVER == WORKER ']'
++ python /root/k8s_tools.py fetch_id paddle-job-pserver=fluid-ctr
+ task_index=0
+ export PADDLE_TRAINER_ID=0
+ PADDLE_TRAINER_ID=0
+ export PADDLE_PSERVER_ID=0
+ PADDLE_PSERVER_ID=0
+ stdbuf -oL sh -c 'cd /workspace/ctr && python train.py --is_local 0 --cloud_train 1'
2019-09-03 09:57:55,619 - INFO - run dist training
2019-09-03 09:57:55,708 - INFO - run pserver
get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
I0903 09:57:55.860916    41 grpc_server.cc:435] Server listening on 172.20.0.148:30236 selected port:
```

Select a trainer pod and view its logs. The computing task is being executed.
```
root@volcano-paddlepaddle:~# kubectl logs ctr-volcano-trainer-0

+ case "$1" in
+ start_fluid_process
+ pserver_label=paddle-job-pserver=fluid-ctr
+ trainer_label=paddle-job=fluid-ctr
+ hostname=ctr-volcano-trainer-0
+ task_index=
+ '[' TRAINER == TRAINER ']'
+ stdbuf -oL python /root/k8s_tools.py wait_pods_running paddle-job-pserver=fluid-ctr 2
label selector: paddle-job-pserver=fluid-ctr, desired: 2
current cnt: 0 sleep for 5 seconds...
+ '[' TRAINER == TRAINER ']'
+ stdbuf -oL python /root/k8s_tools.py wait_pods_running paddle-job=fluid-ctr 2
label selector: paddle-job=fluid-ctr, desired: 2
++ python /root/k8s_tools.py fetch_endpoints paddle-job-pserver=fluid-ctr 30236
+ export PADDLE_PSERVERS=172.20.0.148:30236,172.20.1.134:30237
+ PADDLE_PSERVERS=172.20.0.148:30236,172.20.1.134:30237
++ python /root/k8s_tools.py fetch_ips paddle-job=fluid-ctr
+ export PADDLE_TRAINER_IPS=172.20.0.147,172.20.1.133
+ PADDLE_TRAINER_IPS=172.20.0.147,172.20.1.133
+ '[' TRAINER == TRAINER ']'
+ check_failed_cnt 1
+ max_failed=1
++ python /root/k8s_tools.py count_pods_by_phase paddle-job=fluid-ctr Failed
+ failed_count=0
+ '[' 0 -gt 1 ']'
++ python /root/k8s_tools.py fetch_id paddle-job=fluid-ctr
+ task_index=0
+ export PADDLE_TRAINER_ID=0
+ PADDLE_TRAINER_ID=0
+ export PADDLE_PSERVER_ID=0
+ PADDLE_PSERVER_ID=0
+ stdbuf -oL sh -c 'cd /workspace/ctr && python train.py --is_local 0 --cloud_train 1'
2019-09-03 09:57:56,712 - INFO - run dist training
2019-09-03 09:57:56,773 - INFO - download the training materials
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  433M  100  433M    0     0  96.2M      0  0:00:04  0:00:04 --:--:-- 96.2M
2019-09-03 09:58:27,648 - INFO - run trainer
2019-09-03 09:58:27,732 - WARNING -
I0903 09:58:27.734141    25 parallel_executor.cc:329] The number of CPUPlace, which is used in ParallelExecutor, is 2. And the Program will be copied 2 copies
I0903 09:58:27.937546    25 rpc_client.h:101] init rpc client with trainer_id 0
2019-09-03 09:58:37,957 - INFO - TRAIN --> pass: 0 batch: 0 loss: 0.670620727539 auc: 0.510430537062, batch_auc: 0.510764985415
2019-09-03 09:58:38,264 - INFO - TRAIN --> pass: 0 batch: 1 loss: 0.641319274902 auc: 0.503955813399, batch_auc: 0.503955813399
2019-09-03 09:58:38,585 - INFO - TRAIN --> pass: 0 batch: 2 loss: 0.617138793945 auc: 0.50334993182, batch_auc: 0.50334993182
2019-09-03 09:58:38,873 - INFO - TRAIN --> pass: 0 batch: 3 loss: 0.598490356445 auc: 0.507263818365, batch_auc: 0.507263818365
2019-09-03 09:58:39,182 - INFO - TRAIN --> pass: 0 batch: 4 loss: 0.573976501465 auc: 0.510442316749, batch_auc: 0.510442316749
```
Check the pods again after about 70 minutes. The status shows that the trainer pods have completed normally.

```
root@volcano-paddlepaddle:~# kubectl get pod | grep ctr-volcano

ctr-volcano-trainer-0   0/1   Completed   0   77m
ctr-volcano-trainer-1   0/1   Completed   0   77m
```

After the training is completed, we may want to use the trained model for other purposes. In the YAML file we use the volcanosh/edlctr:v1 image, whose training code lives in the /workspace/ctr directory, and train.py is configured to call the save_inference_model interface to save the model every 1,000 batches and at the end of each pass over the training set. The model is saved in the /workspace/ctr/models folder. There are two ways to obtain the model after training:
- In the YAML file, define a volume in the trainer's pod spec to map the /workspace/ctr/models folder to the host (see the sketch after this list). Run the kubectl describe pod ctr-volcano-trainer-0 command to locate the node where the model is stored, and then log in to that node over SSH to fetch the trained model from the mapped path on the host.
- To obtain the model automatically, create a file server and a distributed file system, such as GlusterFS, in the Kubernetes cluster, and map the /workspace/ctr/models folder in the ctr-volcano-trainer-0 container to a persistent volume claim (PVC) backed by GlusterFS. Then, use wget or curl to retrieve and deliver the model over FTP.
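As a minimal sketch of the first approach, the trainer task's pod template can mount a hostPath volume over the model output directory. The container paths follow the CTR demo above; the host directory and volume name are assumptions for illustration.

```yaml
# Fragment of the trainer pod spec: persist saved models onto the node's disk.
containers:
- name: trainer
  image: volcanosh/edlctr:v1
  volumeMounts:
  - mountPath: /workspace/ctr/models   # where train.py saves the model in this demo
    name: ctr-models
volumes:
- name: ctr-models
  hostPath:
    path: /home/work/ctr-models        # assumed directory on the host node
    type: DirectoryOrCreate
```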
To sum up, we can use Volcano to execute PaddlePaddle computing jobs and benefit from batch job creation, automatic lifecycle management, and job scheduling. Compared with the ReplicaSet + Job mode, Volcano improves the efficiency of parallel computing.

Authors

>Dong Daxiang, @guru4elephant, PaddlePaddle Architect, Principal Architect, Baidu
Wang Jiawei, @wangjiawei04, PaddlePaddle Engineer, Senior Engineer, Baidu
Yu Dianhai, @raindrops2sea, PaddlePaddle Architect, Distinguished Architect, Baidu
Zhang Jinghui, @sivanzcw, Volcano Contributor, Cloud Native software engineer, Huawei
Ma Da, @k82cn, Kubernetes Maintainer, SIG-Scheduling Co-Leader, Volcano Lead, Huawei

diff --git a/website-docusaurus/blog/2020-09-30-aiqiyi-en.md b/website-docusaurus/blog/2020-09-30-aiqiyi-en.md
new file mode 100644
index 00000000..68b9db01
--- /dev/null
+++ b/website-docusaurus/blog/2020-09-30-aiqiyi-en.md
@@ -0,0 +1,224 @@
---
title: "iQIYI: Volcano-based Cloud Native Migration Practices"
description: "Volcano use case in deep learning and service migration"
date: 2020-09-30
authors: [volcano]
---
>This article was first published on `Container Cube` on September 30th, 2020; see [揭秘爱奇艺深度学习平台云原生迁移实践](https://mp.weixin.qq.com/s/YtP-ZURRBr5-ba1eWfKS2A).

## Introduction to the iQIYI Jarvis Deep Learning Platform

__Overall Architecture of the Platform__

The platform supports GPU- and CPU-based training and inference. S3, HDFS, and NFS can be used for storing training data and models. The platform supports TensorFlow, PyTorch, Caffe, Caffe2, and MXNet, with TensorFlow and PyTorch the most heavily used; TensorFlow versions 1.X through 2.X are supported.

The platform serves advertising, search, recommendation, NLP, and other businesses. iQIYI uses Mesos + Marathon as its elastic container platform. When iQIYI started the platform, Kubernetes was not yet mature enough to be considered, so our containers do not run on K8s.
__One-stop Platform Service__

Four sub-platforms provide the service. The first is the data preprocessing platform. It analyzes the training data in a visualized manner, helps users adjust parameters, and detects abnormal data in a timely manner.
The second is the training code compilation platform. You can use RunOnce or notebook training to obtain an environment identical to the training environment, compile the training code there, and commit the code to GitLab.

The third is the training job execution platform. You run a training job on the Jarvis training platform, the training code is executed, and an algorithm model is produced.

The last is the Jarvis inference platform. You can create an inference service on it and expose the service to external systems.

__Platform Development__

iQIYI started with the inference platform: it first enabled models to serve external systems, and then gradually extended the platform to support training, development, and data preprocessing. Currently, iQIYI is migrating its elastic container platform from Mesos + Marathon to K8s + Volcano.

__Training Platform Architecture Before Volcano Is Used__

The following describes the training platform architecture before Volcano was used.
The process is as follows:

A. Compile the training code and commit it to GitLab.

B. Create a training job on the web page or with the command line tool. To create a training job, you need to provide the following information:

- The required resources.
- The image. Each version of each framework is packaged as an image, so selecting an image means selecting an algorithm framework.
- The target cluster. There may be multiple clusters, and you need to specify the one where the job should run.
- The URL of the GitLab project that contains the training code you compiled.

C. The Jarvis CLI/web converts the request into gRPC and sends it to the Jarvis core.

D. The core converts the request and calls the Marathon API to create a container.

E. The container starts in the specified cluster and executes the training job.

__Challenges of Migrating the Training Platform to Kubernetes__

The challenges are as follows:

- Native pods, Deployments, and Jobs cannot meet the requirements of distributed training.
- Queue and quota management are not supported.
- Scheduling capabilities such as gang scheduling are missing.

__Introducing Volcano__

The three most important concepts in Volcano are VolcanoJob, queue, and PodGroup. VolcanoJob, referred to as vcjob, is an extension of the Kubernetes Job, or an encapsulation of pods.

Queues can be used to manage quotas.

A PodGroup is a group of pods and can be used for advanced upper-layer scheduling, as sketched below.
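As a minimal sketch of the last concept, a PodGroup ties a group of pods to a queue and declares how many members must be schedulable together; the name, queue, and size below are illustrative:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: tf-training-pg
spec:
  minMember: 4      # schedule all 4 member pods together, or none at all
  queue: team-a     # charge the pods against this team's queue quota
```

When you submit a vcjob, Volcano creates the corresponding PodGroup automatically; you normally declare one yourself only for plain pods.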
So far:

- Volcano is the native batch system of Kubernetes and is highly suitable for AI training.
- It does not intrude on the Kubernetes source code and complies with Kubernetes development specifications, which facilitates secondary development.
- It has been accepted by the Cloud Native Computing Foundation (CNCF) and is mature.


## Power of Volcano

__How Does Volcano Solve the Problems of Migrating to Kubernetes?__

__Gang Scheduling__

1.1 Gang Scheduling

With gang scheduling, either all pods of a job run simultaneously or none of them run. This is important for AI training, especially distributed training. A distributed training job typically starts a large number of pods at a time, for example, 40 or 50. If only some pods of a task are scheduled, the task cannot run properly, which wastes resources and can even cause deadlocks.

For example, suppose a resource pool has only four GPUs, and tasks A and B each have four pods, each pod requiring one GPU. When tasks A and B are created at the same time without gang scheduling, each task may obtain only two GPUs. In this case, neither task can be completed, resulting in a deadlock that cannot be resolved unless resources are added to the pool.

Volcano schedules jobs in the unit of a PodGroup to implement gang scheduling, avoiding this problem.
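A minimal vcjob sketch of the gang constraint: with minAvailable equal to the job's total pod count, the job is scheduled as a whole or not at all. The image and task layout below are placeholders.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: dist-training-example
spec:
  schedulerName: volcano
  minAvailable: 8              # equals total replicas: all-or-nothing scheduling
  tasks:
  - replicas: 2
    name: ps
    template:
      spec:
        restartPolicy: OnFailure
        containers:
        - name: ps
          image: registry.example.com/tf-train:latest   # placeholder image
  - replicas: 6
    name: worker
    template:
      spec:
        restartPolicy: OnFailure
        containers:
        - name: worker
          image: registry.example.com/tf-train:latest   # placeholder image
```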
}} + +For example, suppose a resource pool has only four GPUs, and tasks A and B each have four pods, with each pod requiring one GPU. When tasks A and B are created at the same time without gang scheduling, each task may obtain only two GPUs. In this case, neither task can be completed, resulting in a deadlock that cannot be resolved unless resources are added to the pool. + + + +Volcano schedules jobs in the unit of a PodGroup to implement gang scheduling, avoiding the preceding problem. + +1.2 Native Support for Distributed Tasks + +Take TensorFlow distributed training as an example. It has the following roles: Parameter Server (PS), master, and worker. PS stores parameters, while master and worker calculate gradients. In each iteration, master and worker obtain parameters from PS and push the calculated gradients back to PS. PS aggregates the gradients returned from master and worker, updates the parameters, and broadcasts the updated parameters to master and worker. + +{{
}} + +Let's focus on one of its network structures. If master or worker needs to communicate with PS, problems arise. When creating a pod, a user may not know the IP address the pod will get, and multiple pods created in a deployment may not know each other's IP addresses or domain names. Without Volcano, solving these problems is complicated. + +Each role must know the IP address or domain name of the other roles, the role it plays, and its index number. A TF_CONFIG configuration file is required to include the IP addresses or domain names of master, worker, and PS. These are difficult to implement in Kubernetes. However, with Volcano, the solutions become simple. + +{{
}} + +Volcano can easily build TF_CONFIG through file injection to support TensorFlow distributed training. Volcano injects a folder (/etc/volcano) into the pods under a vcjob. The folder includes the domain names of all master, worker, and PS instances. In this way, each pod knows its peers in the entire job, and TensorFlow distributed training can be performed. + +Currently, TensorFlow provides some high-level APIs, such as the TF Estimator. With the Estimator, single-node code and distributed code are the same; only the TF_CONFIG configuration differs. If environment variables or configuration files in such a format are passed in, distributed training can be performed. So if the platform can build the TF_CONFIG file, users can run their code directly. + + +1.3 Horovod/MPI + +Volcano supports Horovod, which is similar to TensorFlow. They are both used for distributed training but differ in the way parameters are updated. + +{{
}} + +Horovod uses the ring allreduce method to update parameters. What does that mean for us when we want to build a basic environment for upper-layer applications, and what does the ring allreduce architecture require? + +First, we need to ensure that each node knows the domain names of the others, as mentioned earlier. Second, we need to enable a node to log in to another over SSH through port 22 without a password. This passwordless SSH can be implemented automatically with Volcano's SSH plugin, saving a lot of trouble. + +1.4 Quotas and Queues + +Volcano uses queues (CRD objects) to schedule jobs. Let's assume that we have two queues, as shown in the following figure. Queue1 has a quota of 20 GPUs and queue2 has a quota of 10 GPUs. The resources of queue1 are abundant, so new jobs in queue1 can be scheduled. However, all resources in queue2 have been used, so new jobs in queue2 cannot be scheduled and have to wait in the queue. As a result, their PodGroups change to the pending state. + +The teams in our platform are similar to Volcano queues. Each team has a quota, and quotas are independent between teams. When the resource usage reaches a team's quota, the jobs in that team have to wait in the queue. When resources become available, the queued jobs are executed based on priority, which means jobs with a higher priority run first. Given this similarity, interconnecting Volcano with iQIYI's platform is fairly easy.
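As an illustrative sketch (ours, not iQIYI's configuration), the two queues above could be declared like this, assuming GPU quotas are expressed through the capability field: + +``` +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: +  name: queue1 +spec: +  weight: 2 +  capability: +    nvidia.com/gpu: 20   # quota: 20 GPUs +--- +apiVersion: scheduling.volcano.sh/v1beta1 +kind: Queue +metadata: +  name: queue2 +spec: +  weight: 1 +  capability: +    nvidia.com/gpu: 10   # quota: 10 GPUs +``` + +{{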
}} + + + +1.5 Integration with Volcano + +iQIYI has added the volcano_plugin, which encapsulates the RESTful APIs of vcjob, queue, and PodGroup. It converts the gRPC requests into YAML configurations that comply with the Kubernetes API specifications, and calls the Kubernetes API to create containers. + +{{
}} + +Jarvis Core determines which backend to use based on the passed cluster information. + +## Encountered Issues + +Issue 1 + +Symptom: During a Volcano upgrade, the image in https://github.com/volcano-sh/volcano/blob/master/installer/volcano-development.yaml was directly modified, and kubectl apply -f was executed. The existing queues and vcjobs all disappeared. + +Cause: volcano-admission-init in the YAML file was executed repeatedly. As a result, Volcano was reset. + +Solution: Upgrade only the necessary components. + +{{
}} + +Issue 2 + +Symptom: When list_and_watch was used to monitor vcjob status, the watch connection broke every 80 to 90 seconds when there were no new events, and the disconnection duration varied. The issue did not occur when the same code was used to monitor pods. + +Cause: The default HTTP timeout for CRD objects in Kubernetes is time.Duration(float64(minRequestTimeout) * (rand.Float64() + 1.0)), where minRequestTimeout is set to 1 minute. You can specify timeoutSeconds on the client to avoid this issue.
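A minimal sketch (our illustration, not the original code) of setting an explicit watch timeout with client-go's dynamic client; the kubeconfig path, namespace, and timeout value are assumptions: + +``` +package main + +import ( +	"context" +	"fmt" + +	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" +	"k8s.io/apimachinery/pkg/runtime/schema" +	"k8s.io/client-go/dynamic" +	"k8s.io/client-go/tools/clientcmd" +) + +func main() { +	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // assumed path +	if err != nil { +		panic(err) +	} +	client := dynamic.NewForConfigOrDie(cfg) +	vcjobs := schema.GroupVersionResource{Group: "batch.volcano.sh", Version: "v1alpha1", Resource: "jobs"} + +	timeout := int64(1800) // seconds; pick a value that suits your controller +	w, err := client.Resource(vcjobs).Namespace("default").Watch(context.TODO(), +		metav1.ListOptions{TimeoutSeconds: &timeout}) +	if err != nil { +		panic(err) +	} +	for ev := range w.ResultChan() { // the channel closes when the timeout elapses +		fmt.Println("event:", ev.Type) +	} +} +``` + +{{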
}} + +Issue 3 + +Symptom: The container entry point in Jarvis is a bash script. When the script was run in Kubernetes, a container did not exit until 30 seconds after the stop command was delivered. + +Cause: Bash did not forward the signal to its child processes. When the graceful stop timeout was reached, the daemon process detected that the container had not exited and sent a SIGKILL signal to kill the bash script and exit the container. As a result, the other processes in the container had no chance to clean up. + +Solution: Use dumb-init to run a script such as the following entry script: + +``` +#!/usr/bin/dumb-init /bin/bash + +my-web-server & # launch a process in the background + +my-other-server # launch another process in the foreground + +``` + +1.6 Modifications on Volcano + +- The SVC plugin now supports the input parameter nodeport. When we create a vcjob and pass the SVC parameter, a nodeport is created, so our TensorBoard and other services can be accessed externally. + +- We have fixed the bug that creation fails when the name of the SSH plugin exceeds 63 bytes. + +- Volcano has fixed the bug in the queue capability that allowed resources to be used beyond the capability. For details, see https://github.com/volcano-sh/volcano/issues/921. + +- After a vcjob is annotated, if a pod fails, the vcjob deletion is not triggered. For details, see https://github.com/volcano-sh/volcano/issues/805. + + +## Summary + +Volcano makes up for the lack of basic deep learning capabilities in Kubernetes. + +- Gang scheduling + +- Queue management + +Volcano code complies with the Kubernetes standards and is non-intrusive. + +- Lower development and interconnection costs + +- Easy secondary development + +Volcano-based Jarvis has been released and is running properly. \ No newline at end of file diff --git a/website-docusaurus/blog/2020-10-27-hpc-en.md b/website-docusaurus/blog/2020-10-27-hpc-en.md new file mode 100644 index 00000000..581de5d7 --- /dev/null +++ b/website-docusaurus/blog/2020-10-27-hpc-en.md @@ -0,0 +1,254 @@ +--- +title: "HPC on Volcano: How Containers Support HPC Applications in the Meteorological Industry" +description: "This article uses a traditional HPC application, the Weather Research and Forecasting (WRF) model, as an example to describe how Volcano works for HPC applications." +date: 2020-10-27 +authors: [volcano] +--- +>This article was firstly released at `Container Cube` on October 27th, 2020, refer to [HPC on Volcano:容器在气象行业HPC高性能计算场景的应用](https://mp.weixin.qq.com/s/wLIoJeUSey9tzOCV6GZRig) + + + +Kubernetes has become the de facto standard for cloud native application orchestration and management. An increasing number of applications are being reconstructed or built to employ Kubernetes. High performance computing (HPC) is a popular distributed computing mode and is widely used in many fields. For users who have deployed HPC applications and are eager to containerize and manage their applications using Kubernetes, Volcano, CNCF's first distributed scheduling system for batch computing, is a good choice. Volcano supports multiple types of computing frameworks, such as Spark, TensorFlow, and Message Passing Interface (MPI). This article uses a traditional HPC application, the Weather Research and Forecasting (WRF) model, as an example to describe how Volcano works for HPC applications. + + +## About HPC + +HPC and HPCC are two common terms in the area of computing jobs.
HPCC is short for high performance computing cluster, a system that integrates large amounts of computer software and hardware to conduct parallel computing on large computing jobs. HPC is widely used in CAE simulation, animation rendering, physics, chemistry, oil exploration, and life, meteorological, and environmental science. + +An HPCC consists of three parts: + +{{
}} + +- Portable Batch System (PBS): A resource manager that manages all node resources in a cluster. Other common resource management systems include Slurm and Platform Load Sharing Facility (or simply LSF). + +- Maui: A third-party job scheduler that supports multiple priority-based scheduling policies, resource reservations, and preemption mechanisms. Maui provides more advanced scheduling services than the default schedulers embedded in most resource managers. + +- Open MPI: An upper-layer communication environment that provides a communication library and compilation functions and starts distributed tasks. + + +PBS and Maui are transparent to users. Users only need to submit jobs in the mode defined by PBS and do not need to understand internal implementation details. However, users are required to learn how to use Open MPI to develop parallel computing applications. + + + +The following uses __mpirun -np 4 ./mpi_hello_world__ as an example to illustrate how an MPI job runs. + +{{
}} + +- Write the source code by invoking Open MPI or another MPI library. In this example, the program prints Hello World!. + +- Use a compiler that supports MPI to compile the source into the executable program mpi_hello_world. + +- Distribute mpi_hello_world to each node. You can also make mpi_hello_world accessible through a shared file system. + +- Run mpirun to execute mpi_hello_world in parallel. + + +## About WRF + +The Weather Research and Forecasting (WRF) model is a common HPC application. WRF is a mesoscale numerical weather prediction (NWP) system designed for both atmospheric research and forecasting. It allows researchers to produce simulations based on real or hypothetical atmospheric conditions. + +WRF consists of multiple modules with different processing flows. The following illustrates a WRF process. + +{{
}} + +As shown in the figure above, this WRF process has four parts: +- External data sources + +- WRF Pre-Processing System (WPS) + +- WRF, which is the core simulation system + +- Post-processing system + +__External Data Sources__ + +The WRF model data includes static geographical data and gridded data. Geographical data refers to geographical information in a domain, such as mountains, rivers, lakes, and forests. Gridded data refers to the meteorological environment data in a domain, such as temperature, wind speed, wind direction, air humidity, and rainfall. + +__WPS__ + +WPS loads geographical and meteorological data, interpolates meteorological data to grids, and finally provides the data input for WRF. It consists of three main programs: +- geogrid.exe: defines model projections, domain ranges, and nesting relationships, interpolates terrestrial parameters, and processes terrain and gridded data. + +- ungrib.exe: extracts required meteorological parameters from the GRIB data. + +- metgrid.exe: interpolates meteorological parameters to simulation domains. + +The three programs work together to generate the data used for meteorological simulation. Currently, the three programs do not support MPI parallel computing. + + + + +__WRF__ + +As the core module of the WRF model, WRF performs simulation and prediction based on the meteorological information generated by WPS. WRF consists of two main programs: + +- real.exe: initializes the actual meteorological data. + +- wrf.exe: simulates and predicts results. + +real.exe and wrf.exe can run as MPI parallel jobs to improve the computing speed.
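For reference, outside of containers these two stages are typically launched with mpirun, along these lines (an illustrative sketch; the process counts are arbitrary): + +``` +# Run the MPI-enabled WRF stages by hand (illustrative) +mpirun -np 4 ./real.exe +mpirun -np 64 ./wrf.exe +``` + +{{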
}} + +As shown in the preceding figure, wrfinput_d0X and wrfbdy_d0X are the calculation results generated by real.exe. wrf.exe performs meteorological simulation based on these results to generate the final result wrfout_dxx_yyyy-mm-dd_hh:mm:ss, which is verified and displayed by the post-processing system. + +__Post-Processing System__ + +The post-processing system verifies and displays the calculation results generated by WRF. It consists of various third-party images and verification tools. The following figure shows the simulation and prediction results of the relative humidity in each area in CONUS 2.5km case. + +{{
}} + +CONUS 2.5km refers to the 2.5 km resolution case covering the Continental U.S. (CONUS) domain. (In this case, the entire domain is divided into multiple cubes of 2.5 km x 2.5 km x 2.5 km. The meteorological information in each cube is considered consistent.) + +## HPC on Volcano + +{{
}} + +As mentioned above, an HPCC consists of a resource manager, a scheduler, and an MPI parallel computing library. In the container context, Kubernetes functions as the resource manager and Volcano functions as the scheduler. + + +Running HPC applications in the Kubernetes+Volcano environment means running HPC jobs in containers, as shown in the following figure. + +{{
}} + +Two types of containers are involved: master and worker. The master container starts the mpirun and mpiexec commands, and the worker containers run the computing jobs. + + +To support MPI jobs, Volcano has been enhanced to provide the following functions: + +- Multiple pod templates, which are used to define master and worker pods at the same time +- Gang scheduling, which ensures that all pods in a job are simultaneously started +- Mapping of host IP addresses of the master and worker pods +- SSH password-free login between the master and worker pods +- Job lifecycle management + +The following is an example of running an MPI job on Volcano. + +1. Define a Volcano MPI job in the mpi_sample.yaml file. + +``` +apiVersion: batch.volcano.sh/v1alpha1 +kind: Job +metadata: +  name: mpi-job +  labels: +    # Set the job type based on service requirements. +    "volcano.sh/job-type": "MPI" +spec: +  # Set the minimum number of required pods (less than the total number of replicas). +  # For this example, set it to the total number of mpimaster and mpiworker replicas. +  minAvailable: 3 +  # Specify volcano as the scheduler. +  schedulerName: volcano +  plugins: +    # Configure SSH password-free authentication. +    ssh: [] +    # Configure the network information, such as the hosts file and headless Service, required for running the job. +    svc: [] +  # Define a policy in which the entire MPI job will be restarted when a pod is evicted. +  policies: +  - event: PodEvicted +    action: RestartJob +  tasks: +  - replicas: 1 +    name: mpimaster +    # Define another policy in which the entire MPI job will be considered complete when mpiexec execution completes. +    policies: +    - event: TaskCompleted +      action: CompleteJob +    template: +      spec: +        # The Volcano-related information will be stored in the /etc/volcano directory. +        containers: +        # The master container will perform the following operations: +        # 1. Start the sshd service. +        # 2. Obtain the mpiworker container list from /etc/volcano/mpiworker.host. +        # 3. Run mpirun/mpiexec. +        - command: +          - /bin/sh +          - -c +          - | +            MPI_HOST=`cat /etc/volcano/mpiworker.host | tr "\n" ","`; +            mkdir -p /var/run/sshd; /usr/sbin/sshd; +            mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world; +          image: volcanosh/example-mpi:0.0.1 +          imagePullPolicy: IfNotPresent +          name: mpimaster +          ports: +          - containerPort: 22 +            name: mpijob-port +          workingDir: /home +          resources: +            requests: +              cpu: "100m" +              memory: "1024Mi" +            limits: +              cpu: "100m" +              memory: "1024Mi" +        restartPolicy: OnFailure +        imagePullSecrets: +        - name: default-secret +  - replicas: 2 +    name: mpiworker +    template: +      spec: +        containers: +        # The worker containers will only start the sshd service. +        - command: +          - /bin/sh +          - -c +          - | +            mkdir -p /var/run/sshd; /usr/sbin/sshd -D; +          image: volcanosh/example-mpi:0.0.1 +          imagePullPolicy: IfNotPresent +          name: mpiworker +          ports: +          - containerPort: 22 +            name: mpijob-port +          workingDir: /home +          resources: +            requests: +              cpu: "100m" +              memory: "2048Mi" +            limits: +              cpu: "100m" +              memory: "2048Mi" +        restartPolicy: OnFailure +        imagePullSecrets: +        - name: default-secret +``` + + +2. Submit the Volcano MPI job. + + + +{{
}} + + +The job is executed. + +{{
}} + +3. Check the execution result of the master pod. + +{{
}} + + +The preceding execution result shows that Volcano clears only the worker pods and retains the master pod after the job completes. In this way, you can run the kubectl command to obtain the execution result. + + +Note that there may be latency in the container network. When a job starts, the master pod may fail to connect to the worker pods. If this happens, Volcano will automatically restart the master pod to make the job run properly. + + +If you intend to use Volcano to run a WRF job, you need to replace mpi_hello_world with real.exe and wrf.exe and perform the following operations: + +- Build Docker images, which must include a complete WRF running environment. + +- Mount the data (original or intermediate data) required for calculation to the corresponding container. + +In this way, you can run meteorological simulation jobs in the Kubernetes+Volcano environment. \ No newline at end of file diff --git a/website-docusaurus/blog/2020-12-24-leinao-en.md b/website-docusaurus/blog/2020-12-24-leinao-en.md new file mode 100644 index 00000000..b3f9b55f --- /dev/null +++ b/website-docusaurus/blog/2020-12-24-leinao-en.md @@ -0,0 +1,137 @@ +--- +title: "Integrating Volcano into the Leinao Cloud OS" +description: "Deep introduction about the challenges and solutions faced by Volcano integration with Leinao Cloud OS" +date: 2020-12-24 +authors: [volcano] +--- +>This article was firstly released at `Container Cube` on December 24th, 2020, refer to [Volcano在中科类脑云OS中的落地实践](https://mp.weixin.qq.com/s/HS6RzzqztBJsHQX7P5T5ww) + + +## Introduction to the Leinao cloud AI platform + +The Leinao cloud AI platform includes an AI development platform, public service platform, AI visualized operations platform, and an AI community. +- The AI development platform provides end-to-end technical support and solutions for AI researchers in a diverse range of business scenarios. + +- The public service platform provides news and insight into AI for institutions and for the general public. + +- The AI visualized operations platform helps managers make better informed decisions. + +- The AI community is where AI developers and companies gather, exchange ideas, and improve their skills. + + + +## The architecture of Leinao cloud OS + + +{{
}} + + + +In the Leinao OS, the hardware platforms sit at the bottom of the architecture. On top of them are the job scheduling and data engines. + +The computing layer, the next layer up, is composed of a set of APIs used to create general distributed training jobs. + +Finally, on the top, is the application layer, which consists of various service systems, such as the model training, resource management, and O&M monitoring systems. + + +## Why Volcano? + +The Kubernetes default-scheduler does not work well for batch scheduling, and batch scheduling is critical to AI and big data services. Therefore, when we were building Leinao cloud OS 2.0, we decided to replace the default-scheduler with a scheduling engine that can better serve AI and big data scenarios. Volcano has diverse advanced scheduling policies and can easily connect to mainstream computing frameworks in the AI and big data sectors. + + + +More importantly, Volcano allows you to configure retry policies for distributed training jobs based on events rather than on the number of retries, which is more flexible. In addition, it is more lightweight than Hadoop, a distributed processing framework that also supports batch scheduling. After thorough analyses and comparisons, we finally chose Volcano for Leinao cloud OS 2.0. + +Volcano provides enhanced job APIs. + +{{
}} + +Volcano improves many aspects of default-scheduler. + +{{
}} + +Default-scheduler and Volcano work differently in a couple of other ways as well. + +{{
}} + + + +We encountered some obstacles when we connected the OS to Volcano. For example, OS development was already complete when we tried to connect it to Volcano, and connecting them directly would have required a huge change to the OS's computing and application layers. Moreover, Volcano did not yet support debugging jobs and tool sets. That was when we decided to introduce job-server for API adaptation and integrated development of debugging tool sets. + + +{{
}} + +Another problem we faced was how to deal with task monitoring. Upper-layer services need detailed information on current and recent task status as well as historical records, but Volcano only supports job monitoring. Should we customize Volcano or further develop the OS to support task monitoring? If we customized Volcano, upgrades would get complicated later on. The Volcano community iterates quickly, and we did not want to miss the latest features provided with every iteration. Therefore, we went with the latter choice, that is, to further develop the OS.
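A minimal sketch (ours, not Leinao's code) of this watch-based approach: a shared informer pushes pod state changes to the task-monitoring store, and vcjobs can be watched the same way through the Volcano clientset. The kubeconfig path and resync period are assumptions. + +``` +package main + +import ( +	"fmt" +	"time" + +	v1 "k8s.io/api/core/v1" +	"k8s.io/client-go/informers" +	"k8s.io/client-go/kubernetes" +	"k8s.io/client-go/tools/cache" +	"k8s.io/client-go/tools/clientcmd" +) + +func main() { +	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // assumed path +	if err != nil { +		panic(err) +	} +	cs := kubernetes.NewForConfigOrDie(cfg) +	factory := informers.NewSharedInformerFactory(cs, 30*time.Second) +	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{ +		UpdateFunc: func(oldObj, newObj interface{}) { +			pod := newObj.(*v1.Pod) +			fmt.Println(pod.Name, pod.Status.Phase) // push the status change to the task store +		}, +	}) +	stop := make(chan struct{}) +	factory.Start(stop) +	factory.WaitForCacheSync(stop) +	<-stop +} +``` + +The following figure shows the monitoring mechanism. It uses an API server to watch jobs and pods. + +{{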
}} + + + +## Creating a job + +__Scenario requirements:__ + +- Jobs can be created in batches. +- Debugging tool sets, such as JupyterLab, TensorBoard, code-server, and Wetty, are supported. +- Data storage set optimization policies can be configured for batch jobs. +- Training, quantization, and model conversion are supported. + +The workflow is as follows: + + +{{
}} + +{{
}} + + + +## Deleting a job + +__Scenario requirements:__ + +- Jobs are automatically deleted when they finish. +- Additional capabilities (Jupyter and code-server) of the jobs are deleted as well. + +**When the job finishes, Volcano automatically deletes it.** + +{{
}} + +**Related resources (pods, services, and ingresses) are deleted.** + +{{
}} + +{{
}} + + + + +## Retrying a job + +In OS 1.0, retry policies were set simply based on the number of retries. Volcano allows you to set retry policies based on events, which is much more flexible and better suits our scenarios. We gave up our original solution and adopted the retry mechanism of Volcano.
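A minimal sketch of what an event-based retry policy looks like on a vcjob (our illustration; the values are examples, and the fields are explained below): + +``` +apiVersion: batch.volcano.sh/v1alpha1 +kind: Job +metadata: +  name: retry-demo +spec: +  schedulerName: volcano +  minAvailable: 1 +  maxRetry: 3                 # upper bound on retries for the whole job +  policies: +  - event: PodEvicted         # triggering condition +    action: RestartJob        # action taken when the event fires +  tasks: +  - replicas: 1 +    name: trainer +    policies:                 # task-level policies take precedence over job-level ones +    - event: PodFailed +      action: RestartTask +    template: +      spec: +        restartPolicy: Never +        containers: +        - name: trainer +          image: python:3.9                # placeholder image +          command: ["python", "train.py"]  # hypothetical entry point +``` + +{{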
}} + +{{
}} + +- Policies defined in taskRole have a higher priority than the retry policies defined in jobs. A retry policy consists of an event and action. + +- event indicates a triggering condition. + +- action indicates the action to take when the specified triggering condition is met. + +- maxRetry indicates the maximum number of retries allowed for a job. + + +When we were developing the OS to support task monitoring, we received great support from the Volcano community. For example, we once found that RestartTask became invalid. The problem was solved the same day it was reported to the community. Their response was very fast. + +{{
}} + + +## Next Up + +We look forward to working with the community on topology scheduling and on how to select the best GPU topology for scheduling when there are multiple GPUs on a single server. We seek deeper cooperation with the community in developing more advanced features and building a more inclusive ecosystem. \ No newline at end of file diff --git a/website-docusaurus/blog/2021-01-05-ruitian-en.md b/website-docusaurus/blog/2021-01-05-ruitian-en.md new file mode 100644 index 00000000..1f8adab3 --- /dev/null +++ b/website-docusaurus/blog/2021-01-05-ruitian-en.md @@ -0,0 +1,298 @@ +--- +title: "How Ruitian Used Volcano to Run Large-Scale Offline HPC Jobs" +description: "Deep introduction about the application practice cases of Volcano in the financial field" +date: 2021-01-05 +authors: [volcano] +--- +>This article was firstly released at `Container Cube` on January 5th, 2021, refer to [基于Volcano的锐天离线高性能计算最佳实践](https://mp.weixin.qq.com/s/FDYExtj93lCrXmiFRozBPA) + +## Why Volcano + +Ruitian Capital is a private equity investment firm committed to helping customers achieve returns by using a diverse range of trading strategies. Ruitian offline computing clusters are dedicated to strategy development and big data processing. The clusters analyze large volumes of data to help develop quantitative models for stock and futures trading. + +In the early stages, we used Yarn to schedule batch jobs and Ceph to store massive data. As the company grew, our strategic planners had to work in different environments, which prompted us to look into container technologies for multi-environment research. + +With the mature and stable development of Kubernetes in recent years, container technologies, especially Kubernetes, have been widely used in computing clusters. However, the Kubernetes default-scheduler does not support: +- Running multiple pods in one job +- Fair-share scheduling of jobs assigned to different queues +- Gang scheduling +- Specifying a number of pods that must be scheduled for a job to be considered successfully scheduled +- The Dominant Resource Fairness (DRF) algorithm + +We discovered the Volcano project as part of our research into these issues. Volcano is based on Kubernetes, and its robust job scheduling and control policies meet all of our requirements. Its simple architecture was a major reason we decided to migrate our scheduling platform from Yarn to Kubernetes. + + + +## How to Migrate to Volcano + +__Customization of Volcano Job Templates__ +Our strategic planners use clusters but do not know much about Kubernetes. We encapsulated the technical details of Kubernetes and developed Jobctl to generate Volcano job templates.
__Preliminary solution: Defining a job with multiple tasks__ + +``` +apiVersion: batch.volcano.sh/v1alpha1 +kind: Job +metadata: +  name: awesome-job +spec: +  minAvailable: 1 +  tasks: +  - name: simulation1 +    replicas: 1 +    template: +      spec: +        restartPolicy: Never +        containers: +        - name: worker +          image: rt-python:latest +          resources: +            requests: +              cpu: 1 +              memory: 1Gi +            limits: +              cpu: 1 +              memory: 1Gi +          args: +          - bash +          - -c +          - |- +            python run.py --pickle-file /data/simulation/1.pickle +  - name: simulation2 +    replicas: 1 +    template: +      spec: +        restartPolicy: Never +        containers: +        - name: worker +          image: rt-python:latest +          resources: +            requests: +              cpu: 1 +              memory: 1Gi +            limits: +              cpu: 1 +              memory: 1Gi +          args: +          - bash +          - -c +          - |- +            python run.py --pickle-file /data/simulation/2.pickle +  - name: simulation3 +    replicas: 1 +    template: +      spec: +        restartPolicy: Never +        containers: +        - name: worker +          image: rt-python:latest +          resources: +            requests: +              cpu: 1 +              memory: 1Gi +            limits: +              cpu: 1 +              memory: 1Gi +          args: +          - bash +          - -c +          - |- +            python run.py --pickle-file /data/simulation/3.pickle +``` + +In this solution, you can set different parameters and images for the pods so that they run different tasks. + +In most cases, tasks are executed at different times and are kept separate from each other. If all pods participate in scheduling and minAvailable is set to 1, the job state changes to Running when any one pod is successfully scheduled. However, during the trial run, we found that some tasks could not be submitted. This is because some strategic planners submitted more than 5,000 concurrent pods, and the size of the generated YAML file can exceed 1.5 MiB, which is more than the default request size limit allowed by etcd. + +Considering that the load on etcd is high when a large number of jobs and pods are running, instead of simply increasing the default request size limit, we made some optimizations (see the final solution). + +__Final solution: Defining a job with one task and multiple replicas__ + +In most cases, the tasks in a job differ only in their parameters, so you can use the multi-replica function to load the corresponding parameter file for each task based on the replica ID. In this way, the size of each request sent to etcd is reduced. + +``` +apiVersion: batch.volcano.sh/v1alpha1 +kind: Job +metadata: +  name: awesome-job +spec: +  minAvailable: 1 +  tasks: +  - name: simulation +    replicas: 10 +    template: +      spec: +        restartPolicy: Never +        containers: +        - name: worker +          image: rt-python:latest +          resources: +            requests: +              cpu: 1 +              memory: 1Gi +            limits: +              cpu: 1 +              memory: 1Gi +          args: +          - bash +          - -c +          - |- +            python -u call_module_func.py --pickle-file /data/simulation/.pickle module.submodule magic_function +``` + +call_module_func.py is a boot script, which is mounted to the container using a ConfigMap. It is responsible for: + +- Deriving the replica ID from the host name in the container. For example, if the host name is awesome-job-awesome-job-1, the replica ID is 1. + +- Loading the pickle parameters (mounted to the container using a PVC) and passing them to the magic_function function in module.submodule. + +### Other Volcano Customizations + +__minSuccess__ + +Most jobs do not require gang scheduling, but we don't want a job to be considered successful unless all of the tasks it includes complete successfully.
The Volcano parameter minAvailable did not meet our requirements, so we added a new parameter, minSuccess, to decouple the logic for determining job success from the minAvailable parameter. + +``` +minSuccess := ps.job.Job.Spec.MinSuccess +if minSuccess == 0 { +	minSuccess = jobReplicas +} + +// The job succeeds once minSuccess replicas have succeeded. +if status.Succeeded >= minSuccess { +	status.State.Phase = vcbatch.Completed +	return true +} + +// All replicas have finished: fail if any replica failed. +if status.Succeeded+status.Failed == jobReplicas { +	if status.Failed != 0 { +		status.State.Phase = vcbatch.Failed +	} else { +		status.State.Phase = vcbatch.Completed +	} +	return true +} +``` + +__autoMemoryScale__ + +The requested CPU and memory must be specified when a Volcano job is submitted. Most of our planners cannot estimate the memory required by their applications, which is why we developed the autoMemoryScale function to monitor OOM events. If an application exits due to an OOM event, the memory request is automatically scaled up and the application is rescheduled, thereby reducing the costs associated with trial and error. + +``` +for i := 0; i < int(ts.Replicas); i++ { +	podName := fmt.Sprintf(jobhelpers.PodNameFmt, job.Name, name, i) +	if pod, found := pods[podName]; found { +		if len(pod.Status.ContainerStatuses) == 0 { +			continue +		} + +		reason := pod.Status.ContainerStatuses[0].State.Terminated.Reason + +		// Scale up the requested memory and reschedule pods killed by the OOM killer. +		if reason == "OOMKilled" { +			podToScaleUp = append(podToScaleUp, pod) +			jobResources := ts.Template.Spec.Containers[0].Resources +			podResources := pod.Spec.Containers[0].Resources + +			jobReqMem, _ := jobResources.Requests[v1.ResourceMemory] +			podReqMem, _ := podResources.Requests[v1.ResourceMemory] + +			if podReqMem.Value() >= jobReqMem.Value() { +				scaleUpResource(jobResources.Requests, job.Spec.ScaleUpJobResourceRate) +				scaleUpResource(jobResources.Limits, job.Spec.ScaleUpJobResourceRate) +				ts.Template.Spec.Containers[0].Resources = jobResources +				job.Spec.Task[taskId] = ts +			} +		} +	} +} +``` + +__nodeZone__ + +In the original Yarn clusters, we could forcibly reserve some nodes for urgent jobs by partitioning, and we wanted to retain this feature after migration. Our preliminary solution was to create an independent daily queue with a relatively low weight, and to use nodeSelector to constrain the tasks in the daily queue to run only on specific nodes. In practice, we found that the resources allocated to the daily queue are not enough to run those tasks when the cluster load is high. The reason is that resources are allocated based on weight to ensure fair-share scheduling. + +For example, we define three queues: + +``` +name   weight +Q1     45 +Q2     45 +daily  10 +``` + +The cluster has a total of 100 CPU cores and 100 GiB of memory, among which 20 cores and 20 GiB of memory are reserved for the daily queue. When the three queues are heavily loaded, the daily queue can be allocated only 10 cores and 10 GiB of memory due to its low weight. Although nodeSelector is used, the resource requirements cannot be met. + +Our solution is to make the schedulers support nodeZone. Nodes in a cluster are divided into different zones, and each scheduler is responsible for scheduling pods on nodes in its matching zone. When all queues on one scheduler are heavily loaded, resource allocation on another scheduler is not affected. If there is only one queue on a scheduler, the queue can apply for all of its resources. +If you need to run tasks with different features, select different schedulers instead of a special queue (for example, the daily queue) to avoid resource shortages.
+ + +On the basis of Volcano, we developed a feature where you can select nodes with specific labels and use multiple scheduler instances to schedule pods on nodes in different zones. + +``` +sc.nodeInformer.Informer().AddEventHandlerWithResyncPeriod( + cache.FilteringResourceEventHandler { + FilterFunc: func(obj interface{}) bool { + switch v := obj.(type) { + case *v1.Node: + nodeZone := v.Labels["node-zone"] + return nodeZone == sc.nodeZone + default: + return false + } + }, + Handler: cache.ResourceEventHandlerFuncs { + AddFunc: sc.AddNode, + UpdateFunc: sc.UpdateNode, + DeleteFunc: sc.DeleteNode, + }, + }, + 0, +) +``` + + +__Metric Monitoring__ + +Currently, Volcano mainly monitors scheduling performance metrics, but these metrics cannot fully meet our requirements. To address this issue, we defined additional metrics and developed the export server component. This component can: + +- Add a queue label for each task. (Pods generated by Volcano do not carry a queue label, which makes it difficult to search queue resources.) + +- Output the queue capability. + +- Output the job start time and end time. + +With the additional metrics, Grafana can display monitoring information for the cluster, queue, and node resources as well as the job progress. In this way, you can track cluster resource usage in real time, which facilitates troubleshooting. For example, when a job state is Pending, you can view the monitoring information to check whether the queue or cluster resources are used up or if the remaining resources on each node are sufficient for a single task. + +{{
}} + +{{
}} + + + +__WatchDog__ + +We developed the WatchDog component to perform automatic O&M on Volcano resources. WatchDog provides: + +- Automatic update of capability + +capability indicates the upper limit of resources a queue can use. It is difficult to maintain capability: each time a node is added or deleted, you have to adjust it. Now, WatchDog listens to node resource information and dynamically updates capability based on the queue weight. + +- Task status notification + +When a task is completed or fails, you will be notified of the task status in a timely manner. + +- Task resource usage notification + +WatchDog obtains the amount of requested and used resources of tasks from the monitoring system and sends you a notification. In this way, you can adjust the requested resource amount to improve the cluster resource usage. + +## Summary + +Volcano is critical to migrating our applications to Kubernetes. Its simple and clear design allows us to easily customize scheduling policies. + +So far, Volcano has been stable in the Ruitian production environment for more than half a year, with more than 100,000 jobs scheduled per day during peak hours. We always follow the updates of the Volcano community and actively participate in the community projects. We hope that more developers can join the Volcano community to let Volcano better handle various complex job scenarios flexibly, efficiently, and intelligently. + + + +About Ruitian +>Founded in Shanghai in 2013, Ruitian Capital is a private equity investment company that places great importance on scientific research and technological accumulation. The company has developed an industry-leading strategic R&D and retesting platform and has built on-premises clusters on hundreds of high-performance servers. The founder started his own business after some exceptional achievements with some of the world's top hedge funds. By the end of the first quarter of 2019, Ruitian had been listed in the first tier of the quantitative transaction field in China, with more than 90 managed funds and over 10 billion yuan under management. \ No newline at end of file diff --git a/website-docusaurus/blog/2021-05-27-xiaohongshu-en.md b/website-docusaurus/blog/2021-05-27-xiaohongshu-en.md new file mode 100644 index 00000000..3d4398b6 --- /dev/null +++ b/website-docusaurus/blog/2021-05-27-xiaohongshu-en.md @@ -0,0 +1,155 @@ +--- +title: "How Does Volcano Empower a Content Recommendation Engine in Xiaohongshu" +description: "Best practice of Volcano at Xiaohongshu" +date: 2021-05-27 +authors: [volcano] +--- +>This article was firstly released at `Container Cube` on May 27th, 2021, refer to [小红书基于Volcano的大规模离线与在线推荐模型训练实践](https://mp.weixin.qq.com/s/nZXZx78EQoHzRj1-LqMPQQ) + + +## Introduction to Xiaohongshu + +Xiaohongshu is a leading life-sharing community in China. Popular among female users and attracting more and more trendy young men, Xiaohongshu now has more than 100 million monthly active users. This UGC community has hundreds of thousands of notes submitted every day and nearly 10 billion views/hits per day. + + +The recommendation on the homepage is the responsibility of our recommendation team and is one of the core service scenarios of Xiaohongshu. In Xiaohongshu's first years, all of the recommended notes were manually selected without the assistance of any machine learning models. As a result, we recommended the same content to almost every user. + + +In 2016, we started to explore personalized recommendation for different users.
In 2018, the first recommendation machine learning model based on SparkML and GBDT was introduced. It had only tens of thousands of parameters. Since the end of 2018, we accelerated the model iteration. By the second half of 2020, our model scale reached hundreds of billions of parameters. We also introduced online learning, and the model could be updated in hours. From April this year, the model is updated every few minutes, which means the model can capture users' behavior within one or two minutes to get users' short-term interests and generate recommendations that are more appealing for users. + + + +## Big Data Architecture in Xiaohongshu Search, Recommendation, and Ad Scenarios + + +{{
}} + + + +The architecture consists of four parts. The upper left corner shows the interaction between the client and the real-time service/tracking data service. After being started, the Xiaohongshu app requests an online service for recommendation. The online service caches the recommended notes and requested features, and returns the recommendation results to the client. When the user browses the notes recommended to him/her, a series of interaction behaviors are generated. The interaction behaviors become data flows that pass through the tracking data service and go to the original tracking data flow. + + +In the lower left corner, there are attribution and summary tasks used to clean and process user behavior data in real time to generate the label data flow. The label data flow and feature data flow are combined to generate the training samples, and the three major products of Xiaohongshu big data: training data, OLAP database, and offline Hive table. + + +The upper right corner shows online and offline training. Online training trains the data in real time to generate the updated data of the model. Offline training generates a full model and uploads it to the online service. + + +__Behavior Attributions and Labels__ + +{{
}} + +This task for user behaviors can be divided into two parts: attribution and label. Attribution is to associate each behavior captured in data tracking with the past behaviors of the user. For example, you browse 4 notes in the app, one found on the Discovery page, one on the search results page, and two on a blogger's homepage. You click Like for the final piece. Your browsing and clicking Like are tracked. The tracking data does not tell us what happened before the like is given. + + + We can determine the cause of the like behavior based on the user behavior flow and user's historical browsing records prior to the like. This process is called attribution. Through the attribution task, we can also add labels about why a user follows a blogger, and which of the blogger's notes the user has viewed before following the blogger. + + +Opposite to attribution, label calculation summarizes the actions performed by a user after a certain behavior. If the user browsed four notes on the Discovery page, for each note Xiaohongshu makes several labels about whether the user liked the notes after browsing, or whether the user tapped to enter the note details page and how long the user stayed on this page. This label data is important for subsequent model training and the generation of daily user reports. + + +__Real-Time Big Data Products for Search, Recommendation, and Ad__ + +{{
}} + + +After the label data is generated, the above three big data products are provided for the service. + + +The model training data is used to train models in real time, and provide more accurate and real-time information about users' latest interests. + + +Both the ad hoc data analysis and offline warehouse perform analysis based on the label data. The ad hoc data analysis is real-time. For example, if there is any change in the system and policy, effects should be observed immediately from a multi-dimensional segmentation perspective. In contrast, the offline data warehouse provides daily or weekly reports, or shows the changes have been made to certain metrics in the past six months. + + +__Online and Offline Model Training__ + +{{
}} + +Training data generated by combining the label data and feature data is used for both offline and online training. + + +Though the same data source is used, there are differences between online and offline training. For online training, the data is fed to Kafka for online consumption. After that, a model update data flow is output, which means the latest batch of model changes is released online in real time. Offline training is performed in batches on a daily basis: a full model with all parameters is released and pushed back to the online service. + +{{
}} + +The preceding figure shows the earliest offline batch label calculation process. Click behaviors of users are collected and recorded in the ODS table, that is, the original log table. The attribution and label calculation are performed by the Spark task, which is an offline batch task. The processed data forms a data warehouse table to generate daily reports and experiment reports as big data products. In the batch environment, the report is generated on a T+1 cycle. Generally, we cannot obtain the complete reports until the second or third day after each experiment. + +__Offline Batch + Online Streaming__ + +{{
}} + +The growing number of developers posed higher requirements on service implementation. Therefore, we introduced the real-time link, which is completely based on the Flink framework. The real-time link inputs data through streaming Kafka, outputs the data to Kafka, and sends the data to the OLAP database and real-time experiment analysis. However, the challenge is that Spark and Flink are two different programming frameworks. For example, the logic for determining whether a click on an ad is valid is complex, because an interaction behavior or a stay of at least three seconds after the click is required before the click can be called a valid click. + + +If there are two data flows, the logic is implemented both in the offline service and in the Flink framework. Many problems may occur when the same logic is implemented twice. One problem is that development needs to be repeated in both the Spark and Flink frameworks. A bigger problem is that the logic may change after it has been implemented in both frameworks. + + +For some complex scenarios, such a change may cause inconsistency between the offline reports and the real-time results. In other scenarios, the logic exists in only one of the two links; when the other link needs it, for example, for an offline data warehouse query, we have to implement the logic again, which causes extra work. + + +After the upgrade we made, all new labels are calculated in real time, not in Spark. However, interruptions may occur in the real-time mode. After an interruption, calculation may resume from the latest data, and the earlier data may have changed. This problem is simple to solve in the offline mode, as we can re-run the data of each hour to form the complete dataset. + + +We actually needed to solve one technical problem: how to let the real-time Flink task read from an offline data warehouse table as if it were a real-time flow table, apply the same computing logic to regenerate the data, and then backfill the results into the real-time flow table. In this way, the core logic only needs to be implemented once, in the real-time path, which solves the problem of inconsistent logic between the two implementations. + +__Offline Training__ + +{{
}} + +The preceding figure shows the training process of the machine learning model. At the earliest, there was only one batch data calculation task. Feature data and user behavior data are stored in offline tables. The Spark task is used to combine the data to generate a training task, and then release the learning model. The entire process may be performed once a day. + +__Online + Offline Training__ + +{{
}} + +To capture users' real-time interests and judge newly released notes more quickly, our models must be updated in closer to real time. Therefore, Flink is used for real-time model training. After Flink generates the data, the Volcano scheduling system is used to update the models both in real time and in batches. + + +## Optimization and Multi-cloud Management of Offline Training + +{{
}} + +The preceding figure shows the technology stack of Xiaohongshu in machine learning and big data. Xiaohongshu is a relatively new company without on-premises equipment rooms. All of our services are deployed on clouds provided by cloud vendors, and most of the services are managed through Kubernetes. + + +We have two important platforms. One is the stream computing platform called Baichuan, which is used to manage the Flink tasks of real-time label computing and online learning mentioned above. The other is the task management platform for machine learning, which is called Firefly. Our model training is based on TensorFlow and runs on the machine learning platform. For sparse and discrete large-scale model training of recommendation, search, and advertisement, we also developed a TensorFlow-based training framework, LarC. The framework models of TensorFlow and LarC run on the machine learning platform through Firefly. + + +The key question for these two platforms is how to schedule tasks to the Kubernetes clusters. In fact, native Kubernetes has a big problem in this scenario, because it performs scheduling based on individual pods. + + +However, stream computing and machine learning tasks are not single-pod tasks. They are tasks performed on a group of pods. Therefore, we encountered a big challenge at the beginning. Assume that there are two jobs, each job contains 10 pods, and each pod requires one core of CPU. That is, 20 cores are required for the two jobs to run simultaneously. If the current cluster has only 15 available cores and we are using the native Kubernetes scheduler, the scheduler may schedule 7 cores to one job and 8 cores to another, so that both jobs can obtain some resources to run. However, neither of them can be completed properly because the numbers of cores allocated to them do not meet the requirements. As a result, deadlocks occur. This is caused by the limits of native Kubernetes scheduling. + + +To solve this, we need to first schedule 10 cores to one job to ensure that it can be properly completed and exited. After that, the 10 cores are released and allocated to the other job so that both jobs can be properly completed. + + +Based on these requirements, we researched products and found Volcano. Its predecessor is kube-batch, which could completely meet our requirements. Therefore, we participated in the kube-batch community and became a loyal user of Volcano.
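The gang constraint described above can be expressed directly as a Volcano PodGroup. A minimal sketch (ours, for illustration): + +``` +apiVersion: scheduling.volcano.sh/v1beta1 +kind: PodGroup +metadata: +  name: training-job-a +spec: +  minMember: 10      # schedule all 10 pods or none of them +  minResources: +    cpu: "10"        # the job needs 10 cores in total before it starts +``` + +__Enhanced Volcano Scheduling: binpack -> task-topology__ + +{{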
}} + +The scheduling algorithm supported by the native Volcano is binpack. Machine learning training tasks fall into two roles: worker, which performs the forward and backward computation (a compute-intensive task), and ps, which mainly stores parameters (a memory-intensive service). The default binpack algorithm of the native open-source Volcano optimizes resource usage to reduce fragmentation, so it schedules as many tasks as possible onto the same node, placing all the worker and ps tasks of a job together. When that node does not have capacity for one of the ps tasks (taking ps1 as an example here), that task can only be put on another node. + + +In this scenario, the workers and ps0 are on the same node. The I/O between them does not cross nodes, leading to fast I/O and large storage capacity, but the overall speed is limited because ps1 is on another node. + + +With the task-topology algorithm, tasks are scheduled to nodes in a balanced manner, the speed is balanced, and the overall storage capacity is greatly improved. The optimization from binpack to task-topology can increase the throughput of task training by 10% to 20%.
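As a sketch of how this is wired up (our reading of the upstream task-topology plugin; treat the exact keys and values as examples, not as a definitive configuration): the plugin is enabled in the scheduler configuration, and per-job placement hints are attached as annotations. + +``` +# Scheduler configuration (volcano-scheduler.conf) +actions: "enqueue, allocate, backfill" +tiers: +- plugins: +  - name: gang +  - name: task-topology   # topology-aware placement instead of plain binpack +--- +# On the vcjob +apiVersion: batch.volcano.sh/v1alpha1 +kind: Job +metadata: +  name: tf-training +  annotations: +    volcano.sh/task-topology-affinity: "ps,worker"   # co-locate workers with their ps +    volcano.sh/task-topology-anti-affinity: "ps"     # spread ps replicas across nodes +``` + +__Data Transfer Between Multiple Clouds__ + +{{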
}} + +In the online mode, users are distributed to different AZs. The feature cache of the recommendation service is stored in the local AZ. After user data tracking, users are distributed to different clusters based on their requests, and the label system performs computing for each user. Finally, all label system computing is transferred to the cloud vendors that provide offline training and services for us, where it is combined to generate training data for distributed model training. The trained models are distributed to different AZs for online services. + + +How do we implement transfer learning under this architecture? Xiaohongshu users consume traffic on the homepage, a scenario where a large amount of data is generated and accumulated and where a model involving hundreds of billions of parameters is trained. How do we use this large model built on recommendations for the search and advertisement scenarios? After the recommendation model is trained, it is synchronized to the search training cluster. The search training continues training the recommendation model on the search data and releases the final search model online. In this way, training on the smaller search dataset inherits what the large recommendation model has learned, so that the large recommendation model can be utilized by the search scenario. \ No newline at end of file diff --git a/website-docusaurus/blog/2021-06-01-pengcheng-en.md b/website-docusaurus/blog/2021-06-01-pengcheng-en.md new file mode 100644 index 00000000..963db698 --- /dev/null +++ b/website-docusaurus/blog/2021-06-01-pengcheng-en.md @@ -0,0 +1,212 @@ +--- +title: "OpenI-Octopus: How to Avoid Resource Preemption in Kubernetes Clusters" +description: "Volcano use case in scientific research" +date: 2021-06-01 +authors: [volcano] +--- +>This article was firstly released at `Container Cube` on September 30th, 2020, refer to [鹏城实验室启智章鱼教你彻底摆脱Kubernetes集群资源抢占难题](https://mp.weixin.qq.com/s/h4T7KbAiQZTKepYcTcgdlA) + +## Introduction to OpenI-Octopus + +OpenI-Octopus is a cluster management and resource scheduling system developed and maintained by Peng Cheng Laboratory, Peking University, and University of Science and Technology of China. + +- This system is completely open-source, complying with the Open-Intelligence license. + +- It deploys, manages, and schedules jobs using Kubernetes. + +- AI jobs can be run in clusters. Hardware such as GPUs, NPUs, FPGAs, Huawei Ascend chips, and Cambricon MLUs are supported. + +- It provides high-performance networks for AI and supports InfiniBand networks. + +- Monitoring and analysis tools for networks, platforms, and AI jobs are available. + +- Mainstream deep learning frameworks are supported. + +- A microservice-based architecture is used. + +{{
}} + + +The service architecture of OpenI-Octopus is illustrated above. The bottom layer is hardware. OpenI-Octopus supports various types of heterogeneous hardware, including CPUs, GPUs, NPUs, and FPGAs. Different hardware types are adapted so that the upper-layer Kubernetes services can identify and manage them. + + + +The second layer is the platform layer. The blue panels on the left cover the node management functions. OpenI-Octopus employs native Kubernetes functions, including orchestration planning and controllers, and enhances scheduling by integrating Volcano. + +Management components communicate with integrated development services through the API server. + +The rest-server module developed by OpenI-Octopus carries the core functions of the system and integrates monitoring tools such as Grafana and Prometheus, Elasticsearch, Jupyterlab proxy, and model repository. + +The panel on the right covers the capabilities of compute nodes, including image factory, O&M, job monitoring, kubebox client, and Jupyterlab client for users to log in to containers. + +The top layer is the services provided by the system, such as data engine, model repository, and project center. With remote interconnection, remote users can also enjoy cluster services. + + + +## Business Scenarios and Challenges + +{{
}} + +OpenI-Octopus is built for research teams and laboratories. They develop and train models in fields such as transportation, healthcare, and finance, and perform model inference. These models are used for vehicle tracking, medical image recognition, auxiliary diagnosis, financial quantization, and many other applications. The deep learning algorithms used in this work require strong compute resources. + + +OpenI-Octopus aims to break the model, data, and compute resource silos on the traditional platforms, and provide computing power through a single platform. + +- At the model layer, OpenI-Octopus provides a multi-architecture, heterogeneous model engine, which supports common open-source computing frameworks and provides model conversion for them. + +- At the data layer, OpenI-Octopus provides a multi-source, heterogeneous data engine, which supports heterogeneous data convergence and semi-automatic data labeling. + +- At the resource layer, OpenI-Octopus provides a distributed AI computing engine for job scheduling and the unified representation of heterogeneous hardware. + + +__Service Requirements:__ + +- Excellent performance in scientific research and applications of AI, including algorithm training and inference in fields such as smart transportation, healthcare, and finance + +- High-end heterogeneous hardware resources, clusters with 150P+ of computing power, and 10 PB-level high-speed storage + + +- Rapid and flexible deployment. The system runs reliably and stably for external teams to use. + +__Challenges:__ + +- No high-performance computing platform to meet service requirements in complex scenarios + +- Heterogeneous hardware resources need to be efficiently used and flexible scheduling policies must be supported. Resource preemption problems need to be resolved to avoid starvation of key task resources. + +- The system architecture must be scalable and services must be highly available. + + +## Why Volcano? + +At the beginning, OpenI-Octopus looked into several existing open source projects in the community. These projects could basically satisfy the service requirements and reduce the development workload. The OpenI-Octopus team narrowed down their choices to four resource schedulers. The first one was the default Kubernetes scheduler, which is not friendly to batch scheduling. The second choice was the Yarn scheduler, which is based on Hadoop; however, the architecture had already been transformed to be Kubernetes-based, so Yarn did not fit. The last two were kube-batch and Volcano. Volcano was developed from kube-batch and better supports deep learning and other common computing frameworks. Volcano implements scheduling policies through plugins that can be easily customized to develop scenario-specific scheduling policies. That's why OpenI-Octopus chose to integrate Volcano. + +Volcano brings the following benefits: + +- Complete architecture and ecosystem; timely feedback from the fast growing community + +- Customizable plugins for scenario-specific scheduling policies. Take the binpack plugin as an example. Its packing algorithm can reduce resource fragments, allowing your cluster resources to be fully used. + +- Job queue mechanism. Job queues allow clusters to be logically grouped. Users can configure compute resource quotas for different projects, and allocate different types of jobs to different queues for management. In this way, job and compute resource management can be finer-grained.
+ + +## Secondary Development Based on Volcano - Resource Status Statistics and Management + +OpenI-Octopus performed secondary development on Volcano and added some new capabilities. + + +The first capability is to collect statistics on resources and manage resource status. These resources include both cluster compute resources and resources such as jobs, tasks, and pods generated by Kubernetes after a user submits a job. + +{{
}}
+
+OpenI-Octopus implemented this capability. It also allows users to customize the conditions and callback events of resource status transitions, and to subscribe to the customized events and corresponding policies at the service logic layer.
+
+
+Assume that there is a training job that uses an ensemble learning algorithm and is trained in a distributed manner. It has a combination module and several individual learners, all of which can be regarded as tasks. Each individual learner is trained using one type of algorithm, and the combination module combines the results of each individual learner to output the final result. Once the final result is obtained, the entire training job is complete. In this Kubernetes-based job-task implementation, users can create one or more pods for a task. If you want the entire job to exit as soon as the combination module runs to completion, instead of waiting until all tasks are successfully executed, you can customize a job exit policy in the scheduler and use the policy at the service layer. Different scenarios may require different policies, which is why secondary development is needed.
+
+{{
}}
+
+This flowchart shows how job state information is transferred among OpenI-Octopus, Kubernetes, and Volcano.
+
+First, both Volcano and OpenI-Octopus listen on all Kubernetes jobs. After the user submits a job to Kubernetes, Volcano updates the job state based on the monitored state of the pods started by the job. OpenI-Octopus then handles the job state changes. The key is how Volcano updates the job states to Kubernetes.
+
+OpenI-Octopus worked out its solution:
+
+1) Develop state machines for Jobs, Tasks, and Replicas.
+
+- More detailed resource state statistics and command output
+
+- Finer-grained job lifecycle management
+
+2) Customize events and policies. Returning to the ensemble learning example, the entire job can run to completion upon a customized event (e.g., MainTaskEvent) released by the scheduler when the specific task is successfully executed.
+
+3) Implement lifecycle callback hooks, which can be added to any state transition event in any state machine. For example, the billing function collects statistics on the running duration of a job based on the start and end events of the job.
+
+
+__Volcano-based Secondary Development - Privilege Action__
+
+Issues:
+
+- Resources are starved, and a large number of jobs in the queue keep waiting.
+
+- Urgent and key jobs need to be preferentially scheduled.
+
+- Users' jobs may be developed online and cannot be terminated unless allowed.
+
+Existing Capabilities of Volcano:
+
+- Jobs with different priorities in the same queue can be preempted.
+
+- Pod-based eviction
+
+- Immediate preemption
+
+Requirements:
+
+- When jobs in the same queue are from different tenants, different tenants should have different priorities and preemption permissions.
+
+- Job-based eviction
+
+- Delayed preemption
+
+{{
}}
+
+This flowchart shows how the delayed preemption plugin works. On the left is the running logic of the plugin in the scheduler. Kubernetes services lie in the middle, and the right part is the core OpenI-Octopus modules.
+
+
+Specifically, the plugin finds the jobs that need to be preempted in Volcano. The compute resources occupied by these jobs must be sufficient for the high-priority jobs that are waiting. Then, the plugin updates the states of these jobs in Kubernetes. As soon as the core OpenI-Octopus modules detect the state changes, they start a timer to prepare for the eviction of these jobs. If the preemption is canceled before the timer expires, for example because the required resources have been released in the meantime, the timer is canceled as well.
+
+
+The following chart shows the service logic.
+
+{{
}}
+
+1) A Boolean attribute called Preempt is added to each job, indicating whether the job is a preempted job.
+
+- Only jobs with lower priorities in the same queue can be preempted.
+
+2) The eviction is performed by job instead of by pod.
+
+- Pods are evicted based on the ID of the job to which they belong, reducing the number of affected jobs.
+
+- The scheduler notifies OpenI-Octopus to stop the jobs at the service layer.
+
+3) Delayed preemption
+
+- The Privileged and WillEvicted states are added to the job state machine.
+
+- Jobs in the Privileged or WillEvicted state cannot be preempted by other jobs.
+
+- If the state of a preempting or preempted job changes, the state of the other party changes accordingly.
+
+{{
}}
+
+
+## Benefits
+
+__Enhanced capabilities__
+
+- Large-scale distributed training jobs can run efficiently.
+
+- Multiple AI computing frameworks are supported.
+
+- The plugin-based scheduler supports customized development to satisfy scenario-specific requirements.
+
+- Multi-queue scheduling makes hardware resource grouping and dynamic resource allocation between groups possible.
+
+__Performance tuning__
+
+- Hardware resource utilization is greatly improved, to 90% or higher.
+
+- The average job scheduling latency is greatly reduced. The average job waiting time dropped from 60 seconds (using the Yarn scheduler) to 10 seconds (using Volcano).
+
+- System stability is enhanced, cluster node resources are used in a balanced manner, and O&M workloads are reduced.
+
+{{
}}
+
+
+- With 120+ nodes managed and 1100+ GPU cards in total, the GPU utilization can reach 90% or higher when the system is overloaded.
+
+- The resource usage of each node is balanced, and the difference is less than 20%.
+
+- Since the rollout in 2019, more than 120,000 jobs have been run.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2021-06-15-ruitian2-en.md b/website-docusaurus/blog/2021-06-15-ruitian2-en.md
new file mode 100644
index 00000000..d36c53dc
--- /dev/null
+++ b/website-docusaurus/blog/2021-06-15-ruitian2-en.md
@@ -0,0 +1,254 @@
+---
+title: "Using Volcano in Large-Scale, Distributed Offline Computing"
+description: "Volcano use case in the financial sector"
+date: 2021-06-15
+authors: [volcano]
+---
+>This article was first released at `Container Cube` on December 24th, 2020; refer to [锐天投资基于Volcano的大规模分布式离线计算平台的应用实践](https://mp.weixin.qq.com/s/dC4IDNG7FMGLigNJaj_Qug)
+
+## Service Scenarios and Solution Selection
+
+__Service Scenarios__
+
+- VMs for research and development by strategy researchers
+
+- AI training and inference
+
+- Data ETL
+
+- General-purpose, distributed batch processing jobs
+
+__Why Use Kubernetes?__
+
+A distributed batch processing platform can be used to manage compute and storage resources. In this use case, Ruitian decided to use Kubernetes to manage compute resources for the following reasons:
+
+- Containers streamline development for users in different environments. Ruitian has four to five groups of users who use different development environments and strategies. Environment isolation posed a great challenge to resource management and development efficiency. Now with containers, environments are encapsulated in containers that can be scheduled using Kubernetes.
+
+- Heterogeneous devices such as GPUs can be supported through Device Plugins.
+
+- Data storage can be centralized by using etcd.
+
+- Kubernetes has a robust technology ecosystem.
+
+- The Go language fits the technology stack in Ruitian.
+
+
+__Why Use CephFS__
+
+
+CephFS is a distributed file storage interface provided by Ceph. Ceph provides three types of storage interfaces: S3, block storage, and CephFS. The reasons for using CephFS are as follows:
+
+- POSIX file system permissions and interface: Local file systems are widely used in our businesses, and CephFS provides stable file system mounting. In multi-tenant scenarios, each user has a UID, and the data of each user can be accessed only by themselves. The POSIX permission mechanism allows users to seamlessly migrate their existing file permissions.
+
+- Strong consistency: A file written to node A can be directly read on node B.
+
+- Small file access at scale and high-bandwidth I/O
+
+- Hierarchical hardware support
+
+- Kubernetes ReadWriteMany PV
+
+## Why Volcano
+
+__Why not default-scheduler__
+
+Ruitian did not choose the default-scheduler, because it cannot provide queue scheduling, fair scheduling, multi-tenant isolation, or advanced scheduling policies such as gang scheduling. Fair scheduling and advanced scheduling policies are the most important factors. Fair scheduling decides which job to run first when there are too many jobs in a queue or when the cluster has available resources. To achieve this, each queue must be mapped to a team, and each namespace must correspond to a user. The default-scheduler cannot meet the preceding requirements.
+
+Another option was kube-batch, a batch processing scheduler of the community.
However, it is only a scheduler and does not provide any solution other than scheduling. What Ruitian needed was a batch processing solution that takes care of scheduling and processing for the environment and CRDs. + + + +{{
}}
+
+__Why Volcano__
+
+- Supports fair scheduling.
+
+- Supports advanced scheduling policies, such as gang scheduling and binpack.
+
+- Supports mutual access between pods through the SSH plug-in.
+
+- Supports injecting job dependencies into pods via the ENV plug-in, and supports TensorFlow worker sharding.
+
+- Provides services externally via the SVC plug-in.
+
+Such a scheduling platform satisfies Ruitian's needs.
+
+
+## System Architecture
+
+__Service Architecture__
+
+{{
}}
+
+- Ceph-based high-performance storage
+
+- Kubernetes-based heterogeneous hardware management
+
+- Loki + Grafana for user and monitoring panels
+- Hybrid deployment of the middleware and application layers, making full use of cluster resources
+
+- Extended service scenarios with batch jobs
+
+
+__Multi-tenancy__
+
+{{
}}
+
+When a user submits a job, multi-tenancy can be a problem. For example, when a user adds a pod to a cluster, the cluster needs to know the running user and the UID. By default, the UID of a running user is that of the image builder, which means the UIDs of the pods submitted by all users can be the same. This is not allowed, because the data obtained and generated by a user should not be accessible to other users.
+
+In this case, Ruitian uses Kubernetes namespaces to isolate all resources. One namespace corresponds to one user. Namespaces obtain user information through the existing LDAP service and OIDC, which authenticate users, and RBAC authorizes them to use pod security policies (PSPs). A PSP requires users to specify the UID and GID in SecurityContext when submitting a pod to a cluster. The entire runtime environment of the user is subject to these settings.
+
+With PSPs, users are isolated when accessing data, which is all stored in Ceph. Multi-tenancy is thereby easily managed.
+
+
+__Workflows__
+
+{{
}}
+
+Next come the basic workflows. The local configurations are rendered into a job YAML and then submitted. All dependency data of the user is synchronized to CephFS, and the pod is mounted with a PVC. Each user has the PVC permissions of their own directory in their own namespace. The permissions are managed and controlled through IBS. In this way, jobs are submitted to the cluster to run.
+
+
+## In-depth Customization on Volcano
+
+On top of the basic submission framework, Ruitian provides libraries for users and is developing a submission tool, Jobctl. This tool can be used as a command line tool, or as a Python library imported into a notebook or directly into the user's Python script. Jobctl supports asynchronous and synchronous submissions. In the asynchronous mode, jobs are continuously submitted to the entire cluster; after the jobs are submitted, Jobctl exits directly. In the synchronous mode, Jobctl submits and watches jobs, and returns the execution results to the user only after the jobs are complete.
+
+With Jobctl, Kubernetes complexities can be shielded from users. In addition, command line submission and Python lib integration are supported, and the most basic parallel execution by replicas and by day is provided.
+
+{{
}} + +__OOM Auto Scale Up__ + +{{
}}
+
+The first customization is to scale up the resources of the entire job upon OOM. Users may not be able to configure the exact memory required, and would need to submit the job again for verification after an OOM. Therefore, Ruitian customized OOMKill Auto Scale-Up, modifying the Volcano controller to automatically scale up the resources requested by OOMKilled pods. After the scale-up, the jobs are automatically submitted again, and the user is informed upon successful submission. Combined with the Volcano policy and event mechanism mentioned above, this function guarantees reasonable memory requests without manual intervention.
+
+__MinSuccess__
+
+{{
}} + +- If the number of pods that run to completion reaches minAvailable, the job is complete. + +- Non-Gang jobs cannot be flexibly scheduled. + +{{
}}
+
+- If the number of pods that run to completion reaches minSuccess, the job is complete.
+
+- Decouples the number of pods required by gang scheduling from the number of pods required to complete the job.
+
+
+__NodeZone__
+
+{{
}}
+
+- One Volcano instance manages all nodes.
+
+- Noisy Neighbor problems cannot be resolved.
+
+- Resources cannot be reserved for emergencies.
+
+{{
}} + +- Multiple Volcano instances manage multiple zones. + +- Certain jobs are physically isolated. + +__Volcano Namespace Quota__ + +{{
}} + +The default Kubernetes quotas cannot satisfy Ruitian's system. When the native namespace quota is triggered, pods directly fail. Therefore, Ruitian re-designed the quota in Volcano. + +{{
}} + +- When the Volcano namespace quota is triggered, pod creation in a queue will be suspended. + +__Volcano Monitoring and Alarming__ + +{{
}} + +Volcano Exporter + +- Outputs the queue label of the job. + +- Outputs the queue capability. + +- Outputs the job start time and end time. + + + +WatchDog + +- Registers the Informer and collects metrics. + +- Reports job failure and usage alarms. + +- Automatically updates the queue capability. + + +__Job dashboard__ + +{{
}} + +The upper panel covers the information about all jobs and provides a state table to display the job completion status. The panels below display the CPU, memory, and network resource usage. The negative axes refer to wasted cluster resources, which are allocated to pods (jobs) but not actually used during job running. These time series tables can provide resource insights to users in real time. + + +__Cluster resource dashboard__ + +{{
}} + +Graphs show the usage of overall queue resources, including CPU and memory. For jobs that consume a large amount of resources, for example, 300 or 500 GiB of memory, users need to know whether there is any node that can run such jobs. Therefore, we need to display the resource usage of each node available. + + + +## Challenges and Solutions in High-Concurrency Scenarios + +In Ruitian, the number of compute nodes in a single cluster has reached 200 and long-time jobs (1 week) and short-time jobs (1 minute) co-exist. The total storage capacity is 1.5 PB, the read/write bandwidth is 15 GB/s, and the number of pods increases by 100,000 to 300,000 every day. These brought challenges. + +__Challenge 1: Too Large Jobs__ + +{{
}}
+
+Issues:
+- The job object exceeds the max request size (1.5 MB) of etcd when a job contains a large number of pods.
+
+- Simply increasing the max request size would impact etcd because of the large number of objects.
+
+
+Solution:
+- Submit a job in the form of multiple replicas for a single task.
+- The information provided by the ENV plug-in in a pod is read in sharding mode.
+
+
+__Challenge 2: Out of CPU/Memory__
+
+{{
}} + +Issues: + +- There are a limited number of nodes, and a large number of short-term jobs keep being scheduled. + +- Kubelet PLEG is under great pressure, and the pod binding takes too long. + +{{
}}
+
+
+Issues (in addition to the preceding ones):
+
+- The default session interval of Volcano is 1s. As a result, cache snapshots become inconsistent.
+
+- Out of CPU + out of memory
+
+
+Solution:
+
+- Add binding task numbers for nodes.
+
+- When a snapshot is being created for a session, the nodes whose binding task number is smaller than 0 are skipped.
+
+{{
}} \ No newline at end of file diff --git a/website-docusaurus/blog/2021-08-31-1.4-release-en.md b/website-docusaurus/blog/2021-08-31-1.4-release-en.md new file mode 100644 index 00000000..9964c52f --- /dev/null +++ b/website-docusaurus/blog/2021-08-31-1.4-release-en.md @@ -0,0 +1,32 @@ +--- +title: "Volcano v1.4 (Beta) Release Note" +description: "Volcano v1.4 (Beta) Release Includes New Features Such as NUMA-Aware" +date: 2021-08-31 +authors: [volcano] +--- +>This article was firstly released at `Container Cube` on September 6th, 2021, refer to[Volcano v1.4.0-Beta发布,支持NUMA-Aware等多个重要特性](https://mp.weixin.qq.com/s/S5JAQI0uLoTEx0lvYDXM4Q) + +Volcano, CNCF's first batch computing project, is now available with a new version, v1.4 (Beta). This version includes multiple important features, such as resource ratio-based partitions on GPU nodes, NUMA-aware, mixed deployment of multiple schedulers, and greatly improved stability. + +__Resource ratio-based partitions on GPU nodes__ is developed to avoid idle GPUs while GPU-consuming jobs are starving. This is an important feature contributed by Leinao Cloud, a Volcano community member. + +Previously, a scheduler had separate rules for allocating scarce resources such as GPUs and common resources such as CPUs. That is, CPU-consuming jobs can be directly allocated to GPU nodes to consume CPU and memory resources without considering the upcoming GPU jobs and reserving no resources for them. Alternatively, an independent scheduler was configured for GPU nodes, which did not allow CPU-consuming jobs to be scheduled to GPU nodes. + +Now with resource ratio-based partitions, you can set a dominant resource (usually GPU) and configure a resource ratio (for example, GPU:CPU:Memory = 1:4:32) for the dominant resource. The scheduler ensures that the ratio of idle GPU, CPU, and memory resources on a GPU node is greater than or equal to the value you set. + +In this way, GPU-consuming jobs that meet the ratio requirement can be scheduled to the node at any time, preventing GPU wastes. Compared with other solutions in the industry, this more flexible method improves node resource utilization. + +For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/proportional.md. + + +__CPU NUMA-aware__ is another important feature of this version. For computing-intensive jobs such as AI and big data jobs, enabling NUMA will significantly improve the computing efficiency. With CPU NUMA-aware scheduling, you can configure the NUMA policy to determine whether to enable NUMA for workloads. The scheduler will select a node that meets the NUMA requirements. + +For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md. + +You can now __deploy different types of schedulers__ in a Kubernetes cluster to properly schedule resources. The most common use case is deploying default-scheduler and Volcano together. Native Kubernetes resource objects, such as Deployments and StatefulSets, can be scheduled by default-scheduler, and high-performance computing workloads, such as Volcano Jobs, TensorFlow Jobs, and Spark Jobs, can be scheduled by Volcano. This solution can make the best possible use of each type of schedulers and reduce the concurrency pressure of a single scheduler. + +For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-scheduler.md. 
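+
+As a hedged sketch of such a mixed deployment (the workload names are illustrative, not from the release note), the scheduler for each workload is selected with the `schedulerName` field:
+
+```yaml
+# General workloads keep the default scheduler; batch workloads opt in to Volcano.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: web-frontend            # hypothetical long-running service
+spec:
+  replicas: 2
+  selector:
+    matchLabels: {app: web-frontend}
+  template:
+    metadata:
+      labels: {app: web-frontend}
+    spec:
+      # schedulerName omitted: handled by default-scheduler
+      containers:
+        - name: web
+          image: nginx:1.21
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: batch-worker            # hypothetical batch pod
+spec:
+  schedulerName: volcano        # handled by the Volcano scheduler
+  restartPolicy: Never
+  containers:
+    - name: worker
+      image: busybox:1.35
+      command: ["sh", "-c", "echo compute && sleep 30"]
+```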
+
+In addition to the preceding features, Volcano v1.4 (Beta) adds a stress testing automation framework, improves the robustness of the resource comparison functions, and fixes bugs.
+
+The community is collecting roadmap features for Volcano v1.5. We have received requirements on support for cluster resource monitoring, hierarchical queues, enhanced Spark integration, and task dependency. Your suggestions and issues are always welcome.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2022-12-28-ing_case-en.md b/website-docusaurus/blog/2022-12-28-ing_case-en.md
new file mode 100644
index 00000000..b7717a49
--- /dev/null
+++ b/website-docusaurus/blog/2022-12-28-ing_case-en.md
@@ -0,0 +1,106 @@
+---
+title: "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
+description: "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
+date: 2022-12-28
+authors: [volcano]
+---
+>On October 26, 2022, Krzysztof Adamski and Tinco Boekestijn from ING Group delivered a keynote speech "Efficient Scheduling Of High Performance Batch Computing For Analytics Workloads With Volcano" at KubeCon North America. The speech focused on how Volcano, a cloud native batch computing project, supports high-performance scheduling for big data analytics jobs on ING's data management platform.
+More details: [KubeCon + CloudNativeCon North America](https://events.linuxfoundation.org/archive/2022/kubecon-cloudnativecon-north-america/program/schedule/)
+
+## Introduction to ING
+
+Internationale Nederlanden Groep (ING), a global financial institution of Dutch origin, was created in 1991 through the merger of Dutch insurer Nationale-Nederlanden and national postal bank NMB Postbank.
+
+ING provides services in more than 40 countries around the world. Its core businesses are banking, insurance, and asset management. Its 56,000 employees serve 53.2 million customers worldwide, including natural persons, families, businesses, governments, and organizations such as the IMF.
+
+
+## Business Background
+
+Regulations and restrictions on banking vary depending on the country/region. Data silos, data security, and compliance requirements can be really challenging, and it is not easy to introduce new technologies. Therefore, ING built their Data Analytics Platform (DAP) to provide secure, self-service functionality for employees to manage services throughout the entire process.
+
+{{
}}
+
+In 2013, they conceptualized the data platform. In 2018, ING introduced cloud native technologies to upgrade their infrastructure platform. Since then, more and more employees and departments have turned to the platform, and by now, there are more than 400 projects on it.
+
+They aim to meet all analytics needs in a highly secure, self-service platform that has the following features:
+- Open source tool model
+- Powerful computing
+- Strict security and compliance measures
+- One platform for all
+- Both global and local
+
+
+## Challenges and Solutions
+{{
}}
+
+ING is shifting from Hadoop to Kubernetes. They met some challenges in job management and multi-framework support. For example:
+
+- Job management
+  - Pod scheduling: unaware of upper-layer applications
+  - Lack of fine-grained lifecycle management
+  - Lack of task and job dependencies
+- Scheduling
+  - Lack of job-based scheduling, such as sorting, priority, preemption, fair scheduling, and resource reservation
+  - No advanced scheduling algorithms, such as those based on CPU topology, task topology, IO-awareness, and backfilling
+  - Lack of resource sharing among jobs, queues, and namespaces
+- Multi-framework support
+  - Insufficient support for frameworks such as TensorFlow and PyTorch
+  - Complex management of each framework (such as resource planning and sharing)
+
+Managing applications (stateless and even stateful ones) with Kubernetes would be a perfect choice if Kubernetes were as user-friendly as Yarn in the scheduling and management of batch computing jobs. Yarn, however, provides only limited support for frameworks such as TensorFlow and PyTorch. Therefore, ING looked for better solutions.
+
+__Kubernetes + Hadoop__
+{{
}}
+When managing clusters, ING once separated Hadoop and Kubernetes. They ran almost all Spark jobs in Hadoop clusters, and other tasks and algorithms in Kubernetes clusters. They wanted to run all the jobs in Kubernetes clusters to simplify management.
+
+{{
}} +When Kubernetes and Yarn work together, Kubernetes and Hadoop resources are statically divided. During office hours, Hadoop applications and Kubernetes use their own resources. Spark tasks, when heavily pressured, cannot be allocated extra resources. At night, there are only batch processing tasks in clusters. All Kubernetes resources are idle but cannot be allocated to Hadoop. In this case, resources are not fully used. + + +__Kubernetes with Volcano__ +{{
}}
+When managing clusters with Kubernetes and scheduling Spark tasks with Volcano, resources do not need to be statically divided. Cluster resources can be dynamically re-allocated based on the priorities and resource pressure of pods, batch tasks, and interactive tasks, which greatly improves the overall utilization of cluster resources.
+
+For example, during office hours, idle resources of common service applications can be used by batch and interactive applications temporarily. On holidays or at night, batch applications can use all cluster resources for data computing.
+
+{{
}}
+Volcano is a batch scheduling engine developed for Kubernetes with the following capabilities:
+
+- Job queues with weighted priority
+- Able to commit above queue limits if the cluster has spare capacity
+- Able to preempt pods when more pods come in
+- Configurable strategies to deal with competing workloads
+- Compatible with Yarn scheduling
+
+Volcano supplements Kubernetes in batch scheduling. Since Apache Spark 3.3, Spark on Kubernetes has natively supported Volcano as a batch scheduler, making it easier to install and use.
+
+## Highlighted Features
+__Redundancy and Local Affinity__
+{{
}} +Volcano retains the affinity and anti-affinity policies for pods in Kubernetes, and adds those for tasks. + +{{
}} +The idea of DRF is that in a multi-resource environment, resource allocation should be determined by the dominant share of an entity (user or queue). The volcano-scheduler observes the dominant resource requested by each job and uses it as a measure of cluster resource usage. Based on this dominant resource, the volcano-scheduler calculates the share of the job. The job with a lower share has a higher scheduling priority. + +For example, a cluster has 18 CPUs and 72 GB memory in total. User1 and User2 are each allocated one queue. Any submitted job will get its scheduling priority based on the dominant resource. + +- For User1, the CPU share is 0.33 (6/18), the memory share is 0.33 (24/72), and the final share is 0.33. +- For User2, the CPU share is 0.67 (12/18), the memory share is 0.33 (24/72), and the final share is 0.67. + +Under a DRF policy, the job with a lower share will be first scheduled, that is, the job committed by User1. + +Queue resources in a cluster can be divided by configuring weights. However, overcommitted tasks in a queue can use the idle resources in other queues. In this example, after using up the CPUs of its own queue, User2 can use the idle CPUs of User1. When User1 commits a new task, it triggers resource preemption and reclaims the resources occupied by other queues. + +__Resource Reservation__ +{{
}}
+Batch computing tasks and other services may preempt resources and cause conflicts. Assume there are two available nodes in a cluster and we need to deploy a unified service layer, such as Presto or a cache service like Alluxio, to provide services externally. Batch computing tasks may have already taken all the resources, leaving us unable to deploy or upgrade that service layer. Therefore, ING's platform now allows users to reserve some resources for other services.
+
+__DRF Dashboard__
+{{
}} +ING built a DRF scheduling dashboard based on the monitoring data from Volcano to obtain scheduling data at different layers. In the service cluster, ING stores the tasks of interactive users in one queue, and the computing tasks of all key projects running on the data platform in another queue. ING can take certain resources from other queues to the key project queue, but that won't do any good to the tasks of interactive users. + +ING is considering displaying the peak hours of cluster use to provide users with more information. With this, users can decide when to start their tasks based on the cluster resource readiness, improving computing performance without complex configurations in the background. +{{
}} + +## Summary +Volcano abstracts batch task scheduling, allowing Kubernetes to better serve ING in task scheduling. ING will contribute their developed functions to the community, such as the DRF dashboard, idle resource reservation on each node, auto queue management, new Prometheus monitoring metrics, Grafana dashboard updates, kube-state-metrics update, and cluster role restrictions. \ No newline at end of file diff --git a/website-docusaurus/blog/2023-01-12-volcano-1.7.0-release-en.md b/website-docusaurus/blog/2023-01-12-volcano-1.7.0-release-en.md new file mode 100644 index 00000000..b1376e63 --- /dev/null +++ b/website-docusaurus/blog/2023-01-12-volcano-1.7.0-release-en.md @@ -0,0 +1,140 @@ +--- +title: "Volcano 1.7.0 Available Now" +description: "New features: enhanced plugin for PyTorch Jobs, Ray on Volcano, enhanced scheduling for general Kubernetes services, multi-architecture images of Volcano, and optimized queue status info" +date: 2023-01-12 +authors: [volcano] +--- +

+Volcano 1.7.0 is now available with the following new features:
+
+- **Enhanced plugin for PyTorch Jobs**
+- **Ray on Volcano**
+- **Enhanced scheduling for general Kubernetes services**
+- **Multi-architecture images of Volcano**
+- **Optimized queue status info**
+
+{{
}}
+Volcano is the industry-first cloud native batch computing project. Open-sourced at KubeCon Shanghai in June 2019, it became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. By now, more than 490 global developers have committed code to the project. The community is seeing growing popularity among developers, partners, and users.
+
+### Key Features
+
+#### 1. Enhanced Plugin for PyTorch Jobs
+As one of the most popular AI frameworks, PyTorch has been widely used in deep learning fields such as computer vision and natural language processing. More and more users turn to Kubernetes to run PyTorch in containers for higher resource utilization and parallel processing efficiency.
+
+Volcano 1.7 enhances the plugin for PyTorch Jobs, freeing you from the manual configuration of container ports and of the MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK environment variables. Similar plugins are provided for other training frameworks, such as TensorFlow and MPI, and are designed to help you run computing jobs on your desired framework with ease.
+
+Volcano also provides an extended development framework for you to tailor Job plugins to your needs.
+
+Design Documentation: [Pytorch-plugin](https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md#pytorch-plugin)
+User Guide: [Pytorch-plugin-user-guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_pytorch_plugin.md#pytorch-plugin-user-guide)
+Issue: [#2292](https://github.com/volcano-sh/volcano/issues/2292)
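+
+As a hedged usage sketch (the job name and image are illustrative; the plugin arguments follow the user guide linked above):
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: pytorch-demo            # hypothetical job name
+spec:
+  minAvailable: 1
+  schedulerName: volcano
+  plugins:
+    # The plugin wires up the container port and the MASTER_ADDR,
+    # MASTER_PORT, WORLD_SIZE, and RANK environment variables.
+    pytorch: ["--master=master", "--worker=worker", "--port=23456"]
+  tasks:
+    - replicas: 1
+      name: master
+      template:
+        spec:
+          containers:
+            - name: master
+              image: docker.io/example/pytorch-mnist:latest   # illustrative image
+    - replicas: 2
+      name: worker
+      template:
+        spec:
+          containers:
+            - name: worker
+              image: docker.io/example/pytorch-mnist:latest   # illustrative image
+```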
+
+
+#### 2. Ray on Volcano
+Ray is a unified framework for scaling AI and Python applications. It can run on any machine, cluster, cloud, or Kubernetes cluster. Its community and ecosystem are growing steadily.
+
+As machine learning workloads host computing jobs at a higher density than ever before, single-node environments are failing to provide enough resources for training tasks. This is where Ray comes in: it seamlessly coordinates the resources of an entire cluster, instead of a single node, to run the same set of code. Ray is designed for common scenarios and any type of workload.
+
+For users running multiple types of Jobs, Volcano partners with Ray to provide high-performance batch scheduling. Ray on Volcano has been released in [KubeRay v0.4](https://github.com/ray-project/kuberay/releases/tag/v0.4.0).
+
+User Guide: [KubeRay-integration-with-Volcano](https://ray-project.github.io/kuberay/guidance/volcano-integration/#kuberay-integration-with-volcano)
+Issue: [#2429](https://github.com/volcano-sh/volcano/issues/2429), [#213](https://github.com/ray-project/kuberay/issues/213)
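+
+As a hedged sketch based on the KubeRay user guide above (the cluster and queue names are illustrative), a RayCluster is handed to Volcano through labels:
+
+```yaml
+apiVersion: ray.io/v1alpha1
+kind: RayCluster
+metadata:
+  name: raycluster-demo                   # hypothetical cluster name
+  labels:
+    ray.io/scheduler-name: volcano        # let Volcano gang-schedule the cluster pods
+    volcano.sh/queue-name: kuberay-queue  # target Volcano queue (illustrative)
+spec:
+  rayVersion: "2.0.0"
+  headGroupSpec:
+    rayStartParams: {}
+    template:
+      spec:
+        containers:
+          - name: ray-head
+            image: rayproject/ray:2.0.0
+```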
+
+#### 3. Enhanced Scheduling for General Kubernetes Services
+Each scheduler has its own advantages depending on the use case. For example, in batch computing, Volcano provides more scheduling policies and capabilities; in general scheduling, the Kubernetes default scheduler is more balanced. However, it is often the case that a user runs multiple types of tasks in the same cluster. When there are both batch computing and general tasks, scheduling can be a challenge.
+
+Starting from version 1.7, Volcano is fully compatible with the Kubernetes default scheduler to schedule and manage long-running services. Now you can use Volcano to centrally schedule both batch computing and general workloads.
+
+**Enhancements:**
+
    +
+- Supports multiple types of schedulers for the Volcano scheduler and webhook.
+- Supports the NodeVolumeLimits plugin.
+- Supports the VolumeZone plugin.
+- Supports the PodTopologySpread plugin.
+- Supports the SelectorSpread plugin.
+
+Support for Kubernetes 1.25 is also available in Volcano 1.7.
+
+Issue: [#2394](https://github.com/volcano-sh/volcano/issues/2394), [#2510](https://github.com/volcano-sh/volcano/issues/2510)
+
+#### 4. Multi-architecture Images
+You can now compile multi-architecture Volcano images in a few steps through cross compilation. For example, you can compile the base images for the amd64 and arm64 architectures on an amd64 host and push them to the image repository. During installation and deployment, the system automatically selects the proper image based on the host architecture, which is more user-friendly than before.
+
+User Guide: [building-docker-images](https://github.com/volcano-sh/volcano/blob/master/docs/development/development.md#building-docker-images)
+Issue: [#2435](https://github.com/volcano-sh/volcano/pull/2435)
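+
+The underlying mechanism is a standard multi-platform image build. As a generic, hedged sketch (this is the docker buildx equivalent, not Volcano's exact Makefile targets; the repository and tag are illustrative):
+
+```shell
+# Cross-compile and publish one manifest covering amd64 and arm64.
+docker buildx build \
+  --platform linux/amd64,linux/arm64 \
+  -t example.io/volcanosh/vc-scheduler:latest \
+  --push .
+```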
+
+#### 5. Optimized Queue Status Info
+Volcano now collects real-time statistics on allocated resources into the queue status info, which eases dynamic resource adjustment and puts cluster resources to good use.
+
+Volcano allocates and manages cluster resources by queues. The Capability field limits the resource use for each queue, which is a hard ceiling.
+
+Previously, users had no clear view of the allocated resources in a queue, or of the idle resources among those defined by Capability. Creating a large number of workloads against insufficient resources could cause job suspension and unexpected cluster scale-out triggered by the autoscaler, increasing cloud resource costs. Now, with more detailed status info, you can manage cluster resources more efficiently and avoid excess costs.
+
+Issue: [#2571](https://github.com/volcano-sh/volcano/issues/2571)
+
+### Contributors
+Volcano 1.7.0 was brought into being by hundreds of code commits from 29 contributors. Thanks for your contributions.
+
+**Contributors on GitHub:**
+
+|   |   |   |
+|---|---|---|
+| @xiaoxubeii | @jsolbrig | @Yikun |
+| @tgaddair | @william-wang | @elinx |
+| @Abirdcfly | @xiaoanyunfei | @qiankunli |
+| @wpeng102 | @waiterQ | @hwdef |
+| @WingkaiHo | @Monokaix | @kerthcet |
+| @WulixuanS | @autumn0207 | @jinzhejz |
+| @lucming | @jimoosciuc | @LY-today |
+| @dontan001 | @wangyang0616 | @Akiqqqqqqq |
+| @zhoumingcheng | @jiangkaihua | @Thor-wl |
+| @ccchenjiahuan | @zhifanggao |   |
+
+#### Links
+Release note: [v1.7.0](https://github.com/volcano-sh/volcano/releases/tag/v1.7.0)
+Branch: [release-1.7](https://github.com/volcano-sh/volcano/tree/release-1.7)
+ +### About Volcano +Volcano is designed for high-performance computing applications such as AI, big data, gene sequencing, and rendering, and supports mainstream general computing frameworks. More than 26,000 global developers joined us, among whom the in-house ones come from companies such as Huawei, AWS, Baidu, Tencent, JD, and Xiaohongshu. There are 2,800 Stars and 670 Forks for the project. Volcano has been proven feasible for mass data computing and analytics, such as AI, big data, and gene sequencing. Supported frameworks include Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, Paddlepaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene, and Ray. The ecosystem is thriving with more developers and use cases coming up. \ No newline at end of file diff --git a/website-docusaurus/blog/2023-08-11-volcano-community-co-construction-program.md b/website-docusaurus/blog/2023-08-11-volcano-community-co-construction-program.md new file mode 100644 index 00000000..adb6765f --- /dev/null +++ b/website-docusaurus/blog/2023-08-11-volcano-community-co-construction-program.md @@ -0,0 +1,83 @@ +--- +title: "Volcano Community Co-construction Program" +description: "Huawei Cloud Joins Hands with 11 Partners to Launch Volcano Community Co-construction Program" +date: 2023-08-11 +authors: [volcano] +--- + As artificial intelligence (AI) technologies advance and large language models (LLMs) grow more popular, the demand for AI compute has been booming. This has generated huge demand for high-performance scheduling for the AI and for hardware like AI chips. + +{{
}}
+
+
+
+
+Volcano is the first cloud native batch computing project in the industry. In 2019, it was donated by Huawei Cloud to the Cloud Native Computing Foundation (CNCF) and became CNCF's first and only batch computing incubating project. Volcano provides unified high-performance job management for AI, big data, and high-performance computing (HPC) and supports a variety of high-level scheduling policies, including online and offline scheduling, AI elastic training scheduling, service level agreement (SLA), topology-based scheduling, fairness, load-aware scheduling, rescheduling, preemption, and reclamation. It offers unified lifecycle management, job dependency management, and task dependency management for workloads like Spark, Flink, PyTorch, MPI, and TensorFlow. In terms of fine-grained resource management, Volcano supports min-max queue resource management, queue resource reservation, and dynamic resource sharing for multi-tenant resource leasing or preemption. Additionally, Volcano schedules heterogeneous resources including x86, Arm, GPUs, and Ascend, and provides refined scheduling of CPUs and GPUs. Users can allocate resources based on their requirements and significantly improve cost-effectiveness using Volcano.
+
+The Volcano community has attracted **more than 58,000 developers worldwide** and won more than **3,200 Stars and over 730 Forks** on GitHub. The contributors include Huawei, AWS, IBM, Baidu, Tencent, JD, Xiaohongshu, 4Paradigm, BoCloud, DaoCloud, Ruitian Capital, Qiniu Cloud, Yinqing Technology, ByteDance, Kuaishou, Unisound, Infosys, Visa, NetEase, Red Hat, Kingsoft Cloud, Inspur, ZTE, Oracle, and iQIYI.
+
+
+**More than 50 cases** related to Volcano have been implemented. These cases are widely distributed in industries such as Internet, advanced manufacturing, finance, life sciences, scientific research, autonomous driving, and medicine. They cover massive data computing and analysis scenarios like AI, big data, genomic sequencing, and rendering. The main users are Tencent, Amazon, ING Bank, Baidu, Xiaohongshu, DiDi, 360, iQIYI, Leinao, Pengcheng Laboratory, Cruise, Li Auto, Unisound, Ximalaya, Vipshop, GrandOmics, BOSS Zhipin, and so on. **With the expansion of the Volcano ecosystem, more and more users are highly willing to join the community.** **Huawei Cloud has worked with 11 partners to launch the Volcano community co-construction program and cultivate a more prosperous Volcano ecosystem.**
+
+
+
+According to **Deng Mingkun, General Manager of Huawei Cloud Open Source Services**, "The cloud native batch computing project, Volcano, has been widely adopted in domains such as AI, big data, genomic sequencing, rendering, transcoding, multimedia, and finance, since June 2019. A group of industry users not only actively promote the implementation of Volcano in production environments, but also contribute a lot to the Volcano community based on their own experience. Huawei Cloud intends to work with partners to launch the Volcano community co-construction program to create a more prosperous Volcano ecosystem and help more enterprises accelerate their cloud native progress."
+
+**The first batch of members to join the program are Baidu, BoCloud, 4Paradigm, Vipshop, Ruitian Capital, Leinao, Pinlan, 360, NetEase Shufan, Ximalaya, and BOSS Zhipin.**
+
+
{{
}}
+ +According to **Zhou Ti, the tech lead of Baidu's PaddlePaddle open source ecosystem**, "PaddlePaddle and Volcano jointly released the PaddlePaddle on Volcano solution to improve PaddlePaddle's computing efficiency. As a platform for high-performance computing, Volcano makes up for Kubernetes' lack of basic capabilities in machine learning, deep learning, HPC, and big data computing. Additionally, Volcano enhances the batch creation and lifecycle management of computing tasks, fair-share scheduling and other aspects on the basis of the native Kubernetes capability. These features meet PaddlePaddle's basic requirements." + + +**Zhao Anquan, General Manager of BoCloud PaaS**, said, "BoCloud's HPC solution, based on CNCF's Volcano scheduling engine, a product well respected by many customers, provides a high-concurrency computing platform that runs AI, big data, and simulation calculation applications, resolving many pain points in the industry. We also donated the industry's first HPC job orchestration component JobFlow to the Volcano community so that users can better apply cloud native technologies." + + +**Li Mengxuan, head of heterogeneous computing virtualization in 4Paradigm**, said, "The Volcano project enables us to solve the pain points encountered during the implementation of cloud native technologies in AI projects at a low cost, especially in terms of device reuse. The use of Volcano will significantly improve the cluster resource utilization. 4Paradigm will continuously contribute code to the community to build Volcano into a reuse platform that supports all mainstream forms of heterogeneous compute such as NPUs, GPUs, MLUs, and DCUs." + + +**He Yingpeng, head of Vipshop's AI cloud platform**, said, "As a top e-commerce platform in China, Vipshop faces problems associated with rapid growth, rapid product iteration, and maintaining a diverse product portfolio. A Volcano-based AI training platform with advanced scheduling policies like queue and gang scheduling can support scheduling of more than 100,000 vCPUs, accelerating Vipshop's service innovation." + + +**Chang Feng, head of the Leinao R&D Center**, said, "Volcano is one of the first open source cloud native projects for batch computing. It has dynamically configurable advanced scheduling policies and excellent resource management capabilities, which can address multiple challenges, like job scheduling, lifecycle management, and heterogeneous hardware support in AI scenarios. During the implementation, we expanded Volcano's capabilities to effectively improve our system stability and resource utilization." + + +**Peng Jingtian, co-founder and CTO of Pinlan**, said, "CNCF's Volcano project has been successfully applied to our cloud native intelligent building design platform — AlphaDraw. Volcano provides AlphaDraw's algorithm services with batch processing and auto scaling capabilities in scenarios like AI-based model flipping of Computer Aided Design (CAD) two-dimensional drawings and intelligent design of three-dimensional building models, greatly improving Kubernetes cluster resource utilization and optimizing workload performance. As the first member of the Volcano community co-construction program, Pinlan continuously contributes best practices for Cloud+AI in the architectural design field to the community. We expect AlphaDraw and Volcano to develop together to continuously provide more excellent products and solutions for intelligent cloud computing and the cloud native progress of the industry in the future." 
+
+
+**Wang Xinyong, a cloud native technology expert from NetEase Shufan**, said, "Volcano provides many useful supplements to Kubernetes' native capabilities, enabling it to better orchestrate batch processing tasks like AI training and big data computing. Volcano's excellent task abstraction and management capabilities, multiple scenario-based scheduling mechanisms, and out-of-the-box integration with multiple common open source computing frameworks enable us to focus more on providing business value for users without spending a lot of effort on reinventing systems."
+
+
+**The owner of the Ruitian Capital Infrastructure Team** said, "Volcano supplements native Kubernetes capabilities such as batch task scheduling, resource sharing, and fair scheduling policies, and provides unified interfaces to reduce learning and maintenance costs. In the production environment, Volcano works with our proprietary level-2 scheduling to meet the requirements of tens of thousands of tasks per day, greatly improving the efficiency of strategy research."
+
+
+**The leader of the 360 container team** said, "Volcano makes up for Kubernetes' lack of basic scheduling capabilities in machine learning and big data computing tasks. It provides various plug-ins to schedule tasks in different scenarios, greatly improving cluster utilization. Additionally, Volcano supports most mainstream computing frameworks like Spark, TensorFlow, and Flink. The overall design of Volcano follows the design and mechanisms of Kubernetes, which reduces our learning costs."
+
+
+**The head of the Ximalaya AI cloud team** said, "Volcano enhances Kubernetes' capabilities like batch task scheduling, resource sharing, and fair scheduling, and provides elastic scheduling. As a basic component for resource scheduling of the machine learning platform, Volcano improves GPU utilization in the production environment."
+
+
+**The owner of the BOSS Zhipin AI fundamental platform team** said, "BOSS Zhipin builds infrastructure based on Volcano in AI and big data computing scenarios. Volcano's powerful batch processing and robust scheduling policies are very convenient for us. They help support complex service scenarios and greatly improve BOSS Zhipin's cluster resource utilization and stability. With the support of its robust ecosystem and the community, Volcano has greatly helped our technological and business development."
+
+We look forward to working with more organizations to build a more inclusive Volcano community.
+ +
+ **Introduction to the Volcano Community Co-construction Program** +
+ + +The Volcano community launched the co-construction program to more quickly include users into the Volcano community, to accelerate cloud native progress, and to ensure a diverse Volcano ecosystem. + +Through this program, you will have opportunities for technological guidance, promotion, as well as online and offline technological sharing. If your company or organization recognizes the value that Volcano has to offer, wants help using Volcano, or wants to exert their technological influence, consider joining the program. + +For details about the requirements and benefits, see https://github.com/volcano-sh/community/blob/master/community-building-program.md. + + +## Application to the program +- Scan the QR code or click to read the full text and fill in the application form. + +
{{
}}
+ +- The result will be sent by email. Please wait. + + + **If you have any questions, please contact the Volcano community at wang.platform@gmail.com.** \ No newline at end of file diff --git a/website-docusaurus/blog/2024-01-31-volcano-1.8.2-release.md b/website-docusaurus/blog/2024-01-31-volcano-1.8.2-release.md new file mode 100644 index 00000000..d566dfda --- /dev/null +++ b/website-docusaurus/blog/2024-01-31-volcano-1.8.2-release.md @@ -0,0 +1,241 @@ +--- +title: "Volcano v1.8.2 Available Now" +description: "New features: Support for vGPU scheduling and isolation, support for vGPU and user-defined resource preemption capabilities, addition of JobFlow workflow scheduling engine, node load-aware scheduling and rescheduling support for diverse monitoring systems, optimization of Volcano's ability to schedule microservices, optimization of Volcano charts packages for publishing and archiving, etc." +date: 2024-01-31 +authors: [volcano] +--- +On January 9, 2024, UTC+8, Volcano version v1.8.2 was officially released. This version added the following new features: + +- **Support for vGPU scheduling and isolation** + +- **Support for vGPU and user-defined resource preemption capabilities** + +- **Addition of JobFlow workflow scheduling engine** + +- **Node load-aware scheduling and rescheduling support for diverse monitoring systems** + +- **Optimization of Volcano's ability to schedule microservices** + +- **Optimization of Volcano charts packages for publishing and archiving** + +{{
}}
+Volcano is the industry-first cloud native batch computing project. Open-sourced at KubeCon Shanghai in June 2019, it became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. By now, more than 600 global developers have committed code to the project. The community is seeing growing popularity among developers, partners, and users.
+
+### Key Features
+
+#### Support for vGPU scheduling and isolation
+Since ChatGPT became popular, the research and development of large AI models has been booming, and different kinds of large AI models have been launched one after another. Because their training tasks require a huge amount of compute, the supply of GPU-centered compute has become key infrastructure for the development of the large-model industry. In actual usage scenarios, users see low resource utilization and inflexible resource allocation for GPUs, and must purchase a large amount of redundant heterogeneous compute to meet business needs. Heterogeneous compute itself is costly, which places a heavy burden on enterprises.
+Starting from version 1.8, Volcano provides an abstract general framework for shareable devices (GPU, NPU, FPGA...), based on which developers can customize multiple types of shared devices. Volcano has already implemented GPU virtualization features based on this framework, supporting GPU device multiplexing, resource isolation, and other capabilities, as follows:
+
+- GPU sharing: Each task can request part of the resources of a GPU card, and a GPU card can be shared among multiple tasks.
+
+- Device video memory control: GPUs can be allocated by memory size (e.g., 3000M) or proportionally (e.g., 50%) to achieve resource isolation for virtualized GPUs.
+
+For more information about vGPU, please refer to:
+
+- How to use the vGPU feature:
+
+  https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_vgpu.md
+
+- How to add new heterogeneous compute sharing strategies:
+
+  https://github.com/volcano-sh/volcano/blob/master/docs/design/device-sharing.md
+
+#### Support for vGPU and user-defined resource preemption capabilities
+Previously, Volcano supported preemption of basic resources such as CPU and memory, but not of GPU resources or of resources that users develop scheduling plug-ins for based on the Volcano framework and manage on their own (e.g., NPUs, network resources).
+In version 1.8, Volcano restructured the node filtering logic (the PredicateFn callback function) and added a Status type to its return result, which identifies whether the current node meets the conditions for running the job in scenarios such as scheduling and preemption. GPU preemption has been released based on this optimized framework, and users who develop scheduling plug-ins on top of Volcano can adapt and upgrade them according to their business scenarios.
+In version 1.8.2, Volcano also supports preemption based on the node CSI volume count and the node pod count.
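+
+As a hedged illustration of the vGPU requests that these scheduling and preemption capabilities operate on (the resource names follow the vGPU user guide linked above; the pod name and image are illustrative):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: vgpu-demo                        # hypothetical pod name
+spec:
+  schedulerName: volcano
+  containers:
+    - name: cuda-container
+      image: nvidia/cuda:11.6.2-base-ubuntu20.04
+      resources:
+        limits:
+          volcano.sh/vgpu-number: 1      # number of vGPU slices requested
+          volcano.sh/vgpu-memory: 3000   # device memory limit for this task
+```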
+
+For more information on supporting extended resource preemption, please refer to:
+
+https://github.com/volcano-sh/volcano/pull/2916
+
+#### Addition of JobFlow workflow scheduling engine
+Workflow orchestration engines are widely used in scenarios such as high-performance computing, AI in biomedicine, image processing and beautification, game AI, and scientific computing, helping users simplify the management of multiple parallel tasks and their dependencies and significantly improving overall computing efficiency.
+JobFlow is a lightweight task-flow orchestration engine that focuses on Volcano job orchestration. It provides Volcano with diverse job dependency types such as job probes, job-completion dependencies, and job failure-rate tolerance, and supports complex flow-control primitives. Its specific capabilities are as follows:
+
+- Supports large-scale job management and complex task-flow scheduling.
+
+- Supports real-time queries of the progress of all related jobs and tasks.
+
+- Supports automatic job operation and scheduled startup to save labor costs.
+
+- Supports a variety of action policies for different tasks; when a task meets specific conditions, the corresponding action is triggered, such as retry upon timeout or rescheduling upon node failure.
+
+A demonstration of a JobFlow task run is shown below:
+
+
{{
}}
+
+For more information about JobFlow, please refer to:
+
+https://github.com/volcano-sh/volcano/blob/master/docs/design/jobflow/README.md
+
+#### Node load-aware scheduling and rescheduling support for diverse monitoring systems
+The state of a Kubernetes cluster changes in real time as tasks are created and finished. In some scenarios (e.g., adding or removing nodes, changing the affinity of Pods and Nodes, dynamic changes in the job lifecycle), problems such as unbalanced resource utilization among cluster nodes and node performance bottlenecks arise. Scheduling and rescheduling based on real load can help solve these problems.
+Before version 1.8 of Volcano, the metrics for real-load scheduling and rescheduling could be obtained only from Prometheus. Starting from version 1.8, Volcano optimizes the monitoring metrics acquisition framework, adds support for the ElasticSearch monitoring system, and can smoothly integrate more types of monitoring systems with little adaptation work.
+
+For more information on supporting multiple monitoring systems, please refer to:
+
+- Node load-aware based scheduling:
+
+  https://github.com/volcano-sh/volcano/blob/master/docs/design/usage-based-scheduling.md
+
+- Re-scheduling:
+
+  https://github.com/volcano-sh/volcano/blob/master/docs/design/rescheduling.md
+
+#### Optimization of Volcano's ability to schedule microservices
+
+##### Add Kubernetes default scheduler plugin switch
+Volcano is a unified converged scheduling system that supports not only AI, big data, and other computing jobs but also microservice workloads. It is compatible with scheduling plug-ins of the Kubernetes default scheduler, such as PodTopologySpread, VolumeZone, VolumeLimits, NodeAffinity, and PodAffinity, and these default scheduling capabilities are enabled by default in Volcano.
+Since Volcano 1.8, the Kubernetes default scheduling plug-ins can be freely turned on and off through configuration files, and all of them are turned on by default. If you want to disable some plug-ins, such as the PodTopologySpread and VolumeZone plug-ins, you can set the corresponding values in the predicate plug-in to false:
+
+```yaml
+actions: "allocate, backfill, preempt"
+tiers:
+- plugins:
+  - name: priority
+  - name: gang
+  - name: conformance
+- plugins:
+  - name: drf
+  - name: predicates
+    arguments:
+      predicate.VolumeZoneEnable: false
+      predicate.PodTopologySpreadEnable: false
+  - name: proportion
+  - name: nodeorder
+```
+
+For more information, please refer to:
+
+https://github.com/volcano-sh/volcano/issues/2748
+
+##### Enhanced Cluster Autoscaling Compatibility
+In the Kubernetes platform, Volcano is increasingly used as a scheduler for general-purpose services, in addition to batch computing services. Node autoscaling is one of the core features of Kubernetes, and it plays an important role in handling surges of user traffic and saving operational costs.
+Volcano optimizes job scheduling and other related logic to enhance compatibility and interaction with the Cluster Autoscaler, mainly in the following two areas:
+
+- Triggering scale-up in a timely manner for pods that enter the pipeline state during the scheduling phase.
+- Scoring candidate nodes in gradients to reduce the impact of terminating pods on the scheduling load, avoiding pods entering invalid pipeline states, which can lead to erroneous scale-up of the cluster.
+
+For more information, please refer to:
+
+https://github.com/volcano-sh/volcano/issues/3000
+https://github.com/volcano-sh/volcano/issues/2782
+
+##### Fine-grained management of Node resources for increased resilience
+When a node's total resources become less than its allocated resources, for example because of device-plugin reporting anomalies, Volcano considers the node's data inconsistent, isolates the node, and stops scheduling any new workloads to it. In version 1.8, node resource management is refined: for example, when the total GPU capacity of a node is less than the amount of allocated GPU resources, pods requesting GPU resources are prohibited from being scheduled to that node, while jobs requesting non-GPU resources can still be scheduled to it normally.
+
+For more information, please refer to:
+
+https://github.com/volcano-sh/volcano/issues/2999
+
+#### Optimization of Volcano charts packages for publishing and archiving
+As Volcano is used in more and more production and cloud environments, a clean and standardized installation process becomes important. Starting from version 1.8, Volcano optimizes the charts package release and archive workflow, standardizes the installation and usage process, and completes the migration of historical versions (v1.6, v1.7) to the new Helm repository, as follows:
+
+- Add the Volcano charts repo address
+```shell
+helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
+```
+
+- Search for all installable versions of Volcano
+```shell
+helm search repo volcano -l
+```
+
+- Install the latest version of Volcano
+```shell
+helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
+```
+
+- Install a specific version of Volcano, e.g. 1.8.2
+```shell
+helm install volcano volcano-sh/volcano -n volcano-system --create-namespace --version 1.8.2
+```
+
+For more information on the Volcano charts package, please refer to:
+
+https://github.com/volcano-sh/helm-charts
+
+### Contributors
+
+Volcano v1.8.2 includes hundreds of code commits from 33 contributors. Thank you for your contributions.
+
+**Contributors on GitHub:**
+
+| **@shaobo76**          | **@william-wang** | **@gengwg**                |
+| ---------------------- | ----------------- | -------------------------- |
+| **@kingeasternsun**    | **@Aakcht**       | **@waiterQ**               |
+| **@Shoothzj**          | **@hwdef**        | **@halegreen**             |
+| **@wulixuan**          | **@Monokaix**     | **@medicharlachiranjeevi** |
+| **@WulixuanS**         | **@rayoluo**      | **@lowang-bh**             |
+| **@gj199575**          | **@noyoshi**      | **@Tongruizhe**            |
+| **@jinzhejz**          | **@Cdayz**        | **@Mufengzhe**             |
+| **@renwenlong-github** | **@wangyang0616** | **@jiamin13579**           |
+| **@zbbkeepgoing**      | **@jiangkaihua**  | **@z2Zhang**               |
+| **@archlitchi**        | **@lixin963**     | **@xiao-jay**              |
+| **@Yanping-io**        | **@Lily922**      | **@shusley244**            |
+
+**Reference**
+
+Release note: v1.8.0
+
+https://github.com/volcano-sh/volcano/releases/tag/v1.8.0
+
+Release note: v1.8.1
+
+https://github.com/volcano-sh/volcano/releases/tag/v1.8.1
+
+Release note: v1.8.2
+
+https://github.com/volcano-sh/volcano/releases/tag/v1.8.2
+
+Branch: release-1.8
+
+https://github.com/volcano-sh/volcano/tree/release-1.8
+
+### About Volcano
+
+Volcano is designed for high-performance computing applications such as AI, big data, gene sequencing, and rendering, and supports mainstream general computing frameworks. More than 58,000 developers have joined us worldwide, including in-house contributors from companies such as Huawei, AWS, Baidu, Tencent, JD, and Xiaohongshu. There are 3.5k+ Stars and 800+ Forks for the project. Volcano has been proven feasible for mass data computing and analytics, such as AI, big data, and gene sequencing. Supported frameworks include Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene, and Ray. The ecosystem is thriving with more developers and use cases coming up.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2024-03-08-meet-cloud-native-batch-computing-with-volcano-in-ai-&-big-data-scenarios.md b/website-docusaurus/blog/2024-03-08-meet-cloud-native-batch-computing-with-volcano-in-ai-&-big-data-scenarios.md
new file mode 100644
index 00000000..2440dd2f
--- /dev/null
+++ b/website-docusaurus/blog/2024-03-08-meet-cloud-native-batch-computing-with-volcano-in-ai-&-big-data-scenarios.md
@@ -0,0 +1,51 @@
+---
+title: "Meet Cloud Native Batch Computing with Volcano in AI & Big Data Scenarios"
+description: "Join Volcano at KubeCon + CloudNativeCon Europe, 19-22 March in Paris!"
+date: 2024-03-08
+authors: [volcano]
+---
+The cloud native batch computing engine Volcano is designed for high-performance computing applications such as AI, big data, gene sequencing, and rendering, and supports mainstream general computing frameworks. More than 58,000 developers have joined us worldwide, including in-house contributors from companies such as Huawei, AWS, Baidu, Tencent, JD, and Xiaohongshu. There are 3.7k+ Stars and 800+ Forks for the project. Volcano has been proven feasible for mass data computing and analytics, such as AI, big data, and gene sequencing. Supported frameworks include Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene, and Ray. The ecosystem is thriving with more developers and use cases coming up.
+
+As the industry-first cloud native batch computing project, Volcano was open-sourced at KubeCon Shanghai in June 2019 and became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. By now, more than 600 global developers have committed code to the project. The community is seeing growing popularity among developers, partners, and users.
+
+### Try new features in Volcano v1.8.2
+
+In Volcano's latest release, v1.8.2, the following new features are added:
+
+- **Support for vGPU scheduling and isolation**
+
+- **Support for vGPU and user-defined resource preemption capabilities**
+
+- **Addition of JobFlow workflow scheduling engine**
+
+- **Node load-aware scheduling and rescheduling support for diverse monitoring systems**
+
+- **Optimization of Volcano's ability to schedule microservices**
+
+- **Optimization of Volcano charts packages for publishing and archiving**
+
+Try Volcano v1.8.2: https://github.com/volcano-sh/volcano/releases/tag/v1.8.2
+
+
+### Join Volcano Community Co-construction Program
+Recently, more than 50 Volcano-related cases have gone into production. These cases span industries such as Internet, advanced manufacturing, finance, life sciences, scientific research, autonomous driving, and medicine, and cover massive data computing and analysis scenarios like AI, big data, genomic sequencing, and rendering. The main users include Tencent, Amazon, ING Bank, Baidu, Xiaohongshu, DiDi, 360, iQIYI, Leinao, Pengcheng Laboratory, Cruise, Li Auto, Unisound, Ximalaya, Vipshop, GrandOmics, BOSS Zhipin, and so on. With the expansion of the Volcano ecosystem, more and more users are eager to join the community.
+
+The Volcano community launched the co-construction program to welcome users into the Volcano community, to accelerate cloud native progress, and to ensure a diverse Volcano ecosystem.
+
+Through this program, you will have opportunities for technological guidance and promotion, as well as online and offline technical sharing. If your company or organization recognizes the value that Volcano has to offer, wants help using Volcano, or wants to build its technological influence, consider joining the program.
+For details about the requirements and benefits, see https://github.com/volcano-sh/community/blob/master/community-building-program.md
+
+
+### Join Volcano at KubeCon + CloudNativeCon Europe, 19-22 March in Paris!
+
+
+Volcano will participate in several activities, including:
+
+- Speech Schedule
+  - March 19, 14:05 - 14:30 CET: Level 7.3 | Room S03
+    Volcano Maintainer Kevin Wang, Huawei, presents “Efficient Multi-Cluster GPU Workload Management with Karmada and Volcano”
+  - March 22, 11:55 - 12:30 CET: Pavilion 7 | Level 7.3 | N03
+    Volcano Maintainers William Wang, Huawei & Mengxuan Li, 4paradigm present “Cloud Native Batch Computing with Volcano: Updates and Future”
+  - March 22, 16:00 - 16:35 CET: Pavilion 7 | Level 7.3 | Paris Room
+    Volcano Maintainers William Wang & Hongcai Ren, Huawei present “Maximizing GPU Utilization Over Multi-Cluster: Challenges and Solutions for Cloud-Native AI Platform”
+- Booth Hours:
+  - March 20-22 PM (W, Th, F): Stop by CNCF Project Pavilion Booth PP18-B at KubeCon + CloudNativeCon Europe to speak with an expert or see a demo!
\ No newline at end of file
diff --git a/website-docusaurus/blog/2024-05-21-volcano-1.9.0-release.md b/website-docusaurus/blog/2024-05-21-volcano-1.9.0-release.md
new file mode 100644
index 00000000..96c400ae
--- /dev/null
+++ b/website-docusaurus/blog/2024-05-21-volcano-1.9.0-release.md
@@ -0,0 +1,194 @@
+---
+title: "Volcano v1.9.0 Available Now"
+description: "New features: Support elastic queue capacity scheduling, Supports affinity scheduling between queues and nodes, GPU sharing feature supports node scoring scheduling, Volcano Support for Kubernetes v1.29, Enhance scheduler metrics, Add license compliance check, Improve scheduling stability, etc."
+date: 2024-05-21
+authors: [volcano]
+---
+On May 21, 2024 (UTC+8), Volcano version v1.9.0 was officially released. This version added the following new features:
+
+- **Support elastic queue capacity scheduling**
+
+- **Supports affinity scheduling between queues and nodes**
+
+- **GPU sharing feature supports node scoring scheduling**
+
+- **Volcano Support for Kubernetes v1.29**
+
+- **Enhance scheduler metrics**
+
+- **Add license compliance check**
+
+- **Improve scheduling stability**
+
+Volcano is the industry-first cloud native batch computing project. Open-sourced at KubeCon Shanghai in June 2019, it became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. By now, more than 600 global developers have committed code to the project. The community is seeing growing popularity among developers, partners, and users.
+
+### Key Features
+
+#### Support elastic queue capacity scheduling
+
+Volcano currently uses the proportion plugin for queue management. Users can set the guarantee, capability, and other fields of a queue to configure its reserved resources and capacity limit, and resource sharing within the cluster is realized through the queue's weight value: cluster resources are divided among queues in proportion to their weights. This queue management method has the following problems:
+
+- The resource capacity allotted to a queue is expressed through its weight, which is not intuitive.
+- All resources in a queue are divided using the same ratio; capacity cannot be set separately for each resource dimension of the queue.
+
+Based on the above considerations, Volcano implements a new elastic queue capacity management capability that supports:
+
+- Directly setting the capacity of each resource dimension for a queue instead of setting a weight value.
+- Elastic capacity scheduling based on deserved resources, so that a queue's resources can be shared and reclaimed back on demand.
+
+For example, in an AI large model training scenario, different resource capacities can be set in a queue for different GPU models, such as A100 and V100. When cluster resources are idle, a queue can reuse the resources of other idle queues; when needed, it reclaims the amount of resources the user configured for it, that is, its deserved resources, thereby realizing elastic capacity scheduling.
+
+To use this feature, set the deserved field of the queue with the amount of deserved resources in each dimension, and, as shown in the configuration sketch at the end of this section, turn on the capacity plugin and turn off the proportion plugin in the scheduling configuration.
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: demo-queue
+spec:
+  reclaimable: true
+  deserved: # set the deserved field.
+    cpu: 64
+    memory: 128Gi
+    nvidia.com/a100: 40
+    nvidia.com/v100: 80
+```
+
+For a complete usage example of queue elastic capacity scheduling, please refer to:
+[How to use capacity plugin](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_capacity_plugin.md).
+
+For the elastic queue capacity design document, please refer to:
+[Capacity scheduling Design](https://github.com/volcano-sh/volcano/blob/master/docs/design/capacity-scheduling.md).
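+
+For reference, switching queue management to the capacity model is done in the Volcano scheduler configuration by replacing the proportion plugin with the capacity plugin, roughly as follows. This is only a sketch: the action list may differ in your setup, and the usage guide linked above contains the authoritative configuration:
+
+```yaml
+actions: "enqueue, allocate, backfill, reclaim"
+tiers:
+- plugins:
+  - name: priority
+  - name: gang
+- plugins:
+  - name: drf
+  - name: predicates
+  - name: capacity   # replaces the proportion plugin; the two cannot be enabled together
+  - name: nodeorder
+```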
+
+#### Supports affinity scheduling between queues and nodes
+
+Queues are usually associated with departments within a company, and different departments usually need different heterogeneous resource types. For example, a large model training team needs NVIDIA Tesla GPUs, while a recommendation team needs AMD GPUs. When users submit jobs to a queue, the jobs need to be automatically scheduled to nodes of the corresponding resource type according to the attributes of the queue.
+
+Volcano has implemented affinity scheduling capabilities between queues and nodes. Users only need to set the node label values that require affinity in the affinity field of the queue, and Volcano automatically schedules jobs submitted to the queue to the nodes associated with it. Users do not need to set affinity for each job separately; they only set the affinity of the queue once, and jobs submitted to the queue are scheduled to the corresponding nodes based on the affinity between the queue and the nodes.
+
+This feature supports hard affinity, soft affinity, and anti-affinity scheduling at the same time. To use it, set a label with the key `volcano.sh/nodegroup-name` on the nodes (see the labeling example at the end of this section), and then set the affinity field of the queue to specify the hard and soft affinity label values.
+
+For example, the following queue setting means that jobs submitted to the queue must be scheduled to nodes with label values groupname1 and groupname2, preferring nodes with label value groupname2. Jobs must not be scheduled to nodes with label values groupname3 and groupname4, except that when resources are insufficient they may still be scheduled to nodes with label value groupname3.
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: default
+spec:
+  reclaimable: true
+  weight: 1
+  affinity: # added field
+    nodeGroupAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+      - groupname1
+      - groupname2
+      preferredDuringSchedulingIgnoredDuringExecution:
+      - groupname2
+    nodeGroupAntiAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+      - groupname3
+      - groupname4
+      preferredDuringSchedulingIgnoredDuringExecution:
+      - groupname3
+```
+
+The scheduling plugin for this feature is called nodegroup; for a complete example of its use, see: [How to use nodegroup plugin](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_nodegroup_plugin.md).
+
+For detailed design documentation, see [The nodegroup design](https://github.com/volcano-sh/volcano/blob/master/docs/design/node-group.md).
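+
+As a quick illustration, grouping nodes for the example above is just a matter of labeling them; the node names here are illustrative:
+
+```shell
+kubectl label node node1 volcano.sh/nodegroup-name=groupname1
+kubectl label node node2 volcano.sh/nodegroup-name=groupname2
+```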
+
+#### GPU sharing feature supports node scoring scheduling
+
+GPU Sharing is a GPU sharing and isolation solution introduced in Volcano v1.8, which provides GPU sharing and device-memory control capabilities to improve GPU resource utilization in AI training and inference scenarios. v1.9 adds a new scoring strategy for GPU nodes on top of this feature, so that the optimal node can be selected during job assignment to further improve resource utilization. Users can set different scoring strategies. Currently, the following two strategies are supported:
+
+- Binpack: Provides a binpack algorithm at GPU card granularity, preferring to fill up GPU cards that already have allocated resources, to avoid resource fragmentation and waste.
+
+- Spread: Prefers idle GPU cards over shared cards that already have allocated resources.
+
+For detailed usage documentation, please refer to: [How to use gpu sharing](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_gpu_sharing.md).
+
+#### Volcano Support for Kubernetes v1.29
+
+Volcano follows the release tempo of the Kubernetes community and supports every base version of Kubernetes. The latest supported version is v1.29, verified with full UT and E2E test cases to ensure functionality and reliability. If you would like to participate in the development of Volcano adapting to new versions of Kubernetes, please refer to https://github.com/volcano-sh/volcano/pull/3459 to make community contributions.
+
+#### Enhance scheduler metrics
+
+Volcano uses client-go to talk to Kubernetes. Although the client can set a QPS limit to avoid requests being throttled, it is difficult to observe how much of that QPS budget the client actually uses. To observe the request frequency of the client in real time, Volcano adds new client-go metrics: users can query them to see the number of GET, POST, and other requests per second, obtain the actual QPS used, and then decide whether the client QPS needs to be adjusted. The client-go metrics also include client certificate rotation cycle statistics, per-request response size statistics, etc.
+
+Users can run `curl http://$volcano_scheduler_pod_ip:8080/metrics` to get all the detailed metrics of the Volcano scheduler.
+
+Related PR: [#3274](https://github.com/volcano-sh/volcano/pull/3274).([@Monokaix](https://github.com/Monokaix))
+
+#### Add license compliance check
+
+To strengthen the open source license compliance governance of the Volcano community and avoid introducing infectious open source licenses and their potential risks, the Volcano community has introduced an open source license compliance checking tool. A so-called infectious license requires that derivative works produced by modifying, using, or copying software released under it must be open-sourced under the same license. If a third-party library introduced in a developer's PR carries an infectious open source license such as GPL or LGPL, the CI check will block it, and the developer needs to replace the third-party library with one under a permissive license such as MIT, Apache 2.0, or BSD to pass the open source license compliance check.
+
+#### Improve scheduling stability
+
+Volcano v1.9.0 includes further optimizations in preemption, retries on scheduling failure, memory-leak avoidance, security enhancement, etc. The details include:
+
+- Fix pods failing to be scheduled due to frequent scale-up and scale-down of deployments in extreme cases, see PR for details: [#3376](https://github.com/volcano-sh/volcano/pull/3376).([@guoqinwill](https://github.com/guoqinwill))
+
+- Fix Pod preemption: see PR for details: [#3458](https://github.com/volcano-sh/volcano/pull/3458).([LivingCcj](https://github.com/LivingCcj))
+
+- Optimize the Pod scheduling failure retry mechanism: see PR for details: [#3435](https://github.com/volcano-sh/volcano/pull/3435).([@bibibox](https://github.com/bibibox))
+
+- Metrics optimization: [#3463](https://github.com/volcano-sh/volcano/pull/3463).([@Monokaix](https://github.com/Monokaix))
+
+- Security enhancements: [#3449](https://github.com/volcano-sh/volcano/pull/3449).([@lekaf974](https://github.com/lekaf974))
+
+### Contributors
+
+Volcano v1.9.0 includes hundreds of code commits from many contributors. Thank you for your contributions.
+
+**Contributors on GitHub:**
+
+| **@daniel-hutao** | **@wuyueandrew** | **@googs1025**    |
+| ----------------- | ---------------- | ----------------- |
+| **@7sunarni**     | **@flyingfang**  | **@LivingCcj**    |
+| **@guoqinwill**   | **@panoswoo**    | **@william-wang** |
+| **@lekaf974**     | **@yangqz**      | **@lowang-bh**    |
+| **@loheagn**      | **@hwdef**       | **@archlitchi**   |
+| **@Lily922**      | **@bibibox**     | **@Monokaix**     |
+| **@belo4ya**      |                  |                   |
+
+**Reference**
+
+Release note: v1.9.0
+
+https://github.com/volcano-sh/volcano/releases/tag/v1.9.0
+
+Branch: release-1.9
+
+https://github.com/volcano-sh/volcano/tree/release-1.9
+
+### About Volcano
+
+Volcano is designed for high-performance computing applications such as AI, big data, gene sequencing, and rendering, and supports mainstream general computing frameworks. More than 58,000 developers have joined us worldwide, including in-house contributors from companies such as Huawei, AWS, Baidu, Tencent, JD, and Xiaohongshu. There are 3.8k+ Stars and 800+ Forks for the project. Volcano has been proven feasible for mass data computing and analytics, such as AI, big data, and gene sequencing. Supported frameworks include Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene, and Ray. The ecosystem is thriving with more developers and use cases coming up.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2024-09-29-volcano-1.10.0-release.md b/website-docusaurus/blog/2024-09-29-volcano-1.10.0-release.md
new file mode 100644
index 00000000..3bd46ed5
--- /dev/null
+++ b/website-docusaurus/blog/2024-09-29-volcano-1.10.0-release.md
@@ -0,0 +1,216 @@
+---
+title: "Volcano v1.10.0 Available Now"
+date: 2024-09-29
+authors: [volcano]
+---
+On Sep 19, 2024 (UTC+8), Volcano version v1.10.0 was officially released. This version introduced the following new features:
+
+- **Support Queue Priority Scheduling Strategy**
+
+- **Enable Fine-Grained GPU Resource Sharing and Reclaim**
+
+- **Introduce Pod Scheduling Readiness Support**
+
+- **Add Sidecar Container Scheduling Capabilities**
+
+- **Enhance Vcctl Command Line Tool**
+
+- **Ensure Compatibility with Kubernetes v1.30**
+
+- **Strengthen Volcano Security Measures**
+
+- **Optimize Volcano for Large-Scale Performance**
+
+- **Improve GPU Monitoring Function**
+
+- **Optimize Helm Chart Installation And Upgrade Processes**
+
+Volcano is the industry-first cloud native batch computing project. Open-sourced at KubeCon Shanghai in June 2019, it became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. By now, more than 600 global developers have committed code to the project. The community is seeing growing popularity among developers, partners, and users.
+
+## Key Features
+
+### Support Queue Priority Scheduling Strategy
+
+In traditional big data processing scenarios, users can directly set queue priorities to control the scheduling order of jobs. To ease the migration from Hadoop/Yarn to cloud-native platforms, Volcano supports setting priorities at the queue level, reducing migration costs for big data users while enhancing user experience and resource utilization efficiency.
+
+Queues are a fundamental resource in Volcano, each with its own priority. By default, a queue's priority is determined by its `share` value, which is calculated by dividing the resources allocated to the queue by its total capacity. This is done automatically, with no manual configuration needed. The smaller the `share` value, the fewer resources the queue has, making it less saturated and more likely to receive resources first. Thus, queues with smaller `share` values have higher priority, ensuring fairness in resource allocation.
+
+In production environments, especially in big data scenarios, users often prefer to manually set queue priorities to have a clearer understanding of the order in which queues are scheduled. Since the `share` value is dynamic and changes in real time as resources are allocated, Volcano introduces a `priority` field to allow users to set queue priorities more intuitively. The higher the `priority`, the higher the queue's standing. High-priority queues receive resources first, while low-priority queues have their jobs reclaimed earlier when resources need to be recycled.
+
+Queue Priority Definition:
+
+```go
+type QueueSpec struct {
+...
+    // Priority define the priority of queue. Higher values are prioritized for scheduling and considered later during reclamation.
+    // +optional
+    Priority int32 `json:"priority,omitempty" protobuf:"bytes,10,opt,name=priority"`
+}
+```
+
+To ensure compatibility with the `share` mechanism, Volcano also considers the share value when calculating queue priorities. By default, if a user has not set a specific queue priority or if priorities are equal, Volcano falls back to comparing share values; in this case, the queue with the smaller share has higher priority. Users have the flexibility to choose between different priority strategies based on their specific needs, using either the priority or the share method.
+
+For queue priority design doc, please refer to: [Queue priority](https://github.com/volcano-sh/volcano/blob/master/docs/design/queue-priority.md)
+
+### Enable Fine-Grained GPU Resource Sharing and Reclaim
+
+Volcano introduced the elastic queue capacity scheduling feature in version v1.9, allowing users to directly set the capacity for each resource dimension within a queue. This feature also supports elastic scheduling based on the `deserved` value, enabling more fine-grained resource sharing and reclamation across queues.
+
+For detailed design information on elastic queue capacity scheduling, refer to the [Capacity Scheduling Design Document](https://github.com/volcano-sh/volcano/blob/master/docs/design/capacity-scheduling.md).
+
+For a step-by-step guide on using the capacity plugin, see the [Capacity Plugin User Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_capacity_plugin.md).
+
+A sample queue configuration with deserved resources set for each dimension:
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: demo-queue
+spec:
+  reclaimable: true
+  deserved: # set the deserved field.
+    cpu: 64
+    memory: 128Gi
+    nvidia.com/a100: 40
+    nvidia.com/v100: 80
+```
+
+In version v1.10, Volcano extends its support to include reporting different types of GPU resources within elastic queue capacities. NVIDIA's default `Device Plugin` does not distinguish between GPU models, instead reporting all resources uniformly as `nvidia.com/gpu`. This prevents AI training and inference tasks from selecting specific GPU models, such as A100 or T4, based on their particular needs. To address this, Volcano now supports reporting distinct GPU models at the `Device Plugin` level, working with the `capacity` plugin to enable more precise GPU resource sharing and reclamation.
+
+For instructions on using the `Device Plugin` to report various GPU models, please refer to the [GPU Resource Naming Guide](https://github.com/volcano-sh/devices/tree/release-1.1/docs/resource-naming).
+
+**Note:**
+
+In version v1.10.0, the `capacity` plugin is the default for queue management. Note that the `capacity` and `proportion` plugins are incompatible, so after upgrading to v1.10.0, you must set the `deserved` field for queues to ensure proper functionality.
+
+For detailed instructions, please refer to the [Capacity Plugin User Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_capacity_plugin.md).
+
+The `capacity` plugin allocates cluster resources based on the `deserved` value set by the user, while the `proportion` plugin dynamically allocates resources according to queue weight. Users can select either the `capacity` or `proportion` plugin for queue management based on their specific needs.
+
+For more details on the proportion plugin, please visit: [Proportion Plugin](https://volcano.sh/en/docs/plugins/#proportion).
+
+### Introduce Pod Scheduling Readiness Support
+
+Once a Pod is created, it is considered ready for scheduling, and kube-scheduler tries its best to find a suitable node for every pending Pod. In reality, however, some Pods may stay in a "lacking necessary resources" state for a long time. These Pods interfere with the decision-making and operation of the scheduler (and downstream components such as the Cluster Autoscaler) in an unnecessary way, causing problems such as resource waste. Pod Scheduling Readiness is a kube-scheduler feature that became stable (GA) in Kubernetes v1.30. It controls the scheduling timing of a Pod through the Pod's schedulingGates field.
+
+*(Figure: Pod SchedulingGates)*
+
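+As a concrete illustration, a gated Pod looks like the following. This is a minimal sketch based on the upstream Kubernetes API; the gate name is an arbitrary example, and the Pod stays unschedulable until every gate is removed:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gated-pod
+spec:
+  schedulingGates:
+  - name: example.com/wait-for-quota   # the scheduler skips this Pod until the gate is removed
+  containers:
+  - name: pause
+    image: registry.k8s.io/pause:3.9
+```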
+
+In previous versions, Volcano integrated all algorithms of the Kubernetes default scheduler, fully covering the native scheduling functions of kube-scheduler. Volcano can therefore completely replace kube-scheduler as a unified scheduler on a cloud native platform, supporting unified scheduling of microservices and AI/big data workloads. In the latest version, v1.10, Volcano introduces the Pod Scheduling Readiness capability to further meet users' scheduling needs in diverse scenarios.
+
+For the documentation of Pod Scheduling Readiness features, please refer to: [Pod Scheduling Readiness | Kubernetes](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)
+
+For the Pod Scheduling Readiness design doc of Volcano, please refer to: [Proposal for Support of Pod Scheduling Readiness by ykcai-daniel · Pull Request #3581 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3581)
+
+### Add Sidecar Container Scheduling Capabilities
+
+A Sidecar container is an auxiliary container designed to support the main business container by handling tasks such as logging, monitoring, and network initialization.
+
+Prior to Kubernetes v1.28, the concept of Sidecar containers existed only informally, with no dedicated API to distinguish them from business containers. Both types of containers were treated equally, which meant that Sidecar containers could be started after the business container and might end before it. Ideally, Sidecar containers should start before and finish after the business container to ensure complete collection of logs and monitoring data.
+
+Kubernetes v1.28 introduces formal support for Sidecar containers at the API level, implementing unified lifecycle management for init containers, Sidecar containers, and business containers. This update also adjusts how resource requests and limits are calculated for Pods; the feature entered Beta status in v1.29.
+
+The development of this feature involved extensive discussions, mainly focusing on maintaining compatibility with existing APIs and minimizing disruptive changes. Rather than introducing a new container type, Kubernetes reuses the init container type and designates Sidecar containers by setting the init container's restartPolicy to Always. This approach addresses both API compatibility and lifecycle management issues effectively.
+
+With this update, the scheduling of Pods now counts the Sidecar container's resource requests as part of the Pod's total requests. Consequently, the Volcano scheduler has been updated to support this new calculation method, allowing users to schedule Sidecar containers with Volcano.
+
+For more information on Sidecar containers, visit [Sidecar Containers | Kubernetes](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/).
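+
+To make the semantics concrete, a native sidecar is declared as an init container with `restartPolicy: Always`; a minimal sketch (image choices are illustrative):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: sidecar-demo
+spec:
+  initContainers:
+  - name: log-shipper            # native sidecar: starts before and stops after the app container
+    image: fluent/fluent-bit:2.2
+    restartPolicy: Always        # this marks the init container as a sidecar
+  containers:
+  - name: app
+    image: nginx:1.25
+```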
+
+### Enhance Vcctl Command Line Tool
+
+vcctl is a command line tool for operating Volcano's built-in CRD resources. It can conveniently view/delete/pause/resume vcjob resources, and it supports viewing/deleting/opening/closing/updating queue resources. Volcano has enhanced vcctl in the new version, adding the following features:
+
+- Support creating/deleting/viewing/describing `jobflow` and `jobtemplate` resources
+
+- Support querying vcjobs in a specified queue
+
+- Support filtering Pods by queue and vcjob
+
+For detailed guidance documents on vcctl, please refer to: [vcctl Command Line Enhancement](https://github.com/volcano-sh/volcano/blob/master/docs/design/command-line-enhancement.md#new-format-of-volcano-command-line).
+
+### Ensure Compatibility with Kubernetes v1.30
+
+Volcano closely follows the pace of Kubernetes community versions and supports every major version of Kubernetes. The latest supported version is v1.30, verified with complete UT and E2E test cases to ensure functionality and reliability.
+
+If you want to participate in the development of Volcano adapting to the new version of Kubernetes, please refer to: [adapt-k8s-todo](https://github.com/volcano-sh/volcano/blob/master/docs/design/adapt-k8s-todo.md) for community contributions.
+
+### Strengthen Volcano Security Measures
+
+Volcano has always attached great importance to the security of the open source software supply chain. It follows the specifications defined by OpenSSF in terms of license compliance, security vulnerability disclosure and repair, repository branch protection, CI checks, etc. Volcano recently added a new workflow to GitHub Actions, which runs OpenSSF security checks when code is merged and updates the software security score in real time to continuously improve software security.
+
+At the same time, Volcano has reduced the RBAC permissions of each component, retaining only the necessary permissions, which avoids potential risks of unauthorized access and improves the security of the system.
+
+Related PRs:
+
+[Added the scorecard github action and its badge by harshitasao · Pull Request #3655 · volcano-sh/volcano](https://github.com/volcano-sh/volcano/pull/3655)
+
+[Shrink permissions of vc scheduler & controller by Monokaix · Pull Request #3545 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3545)
+
+[Add pre-install&pre-upgrade hook for admission-init job by Monokaix · Pull Request #3504 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3504)
+
+### Optimize Volcano for Large-Scale Performance
+
+In large-scale scenarios, Volcano has done a lot of performance optimization work, mainly including:
+
+- Optimize the vcjob update strategy, reducing vcjob update and synchronization frequency, lowering API server pressure, and improving the QPS of submitted tasks
+- Add a controller gate switch to the Volcano controller, so users can disable unneeded controllers to reduce memory usage and CPU load
+- Use shared informers in all controllers to reduce memory usage
+
+### Improve GPU Monitoring Function
+
+The new version of Volcano optimizes and enhances the GPU monitoring metrics, fixes inaccurate GPU monitoring data, and adds node information to the GPU compute and GPU memory metrics, allowing users to view more intuitively, for each GPU on each node, its compute power and the total and allocated amounts of GPU memory.
+
+Related PR: [Update volcano-vgpu monitoring system by archlitchi · Pull Request #3620 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3620/)
+
+### Optimize Helm Chart Installation And Upgrade Processes
+
+Volcano has optimized the Helm chart installation and upgrade process, and supports setting more custom parameters when installing the chart, mainly including:
+
+- Use the Helm hook mechanism to automatically delete the volcano-admission-init job after Volcano is successfully installed, avoiding failures of subsequent upgrades via helm upgrade, related PR: [Add pre-install&pre-upgrade hook for admission-init job by Monokaix · Pull Request #3504 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3504)
+
+- Update the secret required by Volcano admission after each successful installation, avoiding failures of the Volcano admission workflow when Volcano is repeatedly installed and uninstalled without specifying the Helm release name, related PR: [Update volcano-admission secret when it already exists by Monokaix · Pull Request #3653 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3653)
+
+- Support setting common labels for resource objects in the chart, related PR: [Add common labels for chart objects by Aakcht · Pull Request #3511 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3511)
+
+- Support setting the log level of Volcano components through Helm, related PR: [Expose volcano components (controller, scheduler, etc.) log level control to the helm chat values by chenshiwei-io · Pull Request #3656 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3656)
+
+- Support specifying the image registry of Volcano components through Helm, related PR: [add image registry for helm by calvin0327 · Pull Request #3436 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3436)
+
+- Support setting container-level securityContext through Helm, related PR: [feat: Add securityContext support at container level in helm chart templates by lekaf974 · Pull Request #3704 · volcano-sh/volcano (github.com)](https://github.com/volcano-sh/volcano/pull/3704)
+
+### Contributors
+
+Volcano v1.10.0 includes hundreds of contributions from 36 community contributors. Thank you for your contributions.
+
+**Contributors on GitHub:**
+
+| **@googs1025**      | **@WulixuanS**    | **@SataQiu**       |
+| ------------------- | ----------------- | ------------------ |
+| **@guoqinwill**     | **@lowang-bh**    | **@shruti2522**    |
+| **@lukasboettcher** | **@wangyysde**    | **@bibibox**       |
+| **@Wang-Kai**       | **@y-ykcir**      | **@lekaf974**      |
+| **@yeahdongcn**     | **@Monokaix**     | **@Aakcht**        |
+| **@yxxhero**        | **@babugeet**     | **@liuyuanchun11** |
+| **@MichaelXcc**     | **@william-wang** | **@lengrongfu**    |
+| **@xieyanker**      | **@lx1036**       | **@archlitchi**    |
+| **@hwdef**          | **@wangyang0616** | **@microyahoo**    |
+| **@snappyyouth**    | **@harshitasao**  | **@chenshiwei-io** |
+| **@TaiPark**        | **@Aakcht**       | **@ykcai-daniel**  |
+| **@lekaf974**       | **@JesseStutler** | **@belo4ya**       |
+
+## Reference
+
+Release note: v1.10.0
+
+https://github.com/volcano-sh/volcano/releases/tag/v1.10.0
+
+Branch: release-1.10
+
+https://github.com/volcano-sh/volcano/tree/release-1.10
+
+## About Volcano
+
+Volcano is designed for high-performance computing applications such as AI, big data, gene sequencing, and rendering, and supports mainstream general computing frameworks. More than 58,000 developers have joined us worldwide, including in-house contributors from companies such as Huawei, AWS, Baidu, Tencent, JD, and Xiaohongshu. There are 4.1k+ Stars and 900+ Forks for the project. Volcano has been proven feasible for mass data computing and analytics, such as AI, big data, and gene sequencing. Supported frameworks include Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene, and Ray. The ecosystem is thriving with more developers and use cases coming up.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2025-02-07-volcano-1.11.0-release.md b/website-docusaurus/blog/2025-02-07-volcano-1.11.0-release.md
new file mode 100644
index 00000000..21884445
--- /dev/null
+++ b/website-docusaurus/blog/2025-02-07-volcano-1.11.0-release.md
@@ -0,0 +1,455 @@
+---
+title: "Volcano v1.11.0 Available Now"
+description: "New Features: Network topology-aware scheduling, Elastic hierarchical queues, Multi-cluster AI job scheduling, online/offline workloads colocation with dynamic resource oversubscription, Load-aware descheduling, Fine-grained job fault recovery policies, Volcano Dashboard for resource visualization, Supports for Kubernetes v1.31, Volcano Job supports for Preemption Policy, and Performance optimizations for large-scale scenarios"
+date: 2025-02-07
+authors: [volcano]
+---
+# Volcano v1.11 released: A New Era of Cloud-Native Scheduling for AI and Big Data
+
+As the de facto standard in cloud-native batch computing, Volcano has been widely adopted across various scenarios, including AI, Big Data, and High-Performance Computing (HPC). With over 800 contributors from more than 30 countries and tens of thousands of code commits, Volcano has been deployed in production environments by over 60 enterprises worldwide, providing the industry with proven practical standards and solutions for cloud native batch computing.
+
+As user scenarios grow increasingly complex, especially in LLM scenarios, there are heightened demands on performance, GPU resource utilization, and availability for both training and inference workloads. This has driven Volcano to continuously expand its capabilities and address core user needs.
+Over the course of 28 releases, Volcano has introduced a series of enhancements and optimizations tailored to batch computing scenarios, helping users seamlessly migrate their workloads to cloud-native platforms. These improvements have resolved numerous pain points, earning the platform widespread praise and fostering a vibrant community with over 30 approvers and reviewers, creating a win-win ecosystem.
+
+The new release marks a new milestone for Volcano in 2025: the community introduces a series of major features that continue to deepen its focus on areas such as CNAI (Cloud Native AI) and Big Data. Key features include:
+
+**AI Scenarios:**
+
+- **Network Topology-Aware Scheduling:** Reduces network communication overhead between training tasks, optimizing performance for large AI model training.
+- **NPU Scheduling and Virtualization:** Enhances NPU resource utilization.
+- **GPU Dynamic Partitioning:** Introduces MIG and MPS dynamic partitioning to improve GPU resource utilization.
+- **Volcano Global Multi-Cluster AI Job Scheduling:** Supports multi-cluster AI job deployment and distribution.
+- **Checkpointing and Fault Recovery Optimization:** Enables finer-grained job restart policies.
+- **Dynamic Resource Allocation (DRA):** Supports flexible and efficient management of heterogeneous resources.
+
+**Big Data Scenarios:**
+
+- **Elastic Hierarchical Queues:** Facilitates smooth migration of Big Data workloads to cloud-native platforms.
+
+**Microservices Scenarios:**
+
+- **Online and Offline Workloads colocation with Dynamic Resource Oversubscription:** Boosts resource utilization while ensuring QoS for online workloads.
+- **Load-Aware Scheduling and Descheduling:** Provides resource defragmentation and load balancing capabilities.
+
+**The official release of [Volcano v1.11](https://github.com/volcano-sh/volcano/releases/tag/v1.11.0)** marks a new chapter in cloud-native batch computing! This update focuses on the core needs of AI and Big Data, introducing **network topology-aware scheduling** and **multi-cluster AI job scheduling**, significantly enhancing the performance of AI training and inference tasks. Additionally, **online and offline workloads colocation with dynamic resource oversubscription** and **load-aware descheduling** further optimize resource utilization, ensuring high availability for online services. The introduction of **elastic hierarchical queues** offers more flexible scheduling strategies for Big Data scenarios.
+
+## **Deep Dive into Key Features**
+
+The v1.11 release provides a series of major feature updates for AI, Big Data, and resource-utilization improvement scenarios, mainly including:
+
+### **Network Topology-Aware Scheduling: Optimizing AI Large Model Training Performance**
+
+In AI large model training, model parallelism splits the model across multiple nodes, requiring frequent data exchange between nodes. Network communication often becomes a bottleneck, significantly impacting training efficiency. Data centers feature diverse network types like InfiniBand (IB), RoCE, and NVSwitch, with complex multi-layer switch topologies. The fewer switches spanned between two nodes, the lower the communication latency and the higher the throughput. Thus, users aim to schedule workloads in the optimal performance domain with the highest throughput and lowest latency.
+ +To address this, Volcano introduces **Network Topology-Aware Scheduling**, leveraging a unified network topology API and intelligent scheduling strategies to tackle network communication performance issues in large-scale AI training jobs. + +#### **Unified Network Topology API: Precise Network Structure Representation** + +To abstract away the differences in data center network types, Volcano defines a new CRD **HyperNode**, to represent network topology, providing a standardized API. Compared to traditional label-based approaches, HyperNode offers several advantages: + +- **Semantic Consistency:** HyperNode provides a standardized way to describe network topology, avoiding inconsistencies in label semantics. +- **Hierarchical Structure:** HyperNode supports tree-like hierarchies, accurately reflecting actual network topologies. +- **Ease of Management:** Cluster administrators can manually create HyperNodes or use automated network topology discovery tools to maintain them. + +A HyperNode represents a network topology performance domain, typically mapped to a switch. Multiple HyperNodes connect hierarchically to form a tree structure. For example: + +
+*(Figure: example HyperNode tree topology)*
+
+
+- **Leaf HyperNodes** (s0, s1, s2, s3): Represent actual cluster nodes.
+- **Non-Leaf HyperNodes** (s4, s5, s6): Represent other HyperNodes.
+
+In this structure, communication efficiency between nodes depends on the number of HyperNode layers they span. For instance:
+
+- **node0** and **node1** within s0 have the highest communication efficiency.
+- **node1** and **node2** spanning two HyperNode layers (s0→s4→s1) have lower efficiency.
+- **node0** and **node4** spanning three HyperNode layers (s0→s4→s6) have the lowest efficiency.
+
+##### **HyperNode Configuration Example**
+
+Here’s an example of leaf and non-leaf HyperNode configurations:
+
+**Leaf HyperNode Example:**
+
+```yaml
+apiVersion: topology.volcano.sh/v1alpha1
+kind: HyperNode
+metadata:
+  name: s0
+spec:
+  tier: 1 # Lower tiers indicate higher communication efficiency
+  members: # List of child nodes
+  - type: Node # Child node type
+    selector:
+      exactMatch: # Exact match
+        name: node-0
+  - type: Node
+    selector:
+      regexMatch: # Regex match
+        pattern: node-[01]
+```
+
+**Non-Leaf HyperNode Example:**
+
+```yaml
+apiVersion: topology.volcano.sh/v1alpha1
+kind: HyperNode
+metadata:
+  name: s6
+spec:
+  tier: 3 # HyperNode tier
+  members: # List of child nodes
+  - type: HyperNode # Child node type
+    selector:
+      exactMatch: # Exact match
+        name: s4
+  - type: HyperNode
+    selector:
+      exactMatch: # Exact match
+        name: s5
+```
+
+#### **Network Topology-Aware Scheduling Strategy**
+
+Volcano Job and PodGroup can set topology constraints via the `networkTopology` field, supporting the following configurations:
+
+- **mode:** Supports `hard` and `soft` modes.
+  - `hard`: Enforces strict constraints, requiring tasks within a job to be deployed within the same HyperNode.
+  - `soft`: Prefers deploying tasks within the same HyperNode but allows flexibility.
+- **highestTierAllowed:** Used with `hard` mode to specify the maximum HyperNode tier a job can span.
+
+For example, the following configuration restricts a job to HyperNodes of tier 2 or lower (e.g., s4 or s5); otherwise, the job remains in a Pending state:
+
+```yaml
+spec:
+  networkTopology:
+    mode: hard
+    highestTierAllowed: 2
+```
+
+This scheduling strategy allows users to precisely control job topology constraints, ensuring optimal performance and significantly improving training efficiency.
+
+#### **Future Plans**
+
+Volcano will continue to refine network topology-aware scheduling, with future plans including:
+
+- Automating the conversion of node labels to HyperNode CRs to simplify migration.
+- Integrating network topology discovery tools to streamline HyperNode management.
+- Providing CLI tools for easier HyperNode hierarchy visualization and management.
+
+For detailed design and user guide, please refer to:
+
+Design Document: **[Network Topology Aware Scheduling](https://volcano.sh/en/docs/network_topology_aware_scheduling)**.
+
+Usage Document: **[Network Topology Aware Scheduling | Volcano](https://volcano.sh/en/docs/network_topology_aware_scheduling)**.
+
+Sincere thanks to community developers **@ecosysbin, @weapons97, @Xu-Wentao, @penggu, @JesseStutler, @Monokaix** for their contributions!
+
+### **Elastic Hierarchical Queues: Flexible Multi-Tenant Resource Management**
+
+In multi-tenant environments, fair resource allocation, isolation, and job prioritization are critical. Different departments or teams often share cluster resources while ensuring their jobs receive resources on demand, avoiding contention or waste.
+Volcano v1.11 introduces **Elastic Hierarchical Queues**, significantly enhancing queue resource management. Hierarchical queues enable finer-grained resource quota management, cross-level resource sharing and reclamation, and flexible preemption policies, creating an efficient and fair unified scheduling platform. For users migrating from YARN, Volcano seamlessly transitions Big Data workloads to Kubernetes clusters.
+
+#### **Core Capabilities of Elastic Hierarchical Queues**
+
+Volcano’s elastic hierarchical queues offer the following key features to meet complex multi-tenant demands:
+
+1. **Configurable Queue Hierarchies:** Users can create multi-level queues in a tree structure, each with independent resource quotas and priorities.
+2. **Cross-Level Resource Sharing and Reclamation:** Idle resources in child queues can be shared with sibling queues and reclaimed when needed.
+3. **Fine-Grained Resource Quota Management:** Each queue can set parameters like:
+   - `capability`: Maximum resource capacity.
+   - `deserved`: Fair share of resources; excess can be reclaimed.
+   - `guarantee`: Reserved resources, ensuring minimum guarantees.
+4. **Flexible Preemption Policies:** Supports priority-based preemption to ensure high-priority tasks receive resources promptly.
+
+#### **Hierarchical Queue Example**
+
+A simple hierarchical queue structure might look like this:
+
+*(Figure: hierarchical queue tree)*
+
+- **Root Queue:** Manages global resource allocation.
+- **Department Queues:** Represent resource pools for different departments or teams.
+- **Child Queues:** Represent specific projects or tasks, where users submit jobs.
+
+#### **Use Cases**
+
+- **Multi-Department Resource Sharing:** Large enterprises can use hierarchical queues to fairly allocate and isolate resources across departments.
+- **Big Data Task Scheduling:** Users migrating from YARN to Kubernetes can leverage hierarchical queues for seamless Big Data workload migration.
+- **AI Training and Inference:** Hierarchical queues enable dynamic resource allocation and reclamation for AI tasks.
+
+For detailed design and user guide, please refer to:
+
+Design Document: **[hierarchical-queue-on-capacity-plugin](https://github.com/volcano-sh/volcano/blob/master/docs/design/hierarchical-queue-on-capacity-plugin.md)**.
+
+Usage Document: **[Hierarchical Queue | Volcano](https://volcano.sh/zh/docs/hierarchical_queue/)**.
+
+Sincere thanks to community developer **@Rui-Gan** for this contribution!
+
+### **Multi-Cluster AI Job Scheduling: Unified Management and Efficient Scheduling Across Clusters**
+
+As enterprise workloads grow, single Kubernetes clusters often fall short of meeting the demands of large-scale AI training and inference jobs. Users typically manage multiple Kubernetes clusters to achieve unified workload distribution, deployment, and management. Many users already deploy Volcano across multiple clusters, managed by **[Karmada](https://karmada.io/)**. To better support AI jobs in multi-cluster environments, Volcano has incubated the **[Volcano Global](https://github.com/volcano-sh/volcano-global)** sub-project, extending Volcano’s powerful scheduling capabilities to multi-cluster scenarios. This project provides a unified scheduling platform for multi-cluster AI jobs, supporting cross-cluster job distribution, resource management, and priority control.
+
+#### **Core Capabilities**
+
+Volcano Global enhances Karmada with the following features to meet the complex demands of multi-cluster AI job scheduling:
+
+1. **Cross-Cluster Volcano Job Scheduling:** Users can deploy and schedule Volcano Jobs across multiple clusters, maximizing resource utilization.
+2. **Queue Priority Scheduling:** Supports cross-cluster queue priority management, ensuring high-priority queues receive resources first.
+3. **Job Priority Scheduling and Queuing:** Enables job-level priority scheduling and queuing across clusters, ensuring critical tasks are executed promptly.
+4. **Multi-Tenant Fair Scheduling:** Provides fair resource allocation across tenants, preventing resource contention.
+
+
+For detailed deployment and user guide, please refer to: **[Multi-Cluster AI Job Scheduling | Volcano](https://volcano.sh/en/docs/multi_cluster_scheduling/)**.
+
+Sincere thanks to community developers **@Vacant2333, @MondayCha, @lowang-bh, @Monokaix** for their contributions!
+
+### **Online and Offline Workloads colocation with Dynamic Resource Oversubscription: Maximizing Resource Utilization While Ensuring SLO**
+
+#### **Background: The Challenge of Resource Utilization**
+
+As cloud-native technologies advance, Kubernetes has become the "operating system" of the cloud-native era, with more workloads migrating to Kubernetes platforms. However, despite the flexibility and scalability of cloud-native technologies, data center resource utilization remains low. Online workloads (e.g., microservices) often exhibit peak-and-trough patterns, leaving resources idle during troughs and insufficient during peaks. To improve resource utilization while ensuring high-priority workload **SLOs (Service Level Objectives)**, Volcano introduces a **cloud-native colocation solution**, combining online and offline workloads with dynamic resource oversubscription to maximize cluster resource utilization while ensuring online workload stability.
+
+**Cloud-native colocation** involves deploying **online workloads** (e.g., real-time services) and **offline workloads** (e.g., batch jobs) on the same cluster. During online workload troughs, offline workloads utilize idle resources; during peaks, offline workloads are throttled to ensure online workload resource needs. This dynamic resource allocation mechanism not only improves resource utilization but also ensures online workload quality of service.
+
+#### **Industry Practices: Volcano’s Unique Advantages**
+
+While many companies have explored colocation technologies, existing solutions often fall short, such as being tightly coupled with Kubernetes, using rough oversubscription calculations, or offering inconsistent user experiences. Volcano addresses these issues with the following unique advantages:
+
+- **Native Support for Offline Job Scheduling:** Volcano Scheduler natively supports offline job scheduling without additional adaptation.
+- **Non-Invasive Design:** No modifications to Kubernetes are required, allowing users to adopt Volcano without altering existing cluster architectures.
+- **Dynamic Resource Oversubscription:** Real-time calculation of oversubscribable resources ensures a balance between resource utilization and QoS.
+- **OS-Level Isolation and Guarantees:** Kernel-level resource isolation ensures online workload priority and stability.
+
+#### **Volcano Cloud-Native Colocation Solution: End-to-End Resource Optimization**
+
+Volcano’s cloud-native colocation solution provides end-to-end resource isolation and sharing mechanisms, including the following core components:
+
+**Volcano Scheduler:** Manages unified scheduling of online and offline workloads, offering abstractions like queues, groups, job priorities, fair scheduling, and resource reservations to meet the needs of microservices, Big Data, and AI workloads.
+
+**Volcano SLO Agent:** A DaemonSet running on each node. The SLO Agent monitors node resource usage, dynamically calculates oversubscribable resources, and allocates them to offline workloads. It also detects CPU/memory pressure and evicts offline workloads when necessary to ensure online workload priority.
+ +**Enhanced OS:** Volcano implements fine-grained QoS guarantees at the kernel level, using cgroups to set resource limits for online and offline workloads, ensuring online workloads receive sufficient resources even under high load. + +
+*(Figure: Architecture)*
+
+ +#### **Core Capabilities: Balancing Resource Utilization and Stability** + +Volcano’s cloud-native colocation solution offers the following key capabilities to achieve both resource utilization and workload stability: + +- **Unified Scheduling:** Supports unified scheduling of microservices, batch and AI jobs. +- **QoS-Based Resource Model:** Provides QoS-based resource management for online and offline workloads, ensuring high-priority workload stability. +- **Dynamic Resource Oversubscription:** Dynamically calculates oversellable resources based on real-time CPU/memory usage, maximizing resource utilization. +- **CPU Burst:** Allows containers to temporarily exceed CPU limits, avoiding throttling during critical moments and improving responsiveness. +- **Network Bandwidth Isolation:** Supports node-level network bandwidth limits, ensuring online workload network requirements. + +For detailed design and user guide, please refer to: **[Cloud Native Colocation | Volcano](https://volcano.sh/en/docs/colocation/)**. + +Sincerely thanks to community developer: **@william-wang** for this contribution! + +### **Load-Aware Descheduling: Intelligent Cluster Resource Balancing** + +In Kubernetes clusters, dynamic workload changes often lead to uneven node resource utilization, causing hotspots that impact cluster stability and efficiency. To address this, Volcano v1.11 introduces **Load-Aware Descheduling**, dynamically adjusting Pod distribution based on real node load to ensure balanced resource utilization and avoid hotspots, improving overall cluster performance and reliability. Load-aware descheduling is incubated in the sub-project: https://github.com/volcano-sh/descheduler. + +#### **Core Capabilities:** + +- **Load-Aware Scheduling:** Monitors real CPU and memory load metrics to dynamically adjust Pod distribution, avoiding reliance on Pod Request-based scheduling. +- **Timed and Dynamic Triggers:** Supports CronTab-based or fixed-interval descheduling to adapt to different scenarios. + +#### **Use Cases:** + +- **Uneven Node Resource Utilization:** Balances node load when some nodes are overutilized while others are underutilized. +- **Hotspot Node Management:** Migrates Pods from overloaded nodes to ensure stability. + +
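+As a rough illustration of how such a descheduling policy can be expressed, the sketch below follows the descheduler's `DeschedulerPolicy` format with a load-aware balance plugin. Treat it as a hedged sketch: the `LoadAware` plugin name and the threshold fields are assumptions based on the volcano-sh/descheduler documentation and may differ across versions:
+
+```yaml
+# Hedged sketch: evict Pods from nodes whose actual utilization exceeds
+# targetThresholds, steering load toward nodes still below thresholds.
+apiVersion: "descheduler/v1alpha2"
+kind: "DeschedulerPolicy"
+profiles:
+  - name: load-aware
+    plugins:
+      balance:
+        enabled:
+          - LoadAware              # assumed plugin name
+    pluginConfig:
+      - name: LoadAware
+        args:
+          thresholds:              # nodes below these values count as underutilized
+            cpu: 30
+            memory: 30
+          targetThresholds:        # nodes above these values count as hotspots
+            cpu: 80
+            memory: 85
+```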
+
+#### **Technical Highlights:**
+
+- **Descheduling Based on Actual Load:**
+
+  Unlike traditional scheduling strategies based on Pod Requests, Volcano's load-aware descheduling is more precise, accurately reflecting the actual resource usage of nodes.
+
+- **Seamless Integration with Kubernetes Ecosystem:**
+
+  Compatible with the native Kubernetes scheduler, enabling load-aware descheduling without requiring additional configurations.
+
+- **Flexible Policy Configuration:**
+
+  Users can customize descheduling intervals or trigger conditions based on business requirements, ensuring flexibility and controllability in scheduling.
+
+For detailed user guide, please refer to: **[Load-aware Descheduling | Volcano](https://volcano.sh/en/docs/descheduler/)**.
+
+Sincerely thanks to community developer: **@Monokaix** for this contribution!
+
+### **Fine-Grained Job Fault Recovery: Efficient Task Interruption Handling**
+
+In AI, Big Data, and HPC scenarios, job stability and fault recovery are critical. Traditional fault recovery strategies often restart entire Jobs when a single Pod fails, wasting resources and potentially restarting training from scratch. With the rise of **checkpointing** and **resume-from-checkpoint** techniques, single Pod failures no longer require full Job restarts. Volcano v1.11 introduces the **Fine-Grained Job Fault Recovery** feature, offering flexible fault handling mechanisms to efficiently manage task interruptions and improve training efficiency.
+
+#### **Core Capabilities:**
+
+##### Supporting Pod-Granular Restart Policies
+
+Users can configure policies to restart only failed Pods or their associated Tasks, avoiding unnecessary Job restarts and reducing resource waste.
+
+- **Restarting a Single Pod:**
+  When a specific Pod fails, only that Pod is restarted, leaving other running tasks unaffected.
+  ```yaml
+  policies:
+  - event: PodFailed
+    action: RestartPod
+  ```
+
+- **Restarting an Entire Task:**
+  When a Pod fails, the entire Task (a group of Pods) to which it belongs is restarted. This is suitable for scenarios requiring consistency within a task group.
+  ```yaml
+  policies:
+  - event: PodFailed
+    action: RestartTask
+  ```
+
+##### Support for Setting Timeouts for Actions
+
+Pod failures may be caused by transient issues (e.g., network fluctuations or hardware problems). Volcano allows users to set timeout periods for failure recovery actions. If the Pod recovers within the timeout period, no restart is performed, avoiding unnecessary intervention.
+
+- **Example Configuration:**
+If a Pod fails and is restarted but does not recover within 10 minutes, the entire Job is restarted.
+
+```yaml
+policies:
+  - event: PodFailed
+    action: RestartPod
+  - event: PodEvicted
+    action: RestartJob
+    timeout: 10m
+```
+
+##### New PodPending Event Handling
+
+When a Pod remains in the Pending state for an extended period due to insufficient resources or topological constraints, users can set a timeout for the Pending event. If the Pod does not start running after the timeout, the entire Job can be terminated to avoid resource waste.
+
+- **Example Configuration:**
+If a Pod remains in the Pending state for more than 10 minutes, the Job will be terminated.
+
+```yaml
+policies:
+  - event: PodPending
+    action: TerminateJob
+    timeout: 10m
+```
+
+#### **Applicable Scenarios:**
+
+- **AI Large Model Training:**
+  In distributed training, the failure of a single Pod does not affect the overall training progress. Fine-grained failure recovery strategies enable quick task recovery, avoiding the need to restart training from scratch.
+
+- **Big Data Processing:**
+  In batch processing tasks, failures of individual tasks can be resolved by restarting a single Pod or Task, eliminating the need to restart the entire Job and improving processing efficiency.
+
+- **High-Performance Computing (HPC):**
+  In HPC scenarios, task stability and efficient recovery are critical. Fine-grained failure recovery strategies minimize task interruption time.
+
+#### **Technical Highlights:**
+
+- **Flexible Policy Configuration:**
+  Users can customize failure recovery policies based on business requirements, supporting Pod, Task, and Job-level restart operations.
+
+- **Timeout Mechanism:**
+  By setting timeout periods, unnecessary restarts due to transient issues are avoided, enhancing Job stability.
+
+- **Seamless Compatibility with Checkpointing:**
+  Perfectly integrates with checkpointing and resumption technologies in AI scenarios, ensuring efficient recovery of training tasks.
+
+For detailed design and user guide, please refer to: **[How to use job policy](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_job_policy.md)**.
+
+Sincerely thanks to community developer: **@bibibox** for this contribution!
+
+### **Volcano Dashboard: A Resource Visualization Component**
+
+The Volcano Dashboard is an official resource visualization component for Volcano. After deploying Volcano, users can deploy the dashboard to view and manage cluster resources through a graphical interface. The project is available at: https://github.com/volcano-sh/dashboard.
+
+Current features include:
+
+- Cluster overview, including Job counts, statuses, completion rates, Queue counts, and resource utilization.
+- Job and Queue lists with filtering, sorting, and search capabilities.
+- Pod lists with filtering, sorting, and search capabilities.
+
+Sincerely thanks to community developers: **@WY-Dev0, @Monokaix** for their contributions!
+
+### **Volcano Supports Kubernetes v1.31**
+
+Volcano closely follows Kubernetes releases, with full support for Kubernetes v1.31, including comprehensive UT and E2E testing to ensure functionality and reliability.
+
+To contribute to Volcano’s Kubernetes version adaptation, please refer to: **[adapt-k8s-todo](https://github.com/volcano-sh/volcano/blob/master/docs/design/adapt-k8s-todo.md)**.
+
+Sincerely thanks to community developers: **@vie-serendipity, @dongjiang1989** for their contributions!
+
+### **Volcano Job Supports Preemption Policy**
+
+Volcano Jobs now support **PreemptionPolicy**, allowing users to configure whether Jobs can preempt other Pods. Jobs with `PreemptionPolicy: Never` will not preempt resources, ensuring stability.
+
+For configuration examples, please refer to: **[how to configure priorityclass for job](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_configure_priorityclass_for_job.md)**.
+
+Sincerely thanks to community developer: **@JesseStutler** for this contribution!
+
+### **Performance Optimization: Efficient Scheduling at Scale**
+
+In Volcano, **Queue** is one of the most fundamental and critical resources. The `status` field of a Queue records the states of PodGroups, such as `Unknown`, `Pending`, `Running`, `Inqueue`, and `Completed`.
However, in large-scale scenarios, frequent changes in PodGroups within a Queue (e.g., when a large number of short-lived tasks are repeatedly submitted) can cause many PodGroups to transition from `Running` to `Completed`. In such cases, the Volcano Controller needs to frequently refresh the `status` field of the Queue, placing significant pressure on the APIServer. Additionally, the Volcano Scheduler updates the `status.allocated` field of the Queue after Job scheduling, which can lead to Queue update conflicts in large-scale environments, further impacting system performance. + +To completely resolve the issues of frequent Queue refreshes and update conflicts in large-scale scenarios, Volcano v1.11 has optimized the Queue management mechanism by migrating PodGroup statistics to **Metrics**, eliminating the need for persistent storage. This optimization significantly reduces the pressure on the APIServer while improving the overall performance and stability of the system. + +#### **Key Improvements After Optimization** + +**Migration of PodGroup Statistics to Metrics** +PodGroup state data (e.g., `Unknown`, `Pending`, `Running`) is no longer stored in the `status` field of the Queue. Instead, it is recorded and displayed through the metrics system. Users can view the statistics of PodGroups in a Queue using the following commands: + +- **View statistics for a specific Queue**: + + ```bash + vcctl queue get -n [name] + ``` + +- **View statistics for all Queues**: + + ```bash + vcctl queue list + ``` + +**Reduced APIServer Pressure** +By migrating PodGroup statistics to Metrics, frequent updates to the `status` field of the Queue are avoided, significantly reducing the load on the APIServer and improving system throughput. + +**Resolved Queue Update Conflicts** +In large-scale scenarios, Queue update conflicts have been effectively mitigated, ensuring the efficient operation of the scheduler. + +For detailed design and metric names related to the migration of PodGroup state statistics to Metrics, please refer to: **[Queue podgroup statistics](https://github.com/volcano-sh/volcano/blob/master/docs/design/podgroup-statistics.md)**. + +Sincerely thanks to community developer: **@JesseStutler** for this contribution! + +## **Conclusion: Volcano v1.11, A New Era of Cloud-Native Batch Computing** + +Volcano v1.11 is not just a technological leap but a new chapter in cloud-native batch computing. Whether for AI large model training, Big Data scheduling, or resource optimization, Volcano v1.11 delivers powerful features and flexible solutions. We believe Volcano v1.11 will help users achieve greater heights in cloud-native batch computing, ushering in a new era of AI and Big Data scheduling! + +**Experience Volcano v1.11.0 now and step into a new era of efficient computing!** + +**v1.11.0 release:** https://github.com/volcano-sh/volcano/releases/tag/v1.11.0 + +## **Acknowledgments** + +Volcano v1.11.0 includes contributions from 39 community members. 
Sincerely thanks to all contributors:
+
+| @QingyaFan     | @JesseStutler   | @bogo-y          |
+| :------------- | :-------------- | :--------------- |
+| @bibibox       | @zedongh        | @archlitchi      |
+| @dongjiang1989 | @william-wang   | @fengruotj       |
+| @SataQiu       | @lowang-bh      | @Rui-Gan         |
+| @xovoxy        | @wangyang0616   | @PigNatovsky     |
+| @Yanping-io    | @lishangyuzi    | @hwdef           |
+| @bood          | @kerthcet       | @WY-Dev0         |
+| @raravena80    | @SherlockShemol | @zhifanggao      |
+| @conghuhu      | @MondayCha      | @vie-serendipity |
+| @Prepmachine4  | @Monokaix       | @lengrongfu      |
+| @jasondrogba   | @sceneryback    | @TymonLee        |
+| @liuyuanchun11 | @Vacant2333     | @matbme          |
+| @lekaf974      | @kursataktas    | @lut777          |
\ No newline at end of file
diff --git a/website-docusaurus/blog/2025-04-01-how-volcano-boosts-distributed-training-and-inference-performance.md b/website-docusaurus/blog/2025-04-01-how-volcano-boosts-distributed-training-and-inference-performance.md
new file mode 100644
index 00000000..423ffa1e
--- /dev/null
+++ b/website-docusaurus/blog/2025-04-01-how-volcano-boosts-distributed-training-and-inference-performance.md
@@ -0,0 +1,58 @@
+---
+title: "How Volcano boosts distributed training and inference performance"
+description: "Join Volcano at KubeCon + CloudNativeCon Europe, 1-4 April 2025 in London!"
+date: 2025-04-01
+authors: [volcano]
+---
+## The Growing Demand for LLM Workloads and Associated Challenges
+
+The increasing adoption of large language models (LLMs) has led to heightened demand for efficient AI training and inference workloads. As model size and complexity grow, distributed training and inference have become essential. However, this expansion introduces challenges in network communication, resource allocation, and fault recovery within large-scale distributed environments. These issues often create performance bottlenecks that hinder scalability.
+
+## Addressing Network Bottlenecks Through Topology-Aware Scheduling
+
+In LLM training, model parallelism distributes workloads across multiple nodes, requiring frequent data exchanges. Network communication can become a bottleneck, particularly in heterogeneous environments with InfiniBand (IB), RoCE, or NVSwitch configurations. Communication efficiency depends on network topology—fewer switches between nodes typically result in lower latency and higher throughput.
+
+One approach to mitigating this challenge is Network Topology-Aware Scheduling, which optimizes workload placement to minimize cross-switch communication. A key component of this strategy is the HyperNode, an abstraction for representing network topology via Custom Resource Definitions (CRDs). Unlike label-based methods, HyperNode provides a hierarchical structure that reflects actual network layouts, improving management and optimization. Nodes within the same HyperNode communicate more efficiently than those spanning multiple layers.
+
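+To make the HyperNode abstraction concrete, a leaf-level HyperNode could be declared roughly as follows. This is a hedged sketch based on the HyperNode design document: the `topology.volcano.sh/v1alpha1` API group and the `tier`/`members` fields follow that design and may vary between Volcano versions:
+
+```yaml
+# Sketch: a tier-1 (leaf) HyperNode grouping the nodes under one switch.
+apiVersion: topology.volcano.sh/v1alpha1
+kind: HyperNode
+metadata:
+  name: hypernode-s0
+spec:
+  tier: 1                    # lower tier = closer network layer = cheaper communication
+  members:
+    - type: Node
+      selector:
+        exactMatch:
+          name: node-0
+    - type: Node
+      selector:
+        regexMatch:
+          pattern: "node-[1-3]"
+```
+
+Higher layers are modeled the same way, with members of `type: HyperNode` pointing at lower-tier HyperNodes, so the CRD hierarchy mirrors the physical switch hierarchy.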
+
+Topology constraints can also be specified for jobs through the networkTopology field, with options for strict (Hard Mode) or flexible (Soft Mode) enforcement. This granular control helps ensure workloads are deployed in optimal network environments, reducing latency and improving throughput.
+
+## Managing Multi-Cluster Environments for Scalability
+
+As AI workloads expand, single Kubernetes clusters may no longer suffice for large-scale training and inference. While multiple clusters can address this limitation, managing them efficiently presents challenges.
+
+The Volcano Global subproject extends scheduling capabilities to multi-cluster environments, integrating with Karmada to enable cross-cluster scheduling for distributed workloads. Features such as Queue Priority Scheduling, Job Priority Scheduling, and Multi-Tenant Fair Scheduling help optimize resource allocation and ensure equitable access across tenants. This approach simplifies multi-cluster management while supporting scalable AI workloads.
+
+
+## Improving Stability with Fine-Grained Fault Recovery
+
+Fault recovery is critical in distributed AI training and inference. Traditional methods often restart entire jobs upon a single Pod failure, leading to resource inefficiencies. With checkpointing and resume-from-checkpoint techniques, full restarts are often unnecessary.
+
+Fine-Grained Job Fault Recovery allows policies to restart only failed Pods or associated tasks, reducing unnecessary disruptions. Timeout configurations can further minimize interventions—if a Pod recovers within the allotted time, no restart is triggered. This approach enhances stability and efficiency in distributed workloads.
+
+## Future Developments in Distributed Workload Management
+
+Ongoing advancements in distributed workload management include:
+- Task-Level Network Topology Affinity Scheduling: Support for distributed inference scenarios, such as integration with lws.
+
+- HyperNode Auto-Discovery and Status Updates: Automation for HyperNode lifecycle management.
+
+- Dynamic Resource Allocation (DRA): Improved management of heterogeneous resources.
+
+- Dynamic GPU Partitioning: Support for MIG and MPS to enhance GPU utilization.
+
+More information about Volcano:
+- Website: https://volcano.sh/
+
+- GitHub: https://github.com/volcano-sh/volcano
+
+- Slack: Join the conversation on Volcano Slack.
+
+- Weekly Meetings: Attend our weekly meetings and review meeting notes:
+
+  - Meeting Link: [Zoom](https://zoom.us/j/91804791393)
+
+  - Meeting Notes: [Google Docs](https://docs.google.com/document/d/1YLbF8zjZBiR9PbXQPB22iuc_L0Oui5A1lddVfRnZrqs/edit?tab=t.0#heading=h.u99fvvct3m1z)
+
+- Twitter: Follow us on [X (formerly Twitter)](https://x.com/volcano_sh) for the latest updates.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2025-05-30-volcano-2025-security-audit.md b/website-docusaurus/blog/2025-05-30-volcano-2025-security-audit.md
new file mode 100644
index 00000000..3f561296
--- /dev/null
+++ b/website-docusaurus/blog/2025-05-30-volcano-2025-security-audit.md
@@ -0,0 +1,45 @@
+---
+title: "Volcano completes security audit"
+description: "Volcano completes 2025 security audit"
+date: 2025-05-30
+authors: [volcano]
+---
+Volcano is excited to announce the completion of our CNCF-funded security audit carried out by [Ada Logics](https://adalogics.com/) and facilitated by [OSTIF](https://ostif.org/) in collaboration with the Volcano maintainers. The audit was scoped to cover the Volcano source code, supply-chain risks, and fuzzing. The auditing team identified 10 security issues, all of which the Volcano security team fixed by the completion of the audit.
+
+Volcano has addressed several infrastructure-level security issues by making targeted configuration changes that reduce risk and improve the security posture of its default deployment. Below is a breakdown of each issue, the associated risks, and how Volcano resolved them, along with the resulting security improvements.
+
+One issue involved several Volcano components running with root privileges by default. Containers running as root pose an increased security risk in that if compromised, an attacker gains access to capabilities they can use to escalate their privileges. Volcano fixed this by configuring all components - including the scheduler, admission controller, controllers, and dashboard - to run as non-root by default. This change limits the scope of what an attacker can do inside a container and helps contain breaches more effectively.
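+In Kubernetes terms, this fix and the related hardening described in the next few paragraphs come down to the Pod- and container-level `securityContext`. The snippet below is an illustrative sketch using standard Kubernetes fields, not a verbatim excerpt from Volcano's manifests:
+
+```yaml
+# Illustrative container securityContext hardening:
+securityContext:
+  runAsNonRoot: true               # refuse to start if the image would run as root
+  allowPrivilegeEscalation: false  # block processes from gaining extra privileges
+  seccompProfile:
+    type: RuntimeDefault           # limit syscalls to the runtime's default allowlist
+  capabilities:
+    drop: ["ALL"]                  # drop everything, re-add only what is required
+```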
+
+Another issue was the absence of seccomp profiles across Volcano’s workloads. Without seccomp, containers can invoke any Linux system call, which increases the attack surface for kernel-level attacks and container escapes. Volcano addressed this by adding seccomp profiles, specifically using `RuntimeDefault`, which restricts containers to a safe subset of system calls. This reduces the kernel’s exposure and strengthens runtime isolation.
+
+Volcano’s containers also lacked SELinux configuration. SELinux manages access control at the kernel level and limits how processes can interact with files, system resources, and other processes. Volcano added SELinux options to all its pods and containers.
+
+In addition, Volcano had previously granted containers unnecessary Linux capabilities—fine-grained permissions that determine what a containerized process can do. For example, capabilities like `CAP_NET_ADMIN` or `CAP_SYS_ADMIN` grant significant power and are often unnecessary for typical application logic. Volcano mitigated this risk by removing non-essential capabilities using a “drop all” approach and only adding back specific permissions if needed. This reduces the attack surface and enforces the principle of least privilege.
+
+Prior to the audit, Volcano allowed containers to escalate privileges during execution, which could permit non-privileged processes to gain additional privileges. Such privilege escalation increases the risk of bypassing container security controls. Volcano resolved this by setting `allowPrivilegeEscalation: false` in its containers and pods, ensuring that processes run only with the privileges they were initially assigned.
+
+These changes help contain potential attacks, reduce the avenues for privilege escalation or container breakout, and enhance the overall resilience of the system in multi-tenant and production environments.
+
+On the application side, the auditors identified 5 issues, of which the most interesting was an issue where an attacker who had compromised an elastic service or an extender plugin in the cluster could cause a denial of service of the Volcano scheduler. This issue was assigned CVE-2025-32777 of HIGH severity.
+
+## Fuzzing
+
+During the audit, Ada Logics integrated Volcano into [Google’s OSS-Fuzz project](https://github.com/google/oss-fuzz/tree/master/projects/volcano) with two initial fuzz tests. OSS-Fuzz is an open source project that other critical open source projects can integrate into. Google runs integrated projects’ fuzzers on vast amounts of compute and reports any findings to the project’s team via email. OSS-Fuzz’s reports contain information such as stack traces, steps to reproduce, which fuzz harness found the issue, and more. Periodically, OSS-Fuzz reproduces the issue to assert that it still exists. If it can’t reproduce it, OSS-Fuzz automatically marks the issue fixed.
+
+## Getting involved in Volcano
+
+Volcano is the industry's first cloud-native batch computing engine and the sole batch computing project within the CNCF. It operates as a Kubernetes-native batch scheduling system, enhancing the standard kube-scheduler. Volcano provides comprehensive features to manage and optimize diverse batch and elastic workloads, including AI/ML/DL, Bioinformatics/Genomics, and other "Big Data" applications. It offers robust integration with frameworks such as Spark, Flink, Ray, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, and KubeGene. Drawing from over fifteen years of experience in high-performance workload operations, Volcano combines proven practices and innovative concepts to deliver a powerful and flexible scheduling solution.
+
+We encourage you to join our community and contribute to Volcano's development. Your participation is valuable, whether you're asking questions, sharing experiences, or contributing code.
+
+- GitHub: Access our main repository to contribute code or report issues: [https://github.com/volcano-sh/volcano](https://github.com/volcano-sh/volcano).
+- Website & Documentation: Find comprehensive documentation and news on our official website: [https://volcano.sh/en/](https://volcano.sh/en/).
+- Contributing Code: Our [Contribution Guide](https://github.com/volcano-sh/volcano/blob/master/contribute.md) offers detailed instructions on finding good first issues and submitting pull requests. We welcome contributions of all sizes.
+- Slack Channel: Join our Slack workspace for real-time discussions and support. First, join the CNCF Slack at [https://slack.cncf.io/](https://slack.cncf.io/), then navigate to the #volcano channel: [https://cloud-native.slack.com/archives/C011GJDQS0N](https://cloud-native.slack.com/archives/C011GJDQS0N).
+- Community Meetings: Participate in our regular community meetings to discuss project updates, roadmaps, and proposals.
+  - [Meeting Link](https://zoom.us/j/91804791393)
+  - [Meeting Notes](https://docs.google.com/document/d/1YLbF8zjZBiR9PbXQPB22iuc_L0Oui5A1lddVfRnZrqs/edit)
+- Mailing List: Subscribe to our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh) for important announcements and broader discussions.
+
+You can find the audit report [here](https://volcano.sh/reports/Ada-Logics-Volcano-Security-Audit-2025.pdf).
+We would like to thank all parties involved in the audit for their great work.
\ No newline at end of file
diff --git a/website-docusaurus/blog/2025-06-12-volcano-1.12.0-release.md b/website-docusaurus/blog/2025-06-12-volcano-1.12.0-release.md
new file mode 100644
index 00000000..0b9d4206
--- /dev/null
+++ b/website-docusaurus/blog/2025-06-12-volcano-1.12.0-release.md
@@ -0,0 +1,293 @@
+---
+title: "Volcano v1.12.0 Available Now"
+description: "New features: Network Topology-Aware Scheduling (Alpha), Dynamic MIG Partitioning for GPU Virtualization, DRA Support, Queue Capacity Management in Volcano Global, Security Enhancements, Performance Optimizations, Gang Scheduling for Generic Workloads, Job Flow Improvements, and Kubernetes v1.32 Support."
+date: 2025-06-12
+authors: [volcano]
+---
+## Volcano v1.12 released: Advancing Cloud-Native AI and Batch Computing
+
+As AI large model technology rapidly evolves, enterprises are placing higher demands on computing resource efficiency and application performance. For complex application scenarios such as AI, big data, and high-performance computing (HPC), efficiently utilizing accelerators like GPUs, ensuring high system availability, and managing resources with fine granularity are the core areas of focus for the Volcano community's continuous innovation.
+
+Each version of Volcano is an active response to these challenges. With **nearly 40,000 contributions from over 1,000 developers in more than 30 countries**, Volcano has been adopted in production environments by more than 60 enterprises worldwide. Its scheduling performance and resource management capabilities have been widely proven in practice.
+ +Today, the **Volcano community officially releases v1.12.** This new version focuses on the core requirements of modern AI and big data scenarios, and introduces a series of key features and usability improvements: + +### **Highlights of v1.12** + +* **Network Topology-Aware Scheduling (Alpha):** Optimizes the deployment of large-scale AI training and inference tasks by using network topology awareness to reduce cross-switch communication and improve runtime efficiency. +* **Enhanced GPU Virtualization:** Adds support for NVIDIA GPU dynamic MIG partitioning besides the existing vCUDA solution. This provides users with both software and hardware virtualization options for more flexible and efficient GPU resource sharing. +* **DRA Support:** Enhances the flexibility and capabilities of heterogeneous resource management. +* **Queue Capacity Management in Volcano Global:** Supports unified limits and management of resource quotas (capabilities) for tenant queues in a multi-cluster environment. +* **Comprehensive Security Enhancements:** Implements multi-dimensional security hardening, from API access control to container runtime permissions, to improve system robustness. +* **Performance Optimization for Large-Scale Scenarios:** Improves concurrent task processing efficiency by reducing unnecessary webhook calls. +* **Enhanced Gang Scheduling for Generic Workloads:** Adds support for custom minimum member counts (`minMember`) for Gang scheduling of generic workloads like Deployments and StatefulSets via annotations, providing more fine-grained Gang Scheduling strategies. +* **Job Flow Enhancements:** Improves the robustness and observability of the built-in workflow orchestration engine. +* And many other stability and usability improvements. + +We believe these updates in v1.12 will further enhance intelligent task scheduling, resource utilization, and overall system performance, helping users to better meet the challenges of the AI and big data era. + +## Core Feature Details + +### Network Topology-Aware Scheduling (Alpha Release) + +Previously a preview feature in v1.11, Volcano's Network Topology-Aware Scheduling is now an Alpha release in v1.12. This feature is designed to optimize the deployment of AI tasks in large-scale training and inference scenarios (e.g., model-parallel training, leader-worker inference). By scheduling tasks within the same network topology performance domain, it reduces cross-switch communication, thereby significantly improving task efficiency. Volcano uses the HyperNode CRD to abstract and represent heterogeneous hardware network topologies and supports a hierarchical structure for easier management. + +Version 1.12 integrates the following key features: + +* **HyperNode Auto-Discovery:** Volcano can now automatically discover the cluster's network topology. Users can configure the discovery type, and the system will automatically create and maintain hierarchical HyperNodes that reflect the cluster's actual network topology. It currently supports obtaining topology information from InfiniBand (IB) networks via the UFM (Unified Fabric Manager) interface to automatically update HyperNodes. Support for more network protocols like RoCE is planned for the future. +* **Prioritized HyperNode Selection:** This version introduces a scoring strategy based on a combination of node-level and HyperNode-level scores to determine the final priority of a HyperNode. 
+ * **Node-level:** It is recommended to configure the BinPack plugin to pack nodes within a HyperNode first, reducing resource fragmentation. + * **HyperNode-level:** Lower-level HyperNodes are prioritized for better performance, as they involve fewer cross-switch traversals. For HyperNodes at the same level, those containing more tasks receive a higher score to reduce HyperNode-level resource fragmentation. +* **Node Matching with Label Selectors:** HyperNode leaf nodes are associated with physical nodes in the cluster and support the following three matching strategies: + * **Exact Match:** Directly matches node names. + * **Regex Match:** Matches node names using regular expressions. + * **Label Match:** Matches nodes using standard Label Selectors. + +Related documentation: + +* [Network Topology-Aware Scheduling Introduction and Usage](https://volcano.sh/en/docs/network_topology_aware_scheduling/) +* [Network Topology-Aware Scheduling Design Document](https://github.com/volcano-sh/volcano/blob/master/docs/design/Network%20Topology%20Aware%20Scheduling.md) +* [HyperNode Auto-Discovery Design Document](https://github.com/volcano-sh/volcano/blob/master/docs/design/hyperNode-auto-discovery.md) +* [HyperNode Auto-Discovery Usage Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_hypernode_auto_discovery.md) + +Related PRs: + +* [https://github.com/volcano-sh/volcano/pull/3874](https://github.com/volcano-sh/volcano/pull/3874) +* [https://github.com/volcano-sh/volcano/pull/3894](https://github.com/volcano-sh/volcano/pull/3894) +* [https://github.com/volcano-sh/volcano/pull/3969](https://github.com/volcano-sh/volcano/pull/3969) +* [https://github.com/volcano-sh/volcano/pull/3971](https://github.com/volcano-sh/volcano/pull/3971) +* [https://github.com/volcano-sh/volcano/pull/4068](https://github.com/volcano-sh/volcano/pull/4068) +* [https://github.com/volcano-sh/volcano/pull/4213](https://github.com/volcano-sh/volcano/pull/4213) +* [https://github.com/volcano-sh/volcano/pull/3897](https://github.com/volcano-sh/volcano/pull/3897) +* [https://github.com/volcano-sh/volcano/pull/3887](https://github.com/volcano-sh/volcano/pull/3887) + +Thanks to the following community developers for their contributions to this feature: **@ecosysbin, @weapons97, @Xu-Wentao, @penggu, @JesseStutler, @Monokaix**. + +### Dynamic MIG Partitioning for GPU Virtualization + +Volcano's GPU virtualization feature allows users to request partial GPU resources based on memory and compute power. It works with a Device Plugin to achieve hardware isolation and improve GPU utilization. While traditional GPU virtualization limits GPU usage by intercepting CUDA APIs, the MIG (Multi-Instance GPU) technology in the NVIDIA Ampere architecture allows a single physical GPU to be partitioned into multiple independent instances. However, typical MIG solutions often use pre-configured, fixed-size instances, which can lead to resource waste and inflexibility. + +**Volcano v1.12 introduces dynamic MIG partitioning and scheduling capabilities.** It can select the appropriate MIG instance size in real-time based on the user's requested GPU amount and uses a Best-Fit algorithm to reduce resource waste. It also supports GPU scoring strategies like BinPack and Spread to minimize resource fragmentation and improve GPU utilization. 
Users can request resources using the unified APIs `volcano.sh/vgpu-number`, `volcano.sh/vgpu-cores`, and `volcano.sh/vgpu-memory`, without needing to be aware of the underlying implementation details. + +* Design Document: [Dynamic MIG Design Document](https://github.com/volcano-sh/volcano/blob/master/docs/design/dynamic-mig.md) +* Usage Guide: [Dynamic MIG Usage Guide](https://volcano.sh/en/docs/gpu_virtualization/) + +Related PRs: + +* [https://github.com/volcano-sh/volcano/pull/4290](https://github.com/volcano-sh/volcano/pull/4290) +* [https://github.com/volcano-sh/volcano/pull/3953](https://github.com/volcano-sh/volcano/pull/3953) + +Thanks to the following community developers for their contributions to this feature: **@sailorvii, @archlitchi**. + +### Support for Dynamic Resource Allocation (DRA) + +Kubernetes DRA (Dynamic Resource Allocation) is a native feature that provides a more flexible and powerful way to manage heterogeneous hardware resources in a cluster, such as GPUs, FPGAs, and high-performance network cards. It addresses the limitations of the traditional Device Plugin model in some advanced scenarios. Volcano v1.12 adds support for DRA, allowing the cluster to dynamically allocate and manage external resources, which enhances Volcano's integration with the Kubernetes ecosystem and improves the flexibility of resource management. + +* Usage Guide: [Enabling DRA in Volcano](https://volcano.sh/en/docs/unified_scheduling/#2-1-2-enable-dra-dynamic-resource-allocation-in-volcano) + +Related PR: + +* [https://github.com/volcano-sh/volcano/pull/3799](https://github.com/volcano-sh/volcano/pull/3799) + +Thanks to community developer **@JesseStutler** for their contribution to this feature. + +### Queue Capacity Management in Volcano Global + +Queues are a core concept in Volcano. To support quota management in multi-cluster and multi-tenant environments, Volcano v1.12 extends its global queue capacity management capabilities. Users can now uniformly limit tenant resource usage in a multi-cluster environment. The configuration is consistent with the single-cluster scenario: the `capability` field in the queue configuration is used to limit tenant quotas. + +Related PR: + +* [https://github.com/volcano-sh/volcano-global/pull/16](https://github.com/volcano-sh/volcano-global/pull/16) + +Thanks to community developer **@tanberBro** for their contribution to this feature. + +### Security Enhancements + +The Volcano community is committed to security. In v1.12, in addition to fine-grained control over sensitive permissions like ClusterRoles, the following security risks have been addressed and hardened: + +* **Set Timeouts for HTTP Servers:** The Metric and Healthz endpoints of all Volcano components now have server-side `ReadHeader`, `Read`, and `Write` timeouts to prevent prolonged resource occupation. (PR: [https://github.com/volcano-sh/volcano/pull/4208](https://github.com/volcano-sh/volcano/pull/4208)) +* **Add Warning for Skipping SSL Certificate Verification:** When a client request sets `insecureSkipVerify` to `true`, a warning is logged to recommend enabling SSL certificate verification in production environments. (PR: [https://github.com/volcano-sh/volcano/pull/4211](https://github.com/volcano-sh/volcano/pull/4211)) +* **Disable Volcano Scheduler's pprof Endpoint by Default:** To prevent the leakage of sensitive program information, the profiling data port used for troubleshooting is now disabled by default. 
(PR: [https://github.com/volcano-sh/volcano/pull/4173](https://github.com/volcano-sh/volcano/pull/4173)) +* **Remove Unnecessary File Permissions:** Unnecessary execute permissions have been removed from Go source files to follow the principle of least privilege. (PR: [https://github.com/volcano-sh/volcano/pull/4171](https://github.com/volcano-sh/volcano/pull/4171)) +* **Set Security Context for Containers and Run as Non-Root:** All Volcano components now run with non-root privileges. Security contexts have been added with `seccompProfile` and `SELinuxOptions`, and `allowPrivilegeEscalation` is set to `false` to prevent container privilege escalation. Only necessary Linux capabilities are retained, comprehensively restricting container permissions. (PR: [https://github.com/volcano-sh/volcano/pull/4207](https://github.com/volcano-sh/volcano/pull/4207)) +* **Limit HTTP Response Body Size:** For HTTP requests sent by the Extender Plugin and ElasticSearch Service, the response body size is limited to prevent issues like OOM caused by excessive resource consumption. (Advisory: [https://github.com/volcano-sh/volcano/security/advisories/GHSA-hg79-fw4p-25p8](https://github.com/volcano-sh/volcano/security/advisories/GHSA-hg79-fw4p-25p8)) + +### Performance Improvements for Large-Scale Scenarios + +Volcano's performance is continuously being optimized. The new version removes and disables some non-essential webhooks by default without affecting functionality, improving performance in large-scale batch creation scenarios: + +* **PodGroup Mutating Webhook Disabled by Default:** Previously, when a PodGroup was created without a specified queue, the queue could be populated from the Namespace. Since this scenario is uncommon, this webhook is now disabled by default. Users can enable it if needed. +* **Queue Status Check Moved from Pod to PodGroup:** Task submission is not allowed when a queue is in a closed state. The original validation logic was performed at the Pod creation stage. Since Volcano's basic scheduling unit is the PodGroup, moving the validation to the PodGroup creation stage is more efficient. As the number of PodGroups is less than the number of Pods, this change reduces webhook calls and improves performance. + +Related PRs: + +* [https://github.com/volcano-sh/volcano/pull/4128](https://github.com/volcano-sh/volcano/pull/4128) +* [https://github.com/volcano-sh/volcano/pull/4132](https://github.com/volcano-sh/volcano/pull/4132) + +Thanks to community developer **@Monokaix** for their contribution to this feature. + +### Gang Scheduling for Various Workload Types + +Gang scheduling is a core capability of Volcano. For Volcano Job and PodGroup objects, users can directly set `minMember` to define the required minimum number of replicas. In the new version, users can specify this minimum by setting the annotation `scheduling.volcano.sh/group-min-member` on other types of workloads such as Deployments, StatefulSets, and Jobs. This means that when using Volcano for scheduling, either the specified number of replicas are all scheduled successfully, or none are scheduled at all, enabling Gang scheduling for a wider variety of workload types. 
+ +For example, to set `minMember=10` for a Deployment: + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: volcano-group-deployment + annotations: + # Set min member=10 + scheduling.volcano.sh/group-min-member: "10" +``` + +Related PR: + +* [https://github.com/volcano-sh/volcano/pull/4000](https://github.com/volcano-sh/volcano/pull/4000) + +Thanks to community developer **@sceneryback** for their contribution to this feature. + +### Job Flow Enhancements + +Job Flow is a lightweight workflow orchestration framework for Volcano Jobs. In v1.12, Job Flow has been enhanced with the following improvements: + +* New Monitoring Metrics: Added metrics for the number of successful and failed Job Flows. +* DAG Validity Check: Introduced a function to validate the structure of a Job Flow's Directed Acyclic Graph (DAG). +* State Synchronization Fix: Resolved an issue that caused inaccurate Job Flow state synchronization. + +Related PRs: + +* [https://github.com/volcano-sh/volcano/pull/4169](https://github.com/volcano-sh/volcano/pull/4169) +* [https://github.com/volcano-sh/volcano/pull/4090](https://github.com/volcano-sh/volcano/pull/4090) +* [https://github.com/volcano-sh/volcano/pull/4135](https://github.com/volcano-sh/volcano/pull/4135) + +Thanks to community developer **@dongjiang1989** for their contribution to this feature. + +### Finer-Grained Permission Control in Multi-Tenant Scenarios + +Volcano natively supports multi-tenant environments and emphasizes permission control in such scenarios. In the new version, Volcano has enhanced permission control for Volcano Jobs by adding read-only and read-write ClusterRoles. Users can assign different permissions to tenants as needed to achieve better isolation. + +Related PR: + +* [https://github.com/volcano-sh/volcano/pull/4174](https://github.com/volcano-sh/volcano/pull/4174) + +Thanks to community developer **@Hcryw** for their contribution to this feature. + +### Kubernetes 1.32 Support + +Volcano stays current with Kubernetes releases. Version 1.12 supports the latest Kubernetes v1.32 and ensures functionality and reliability through comprehensive unit and end-to-end (E2E) tests. + +To participate in Volcano's adaptation work for new Kubernetes versions, please refer to: [adapt-k8s-todo](https://github.com/volcano-sh/volcano/pull/4318). + +Related PR: + +* [https://github.com/volcano-sh/volcano/pull/4099](https://github.com/volcano-sh/volcano/pull/4099) + +Thanks to community developers **@guoqinwill** and **@danish9039** for their contributions to this feature. + +### Enhanced Queue Monitoring Metrics + +Volcano queues now include several new key resource metrics. The system now supports monitoring and visualization of `request`, `allocated`, `deserved`, `capacity`, and `real_capacity` for CPU, memory, and extended resources, providing a detailed view of the status of key queue resources. + +Related PR: + +* [https://github.com/volcano-sh/volcano/pull/3937](https://github.com/volcano-sh/volcano/pull/3937) + +Thanks to community developer **@zedongh** for their contribution to this feature. + +### Fuzz Testing Support + +Fuzz testing is an automated software testing technique. In this release, Volcano introduces a fuzz testing framework to test key function units. It uses Google's open-source OSS-Fuzz framework for continuous testing, which helps to discover potential vulnerabilities and defects early, enhancing the security and robustness of Volcano. 
+ +Related PR: + +* [https://github.com/volcano-sh/volcano/pull/4205](https://github.com/volcano-sh/volcano/pull/4205) + +Thanks to community developer **@AdamKorcz** for their contribution to this feature. + +### Stability Enhancements + +This release includes several stability fixes, addressing issues such as panics caused by improper queue capacity settings, hierarchical queue validation failures, unnecessary PodGroup refreshes, and StatefulSets with zero replicas consuming queue resources. These improvements further enhance the stability of the system in complex scenarios. + +Related PRs: + +* [https://github.com/volcano-sh/volcano/pull/4273](https://github.com/volcano-sh/volcano/pull/4273) +* [https://github.com/volcano-sh/volcano/pull/4272](https://github.com/volcano-sh/volcano/pull/4272) +* [https://github.com/volcano-sh/volcano/pull/4179](https://github.com/volcano-sh/volcano/pull/4179) +* [https://github.com/volcano-sh/volcano/pull/4141](https://github.com/volcano-sh/volcano/pull/4141) +* [https://github.com/volcano-sh/volcano/pull/4033](https://github.com/volcano-sh/volcano/pull/4033) +* [https://github.com/volcano-sh/volcano/pull/4012](https://github.com/volcano-sh/volcano/pull/4012) +* [https://github.com/volcano-sh/volcano/pull/3603](https://github.com/volcano-sh/volcano/pull/3603) + +Thanks to the following community developers for their contributions: **@halcyon-r, @guoqinwill, @JackyTYang, @JesseStutler, @zhutong196, @Wang-Kai, @HalfBuddhist**. + +#### Pre-Upgrade Notes + +Before upgrading to Volcano v1.12, please note the following changes: + +* **PodGroup Mutating Webhook Disabled by Default:** In v1.12, the PodGroup's Mutating Webhook is disabled by default. If you have workflows that rely on the webhook to automatically populate a PodGroup's queue from its Namespace, you must manually enable this webhook after upgrading. +* **Queue Status Check Migration and Behavioral Change:** The queue status validation logic for task submission has been moved from the Pod creation stage to the PodGroup creation stage. Now, when a queue is closed, the system will prevent task submission at the PodGroup creation time. However, individual Pods (not submitted via a PodGroup) can still be submitted to a closed queue, but they will not be scheduled by the Volcano Scheduler. +* **Volcano Scheduler pprof Endpoint Disabled by Default:** For security reasons, the Volcano Scheduler's pprof endpoint is disabled by default in this version. If needed, it can be explicitly enabled via the Helm parameter `custom.scheduler_pprof_enable=true` or the command-line argument `--enable-pprof=true`. + +## Summary and Future Work + +The release of Volcano v1.12 is the result of the joint efforts of community contributors and users. This version brings enhancements to AI task scheduling, GPU resource utilization, heterogeneous resource management, security, and performance and stability in large-scale scenarios. + +Version 1.12 aims to improve the performance and efficiency of running AI, big data, and other batch computing tasks in cloud-native environments. We recommend that users upgrade to the new version and welcome feedback and suggestions for improvement through our community channels. + +In the future, the Volcano community will continue to focus on the core needs of CNAI, big data, and other fields, iterating continuously. 
+ +## **Roadmap and Call for Contributions** + +The Volcano community is committed to building a more powerful, flexible, and user-friendly batch computing platform while actively responding to evolving technology trends and user needs. In upcoming releases, we plan to focus on the following areas: + +1. **Deepen Network Topology-Aware Scheduling Capabilities:** Building on the v1.12 Alpha version, we will continue to enhance our network topology-aware capabilities. Key areas include providing automatic discovery support for RoCE networks, intelligent identification and use of node labels, and moving towards more fine-grained, task-level topology-aware scheduling. We will also explore and implement more advanced scheduling features to meet the performance requirements of complex AI training scenarios. Related issues: + * [HyperNode based binpack scheduling policy needed](https://github.com/volcano-sh/volcano/issues/4331) + * [Support task level network topology constraint](https://github.com/volcano-sh/volcano/issues/4188) + * [Support identifying network topology from node labels and converting into hyperNode resources](https://github.com/volcano-sh/volcano/issues/4145) + * [Network-topology-aware scheduling optimization: node reordering for tasks](https://github.com/volcano-sh/volcano/issues/4233) +2. **Introduce Advanced Resource Management Mechanisms:** We will focus on developing and improving job rescheduling and resource reservation functions. This will help to more flexibly respond to dynamic changes in cluster load, ensure resource guarantees for critical tasks, and further improve overall cluster resource utilization. Related issue: + * [GPU fragmentation across nodes and Job/Pod rescheduling strategy request](https://github.com/volcano-sh/volcano/issues/3948) +3. **Enhance Queue Scheduling Flexibility:** We will provide fine-grained configuration for queue-level scheduling policies. This will allow users to customize scheduling behavior and resource allocation strategies based on the characteristics, priorities, and SLA requirements of different business queues. Related issue: + * [volcano supports queue-level scheduling policies](https://github.com/volcano-sh/volcano/issues/3992) +4. **Deepen Ecosystem Collaboration and Integration:** We will actively promote collaboration with the upstream Kubernetes community and other cloud-native projects, such as integrating LWS (Leader Worker Set) with Volcano to better provide Gang Scheduling capabilities for distributed applications. Related issue: + * [Support custom scheduler to enable gang scheduling](https://github.com/kubernetes-sigs/lws/issues/407) + We warmly welcome other open-source projects to join with Volcano to build and enrich the cloud-native batch computing ecosystem. +5. **Expand Heterogeneous Hardware Support and Cooperation:** We will strengthen cooperation with hardware ecosystem partners, such as adapting and optimizing Ascend's Device Plugin and DRA Driver, and collaborating with major GPU vendors on DRA Drivers. This will ensure that Volcano can efficiently and stably schedule and manage various cutting-edge heterogeneous accelerator resources. +6. **Improve JobFlow Workflow Capabilities:** We will continue to optimize Volcano's built-in lightweight workflow engine, JobFlow. Plans include enhancing its capabilities in complex job dependency management, status monitoring, error handling, and user-defined extensions to provide users with a more powerful and user-friendly workflow orchestration solution. 
Related issues:
+   * [Support JobFlowTemplate CRD](https://github.com/volcano-sh/volcano/issues/4098)
+   * [Enhance JobFlow Functionality](https://github.com/volcano-sh/volcano/issues/4275)
+
+7. **Introduce Volcano Scheduler Simulator to Enhance Scheduling Transparency and Testability:** To improve the transparency of the scheduling process and simplify testing, Volcano plans to introduce a scheduling simulator. This tool will allow users to accurately reproduce Volcano's core scheduling process in a lightweight environment by flexibly configuring a simulated cluster state (nodes, Pods, queue configurations, etc.). By outputting detailed scheduling logs and optional performance analysis, the simulator will make it easier for developers to test new features, help users understand and validate Volcano's scheduling behavior in different scenarios, and efficiently evaluate the impact of various scheduling policies. Related issue:
+   * [Implement Volcano Scheduler Simulator](https://github.com/volcano-sh/volcano/issues/4276)
+
+## **Community Engagement**
+
+The above roadmap is a preliminary plan. We welcome developers and users to participate in discussions and contribute ideas and suggestions for the future of Volcano.
+
+* **GitHub Issues:** Create a `kind/feature` issue in the Volcano GitHub repository, detailing your use case and feature expectations.
+* **Community Communication:** Participate in community meetings, or start a discussion in the WeChat group, Slack channel, or mailing list to communicate with developers and community members.
+* **Roadmap Contribution:** Feel free to make suggestions regarding our proposed roadmap or other features you consider important.
+
+## **Acknowledgments**
+
+Volcano v1.12 includes hundreds of code commits from 43 community contributors. We would like to express our sincere thanks to all of them for their contributions. Their GitHub IDs are listed below:
+
+| @AdamKorcz     | @HalfBuddhist   | @Hcryw           |
+| :------------- | :-------------- | :--------------- |
+| @JackyTYang    | @JesseStutler   | @MondayCha       |
+| @Monokaix      | @Poor12         | @SataQiu         |
+| @Wang-Kai      | @archlitchi     | @baddoub         |
+| @cnmcavoy      | @co63oc         | @de6p            |
+| @dongjiang1989 | @ecosysbin      | @fengruotj       |
+| @feyounger     | @fjq123123      | @googs1025       |
+| @guoqinwill    | @halcyon-r      | @hansongChina    |
+| @hiwangzhihui  | @hwdef          | @kingeasternsun  |
+| @linuxfhy      | @mahdikhashan   | @mahmut-Abi      |
+| @murali1539    | @ouyangshengjia | @qGentry         |
+| @sailorvii     | @sceneryback    | @sfc-gh-raravena |
+| @wangyang0616  | @weapons97      | @xieyanke        |
+| @ytcisme       | @yuyue9284      | @zedongh         |
+| @zhutong196    |                 |                  |
\ No newline at end of file diff --git a/website-docusaurus/blog/2025-06-13-iflytek_case_study.md b/website-docusaurus/blog/2025-06-13-iflytek_case_study.md new file mode 100644 index 00000000..3610fa4d --- /dev/null +++ b/website-docusaurus/blog/2025-06-13-iflytek_case_study.md @@ -0,0 +1,38 @@ +--- +title: "iFlytek Enhances AI Infrastructure with Volcano, Wins CNCF End-User Case Study Award" +description: "iFlytek was awarded for its innovative use of Volcano in the CNCF End-User Case Study Competition and shared its success in large-scale AI model training at KubeCon + CloudNativeCon China 2025." +date: 2025-06-13 +authors: [volcano] +--- +
+ +[HONG KONG, CHINA — June 10, 2025] — The Cloud Native Computing Foundation (CNCF) today announced that iFlytek has won the CNCF End-User Case Study Competition. The CNCF, which is committed to building a sustainable ecosystem for cloud native software, recognized iFlytek for its innovative use of Volcano. The company shared its success in large-scale AI model training at the KubeCon + CloudNativeCon China conference, held in Hong Kong from June 10-11. + +### iFlytek's Challenges + +As a leading Chinese technology company specializing in voice and language AI, iFlytek faced significant scaling challenges amid its rapid business growth. Inefficient scheduling led to underutilized GPU resources, while complex workflow management and intense resource contention among teams slowed down research and development, straining the company's infrastructure. + +**By adopting Volcano, iFlytek implemented elastic scheduling, DAG-based workflows, and multi-tenancy isolation, which simplified operations and significantly improved resource utilization.** + +"Before using Volcano, coordinating training across our large-scale GPU clusters was a constant exercise in firefighting, with frequent resource bottlenecks, task failures, and complex pipeline debugging," said **DongJiang, Senior Platform Architect at iFlytek**. "Volcano gives us the flexible control we need to scale our AI training efficiently and reliably. We are honored to be recognized by the CNCF and look forward to sharing our experiences with the community at KubeCon + CloudNativeCon China." + +### About Volcano + +Volcano is a cloud-native batch computing system built on Kubernetes. It is designed for high-performance workloads, including AI/machine learning, big data processing, and scientific computing. Volcano offers advanced scheduling capabilities such as job orchestration, fair-share resource allocation, and queue management to efficiently handle large-scale distributed tasks. After joining the CNCF as a Sandbox project in 2020 and graduating to the Incubating stage in 2022, Volcano has become a critical tool for compute-intensive workloads. + +### Significant Results iFlytek Achieved with Volcano + +As the demand for AI grew, iFlytek turned to Volcano to manage its increasingly large and complex training infrastructure. The engineering team required a more efficient way to allocate resources, handle complex multi-stage training workflows, minimize job interruptions, and ensure fair resource access across teams. **With Volcano, they achieved:** + +* **A 40% improvement in GPU utilization,** leading to significantly lower infrastructure costs and reduced resource idling. +* **A 70% faster recovery from task failures,** ensuring continuous training operations. +* **A 50% reduction in resource interference,** ensuring service stability and resource usage flexibility. + +**Chris Aniszczyk, CTO of the CNCF,** commented, "iFlytek's story is a great example of how open source technology can solve complex and critical challenges at scale. By leveraging Volcano to improve GPU efficiency and streamline their training workflows, they have reduced costs, accelerated development, and built a more reliable AI infrastructure on Kubernetes—a critical advantage for any organization at the forefront of AI." + +As AI workloads become more complex and resource-intensive, iFlytek's success demonstrates that cloud-native tools like Volcano are essential for teams looking to simplify operations and enhance scalability. 
Their presentation at KubeCon + CloudNativeCon China [1] offers practical insights into managing distributed training more effectively in a Kubernetes environment. + +### References + +[1] Presentation: [https://kccncchn2025.sched.com/event/23EWS?iframe=no](https://kccncchn2025.sched.com/event/23EWS?iframe=no) \ No newline at end of file diff --git a/website-docusaurus/blog/2025-09-29-volcano-1.13.0-release.md b/website-docusaurus/blog/2025-09-29-volcano-1.13.0-release.md new file mode 100644 index 00000000..516d858e --- /dev/null +++ b/website-docusaurus/blog/2025-09-29-volcano-1.13.0-release.md @@ -0,0 +1,342 @@ +--- +title: "Volcano v1.13 Released: Comprehensive Enhancement of Scheduling Capabilities for LLM Training and Inference" +description: "New Features: LeaderWorkerSet support for large model inference, Cron VolcanoJob, Label-based HyperNode auto-discovery, Native Ray framework support, HCCL plugin support, Enhanced NodeGroup functionality, ResourceStrategyFit plugin, Colocation decoupled from OS, Custom oversubscription resource names, Kubernetes v1.33 support, and more" +date: 2025-09-29 +authors: [volcano] +--- +# Volcano v1.13 Released: Comprehensive Enhancement of Scheduling Capabilities for LLM Training and Inference + +On September 29, 2025 (Beijing Time), Volcano v1.13[1] was officially released. This update brings functional enhancements across multiple areas, providing users with a more comprehensive cloud-native batch computing solution. + +## Release Highlights + +The v1.13.0 release includes the following major updates: + +**AI Training and Inference Enhancements** + +- [Support LeaderWorkerSet for Large Model Inference Scenarios](#support-leaderworkerset-for-large-model-inference-scenarios) +- [Introduce Cron VolcanoJob](#introduce-cron-volcanojob) +- [Support Label-based HyperNode Auto-Discovery](#support-label-based-hypernode-auto-discovery) +- [Add Native Ray Framework Support](#add-native-ray-framework-support) +- [Introduce HCCL Plugin Support](#introduce-hccl-plugin-support) + +**Resource Management and Scheduling Enhancements** + +- [Introduce ResourceStrategyFit Plugin](#introduce-resourcestrategyfit-plugin) + - [Independent Scoring Strategy by Resource Type](#independent-scoring-strategy-by-resource-type) + - [Scarce Resource Avoidance (SRA)](#scarce-resource-avoidance-sra) +- [Enhance NodeGroup Functionality](#enhance-nodegroup-functionality) + +**Colocation Enhancements** + +- [Decouple Colocation from OS](#decouple-colocation-from-os) +- [Support Custom OverSubscription Resource Names](#support-custom-oversubscription-resource-names) + +## Support LeaderWorkerSet for Large Model Inference Scenarios + +[LeaderWorkerSet (LWS)](https://github.com/kubernetes-sigs/lws) is an API for deploying a group of Pods on Kubernetes. It is primarily used to address multi-host inference in AI/ML inference workloads, especially scenarios that require sharding large language models (LLMs) and running them across multiple devices on multiple nodes. + +Since its open-source release, Volcano has actively integrated with upstream and downstream ecosystems, building a comprehensive community ecosystem for batch computing such as AI and big data. In the [v0.7](https://github.com/kubernetes-sigs/lws/releases/tag/v0.7.0) release of LWS, it natively integrated Volcano's AI scheduling capabilities. 
When used with the new version of Volcano, LWS automatically creates PodGroups, which are then scheduled and managed by Volcano, thereby implementing advanced capabilities like Gang scheduling for large model inference scenarios. + +Looking ahead, Volcano will continue to expand its ecosystem integration capabilities, providing robust scheduling and resource management support for more projects dedicated to enabling distributed inference on Kubernetes. + +Usage documentation: [LeaderWorkerSet With Gang](https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/sample/gang-scheduling). + +Related PRs: https://github.com/kubernetes-sigs/lws/pull/496, https://github.com/kubernetes-sigs/lws/pull/498 + +Sincerely thanks to community developer: @[JesseStutler](https://github.com/JesseStutler) + +## Introduce Cron VolcanoJob + +This release introduces support for Cron Volcano Jobs. Users can now periodically create and run Volcano Jobs based on a predefined schedule, similar to native Kubernetes CronJobs, to achieve periodic execution of batch computing tasks like AI and big data. Detailed features are as follows: + +- **Scheduled Execution**: Define the execution cycle of jobs using standard Cron expressions (`spec.schedule`). +- **Timezone Support**: Set the timezone in `spec.timeZone` to ensure jobs execute at the expected local time. +- **Concurrency Policy**: Control concurrent behavior via `spec.concurrencyPolicy`: + - `AllowConcurrent`: Allows concurrent execution of multiple jobs (default). + - `ForbidConcurrent`: Skips the current scheduled execution if the previous job has not completed. + - `ReplaceConcurrent`: Terminates the previous job if it is still running and starts a new one. +- **History Management**: Configure the number of successful (`successfulJobsHistoryLimit`) and failed (`failedJobsHistoryLimit`) job history records to retain; old jobs are automatically cleaned up. +- **Missed Schedule Handling**: The `startingDeadlineSeconds` field allows tolerating scheduling delays within a certain timeframe; timeouts are considered missed executions. +- **Status Tracking**: The CronJob status (`status`) tracks currently active jobs, the last scheduled time, and the last successful completion time for easier monitoring and management. + +Related PRs: https://github.com/volcano-sh/apis/pull/192, https://github.com/volcano-sh/volcano/pull/4560 + +Sincerely thanks to community developers: @[GoingCharlie](https://github.com/volcano-sh/volcano/commits?author=GoingCharlie), @[hwdef](https://github.com/hwdef), @[Monokaix](https://github.com/volcano-sh/volcano/commits?author=Monokaix) + +Usage example: [Cron Volcano Job Example](https://github.com/volcano-sh/volcano/blob/master/example/cronjob/cronjob.yaml). + +## Support Label-based HyperNode Auto-Discovery + +Volcano officially launched network topology-aware scheduling capability in v1.12 and pioneered the UFM auto-discovery mechanism based on InfiniBand (IB) networks. However, for hardware clusters that do not support IB networks or use other network architectures (such as Ethernet), manually maintaining the network topology remains cumbersome. + +To address this issue, the new version introduces a **Label-based HyperNode auto-discovery mechanism**. This feature provides users with a universal and flexible way to describe network topology, transforming complex topology management tasks into simple node label management. 
+ +This mechanism allows users to define the correspondence between topology levels and node labels in the volcano-controller-configmap. The Volcano controller periodically scans all nodes in the cluster and automatically performs the following tasks based on their labels: + +- **Automatic Topology Construction**: Automatically builds multi-layer HyperNode topology structures from top to bottom (e.g., rack -> switch -> node) based on a set of labels on the nodes. +- **Dynamic Maintenance**: When node labels change, or nodes are added or removed, the controller automatically updates the members and structure of the HyperNodes, ensuring the topology information remains consistent with the cluster state. +- **Support for Multiple Topology Types**: Allows users to define multiple independent network topologies simultaneously to adapt to different hardware clusters (e.g., GPU clusters, NPU clusters) or different network partitions. + +Configuration example: + +```yaml +# volcano-controller-configmap.yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: volcano-controller-configmap + namespace: volcano-system +data: + volcano-controller.conf: | + networkTopologyDiscovery: + - source: label + enabled: true + interval: 10m # Discovery interval + config: + networkTopologyTypes: + # Define a topology type named topology-A + topology-A: + # Define topology levels, ordered from top to bottom + - nodeLabel: "volcano.sh/hypercluster" # Top-level HyperNode + - nodeLabel: "volcano.sh/hypernode" # Middle-level HyperNode + - nodeLabel: "kubernetes.io/hostname" # Bottom-level physical node +``` + +This feature is enabled by adding the label source to the Volcano controller's ConfigMap. The above configuration defines a three-layer topology structure named `topology-A`: + +- Top Level (Tier 2): Defined by the `volcano.sh/hypercluster` label. +- Middle Level (Tier 1): Defined by the `volcano.sh/hypernode` label. +- Bottom Level: Physical nodes, identified by the Kubernetes built-in `kubernetes.io/hostname` label. + +When a node is labeled as follows, it will be automatically recognized and classified into the topology path `cluster-s4 -> node-group-s0`: + +```yaml +# Labels for node node-0 +labels: + kubernetes.io/hostname: node-0 + volcano.sh/hypernode: node-group-s0 + volcano.sh/hypercluster: cluster-s4 +``` + +The label-based network topology auto-discovery feature offers excellent generality and flexibility. It is not dependent on specific network hardware (like IB), making it suitable for various heterogeneous clusters, and allows users to flexibly define hierarchical structures of any depth through labels. It automates complex topology maintenance tasks into simple node label management, significantly reducing operational costs and the risk of errors. Furthermore, this mechanism dynamically adapts to changes in cluster nodes and labels, maintaining the accuracy of topology information in real-time without manual intervention. + +Related PR: https://github.com/volcano-sh/volcano/pull/4629 + +Sincerely thanks to community developer: @[zhaoqi612](https://github.com/zhaoqi612) + +Usage documentation: [HyperNode Auto Discovery](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_hypernode_auto_discovery.md). 
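+
+As a quick check, once the `label` discovery source is enabled and at least one discovery interval has passed, the auto-created HyperNode objects can be listed with kubectl. This is a sketch only; verify the resource name and output columns against the HyperNode CRD shipped with your Volcano version:
+
+```shell
+# cluster-s4 and node-group-s0 are the HyperNodes implied by the label example above
+kubectl get hypernodes
+```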
+ +## Add Native Ray Framework Support + +[Ray](https://docs.ray.io/) is an open-source unified distributed computing framework whose core goal is to simplify parallel computing from single machines to large-scale clusters, especially suitable for scaling Python and AI applications. To manage and run Ray on Kubernetes, the community provides KubeRay—an operator specifically designed for Kubernetes. It acts as a bridge between Kubernetes and the Ray framework, greatly simplifying the deployment and management of Ray clusters and jobs. + +Historically, running Ray workloads on Kubernetes primarily relied on the KubeRay Operator. KubeRay integrated Volcano in its [v0.4.0 release (released in 2022)](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/volcano.html) for scheduling and resource management of Ray Clusters, addressing issues like resource deadlocks in distributed training scenarios. With this new version of Volcano, users can now directly create and manage Ray clusters and submit computational tasks through native Volcano Jobs. This provides Ray users with an alternative usage scheme, allowing them to more directly utilize Volcano's capabilities such as Gang Scheduling, queue management and fair scheduling, and job lifecycle management for running Ray workloads. + +Related PR: https://github.com/volcano-sh/volcano/pull/4581 + +Sincerely thanks to community developer: @[Wonki4](https://github.com/Wonki4) + +Design documentation: [Ray Framework Plugin Design Doc](https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md). + +Usage documentation: [Ray Plugin User Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_ray_plugin.md). + +## Introduce HCCL Plugin Support + +The new version adds an HCCL Rank plugin (`hcclrank`) to Volcano Jobs, used for automatically assigning HCCL Ranks to Pods in distributed tasks. This includes: + +- New implementation of the `hcclrank` plugin for Volcano Jobs, supporting automatic calculation and injection of HCCL Rank into Pod annotations based on task type (master/worker) and index. +- The plugin supports custom master/worker task names, allowing users to specify the master/worker roles in distributed tasks. + +This feature enhances Volcano's native support for HCCL communication scenarios, such as Huawei Ascend, facilitating automatic management and assignment of Ranks in AI training tasks. + +Related PR: https://github.com/volcano-sh/volcano/pull/4524 + +Sincerely thanks to community developer: @[kingeasternsun](https://github.com/kingeasternsun) + +## Enhance NodeGroup Functionality + +In hierarchical queue structures, repeatedly configuring the same node group affinity (`nodeGroupAffinity`) for each sub-queue as its parent queue leads to configuration redundancy and difficult maintenance. + +To solve this problem, the Nodegroup plugin adds support for inheriting affinity within hierarchical queues. Once enabled, the scheduler resolves the effective affinity for a queue according to the following rules: + +1. **Prioritize Self-Configuration**: If the queue has defined `spec.affinity`, it uses this configuration directly. +2. **Upward Inheritance**: If the queue has not defined `spec.affinity`, it searches upward through its parents and inherits the affinity configuration defined by the nearest ancestor queue. +3. **Override Capability**: A child queue can override the inherited configuration by defining its own `spec.affinity`, ensuring flexibility. 
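+
+As a sketch of these rules (the queue names are illustrative, and the affinity fields follow the NodeGroup design document linked at the end of this section, so verify the exact schema against your Volcano version):
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: dept-queue                # department-level parent queue
+spec:
+  affinity:                       # node group affinity defined once, at the parent
+    nodeGroupAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+      - nodegroup-a
+---
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: team-queue                # team-level child queue
+spec:
+  parent: dept-queue              # no spec.affinity here: inherits nodegroup-a from dept-queue
+```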
+ +This feature allows administrators to set unified node group affinity at a parent queue (e.g., department level), and all child queues (e.g., team level) will automatically inherit this setting, simplifying management. + +For queues without NodeAffinity configuration, the "strict" parameter in the plugin controls scheduling behavior. When `strict` is set to `true` (the default value), tasks in these queues cannot be scheduled to any nodes. When `strict` is set to `false`, these tasks are allowed to be scheduled to regular nodes that do not have the `volcano.sh/nodegroup-name` label. + +In the nodegroup plugin parameters of the scheduler configuration file, setting `enableHierarchy: true` enables hierarchical queue mode, and setting `strict: false` configures non-strict mode. Example configuration is as follows: + +```yaml +actions: "allocate, backfill, preempt, reclaim" +tiers: +- plugins: + - name: nodegroup + arguments: + enableHierarchy: true # Enable hierarchical support + strict: false # Set to non-strict mode, allowing tasks in the queue to be scheduled to nodes without the "volcano.sh/nodegroup-name" label +``` + +Related PRs: https://github.com/volcano-sh/volcano/pull/4455 + +Sincerely thanks to community developers: @[JesseStutler](https://github.com/JesseStutler), @[wuyueandrew](https://github.com/wuyueandrew) + +NodeGroup design documentation: [NodeGroup Design.](https://github.com/volcano-sh/volcano/blob/master/docs/design/node-group.md) + +NodeGroup usage documentation: [NodeGroup User Guide.](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_nodegroup_plugin.md) + +## Introduce ResourceStrategyFit Plugin + +In the native Kubernetes `noderesources` fit strategy, only a single aggregated (`MostAllocated`) or dispersed (`LeastAllocated`) strategy can be applied to all resources. This has limitations in complex heterogeneous computing environments (like AI/ML clusters). To meet differentiated scheduling requirements, Volcano introduces the enhanced `ResourceStrategyFit` plugin. + +This plugin now integrates two core features: Independent scoring strategies by resource type and Scarce Resource Avoidance (SRA). + +### Independent Scoring Strategy by Resource Type + +This feature allows users to specify `MostAllocated` (binpack) or `LeastAllocated` (spread) strategies for different resources (e.g., cpu, memory, nvidia.com/gpu) independently, and assign different weights to them. The scheduler calculates the node score meticulously based on the independent configuration for each resource. + +To simplify the management of resources within the same family (e.g., different model GPUs from the same vendor), this feature also supports suffix wildcard (`*`) matching for resource names. + +- **Syntax Rules**: Only suffix wildcards are supported, e.g., `nvidia.com/gpu/*`. Patterns like `*` or `vendor.*/gpu` are considered invalid. +- **Matching Priority**: Uses the "longest prefix match" principle. Exact matches have the highest priority; when no exact match exists, the wildcard pattern with the longest prefix is selected. + +Configuration Example: The following configuration sets a high-priority binpack strategy for a specific V100 GPU model, a generic binpack strategy for all other NVIDIA GPUs, and a spread strategy for CPU resources. Pod-level resource scoring strategy configuration is also supported. 
+ +```yaml +actions: "enqueue, allocate, backfill, reclaim, preempt" +tiers: +- plugins: + - name: resource-strategy-fit + arguments: + resourceStrategyFitWeight: 10 + resources: + # Exact match, highest priority + nvidia.com/gpu-v100: + type: MostAllocated + weight: 3 + # Wildcard match, applies to all other NVIDIA GPUs + nvidia.com/gpu/*: + type: MostAllocated + weight: 2 + # Exact match for CPU resource + cpu: + type: LeastAllocated + weight: 1 +``` + +### Scarce Resource Avoidance (SRA) + +SRA is a "soft" strategy designed to improve the overall utilization of expensive or scarce resources (like GPUs). It influences node scoring to guide ordinary tasks that do not require specific scarce resources (e.g., CPU-only tasks) to avoid nodes containing those resources where possible. This helps "reserve" scarce resource nodes for tasks that truly need them, thereby reducing resource contention and task waiting time. + +Mechanism: + +1. Users define a set of "scarce resources" (e.g., `nvidia.com/gpu`) in the configuration. +2. When scheduling a Pod that does *not* request any of the defined scarce resources, the SRA policy takes effect. +3. The scheduler reduces the score of nodes that possess these scarce resources. The more types of scarce resources a node has, the lower its score. +4. For Pods that *do* request scarce resources, the SRA policy does not negatively impact their scheduling decisions. + +Configuration Example: The following configuration defines `nvidia.com/gpu` as a scarce resource. When scheduling a CPU-only task, nodes with GPUs will have their scores reduced, making the task more likely to be scheduled onto nodes without GPUs. + +```yaml +actions: "enqueue, allocate, backfill, reclaim, preempt" +tiers: +- plugins: + - name: resource-strategy-fit + arguments: + # ... binpack/spread strategy configuration for resourceStrategyFit ... + resources: + nvidia.com/gpu: + type: MostAllocated + weight: 2 + cpu: + type: LeastAllocated + weight: 1 + # SRA policy configuration + sra: + enable: true + resources: "nvidia.com/gpu" # Define scarce resource list, comma-separated + weight: 10 # Weight of the SRA policy in the total score + resourceWeight: + nvidia.com/gpu: 1 # Define nvidia.com/gpu as a scarce resource and its weight +``` + +By combining the binpack/spread strategies of ResourceStrategyFit with the avoidance strategy of SRA, users can achieve more refined and efficient scheduling of heterogeneous resources. + +Related PRs: https://github.com/volcano-sh/volcano/pull/4391, https://github.com/volcano-sh/volcano/pull/4454, https://github.com/volcano-sh/volcano/pull/4512 + +Sincerely thanks to community developers: @[LY-today](https://github.com/LY-today), @[XbaoWu](https://github.com/XbaoWu), @[ditingdapeng](https://github.com/ditingdapeng), @[kingeasternsun](https://github.com/kingeasternsun) + +Design documentation: [ResourceStrategyFit Design](https://github.com/volcano-sh/volcano/blob/master/docs/design/resource-strategy-fit-scheduling.md) + +Usage documentation: [ResourceStrategyFit User Guide](https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_resource_strategy_fit_plugin.md) + +## Decouple Colocation from OS + +Volcano's colocation capability consists of two parts: application-level and kernel-level. Application-level colocation provides unified scheduling for online and offline workloads, dynamic resource overcommitment, node pressure eviction, etc. 
Kernel-level colocation involves QoS guarantees for resources like CPU, Memory, and Network at the kernel level, which typically requires support from a specific OS (like OpenEuler). In the new version, Volcano decouples the colocation capability from the OS. For users using an OS that does not support kernel-level colocation, they can choose to use Volcano's application-level colocation capabilities to achieve unified scheduling of online and offline tasks, dynamic resource overcommitment, and high-priority task guarantees. + +Specific usage: When installing the Volcano agent, specify the `--supported-features` parameter: + +```shell +helm install volcano . --create-namespace -n volcano-system --set custom.colocation_enable=true --set "custom.agent_supported_features=OverSubscription\,Eviction\,Resources" +``` + +Related PRs: https://github.com/volcano-sh/volcano/pull/4409, https://github.com/volcano-sh/volcano/pull/4630 + +Sincerely thanks to community developers: @[ShuhanYan](https://github.com/ShuhanYan), @[Monokaix](https://github.com/Monokaix) + +Colocation documentation: https://volcano.sh/en/docs/colocation/ + +## Support Custom OverSubscription Resource Names + +The Volcano colocation Agent adds parameters `--extend-resource-cpu-name` and `--extend-resource-memory-name`, allowing users to customize the names of overcommitted resources. This supports custom naming for CPU and memory resources (defaults are `kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory` respectively), enhancing flexibility in setting overcommitted resource names. + +Specific usage: When installing Volcano, specify the `--extend-resource-cpu-name` and `--extend-resource-memory-name` parameters: + +```shell +helm install volcano . --create-namespace -n volcano-system --set custom.colocation_enable=true --set custom.agent_extend_resource_cpu_name=example.com/cpu --set custom.agent_extend_resource_memory_name=example.com/memory +``` + +Related PRs: https://github.com/volcano-sh/volcano/pull/4413, https://github.com/volcano-sh/volcano/pull/4630 + +Sincerely thanks to community developers: @[ShuhanYan](https://github.com/ShuhanYan), @[Monokaix](https://github.com/Monokaix) + +Colocation documentation: https://volcano.sh/en/docs/colocation/ + +## Add Kubernetes 1.33 Support + +The Volcano version keeps pace with the Kubernetes community releases. v1.13 supports the latest Kubernetes v1.33 release, ensuring functionality and reliability through comprehensive UT and E2E test cases. + +For participating in Volcano's adaptation work for new Kubernetes versions, refer to: [adapt-k8s-todo](https://github.com/volcano-sh/volcano/blob/v1.13.0/docs/design/adapt-k8s-todo.md). + +Related PR: https://github.com/volcano-sh/volcano/pull/4430 + +Sincerely thanks to community developer: @[mahdikhashan](https://github.com/mahdikhashan) + +## **Conclusion: Volcano v1.13.0 Continues to Lead Cloud-Native Batch Computing** + +Volcano v1.13.0 is not just a technological advancement but a continuation of innovation in cloud-native batch computing. Whether for AI large model training and inference, Big Data scheduling, or resource optimization, Volcano v1.13.0 delivers powerful features and flexible solutions. We believe Volcano v1.13.0 will help users achieve greater heights in cloud-native batch computing, ushering in a new era of AI and Big Data scheduling! 
+ +**Experience Volcano v1.13.0 now and step into a new era of efficient computing!** + +**v1.13.0 release:** https://github.com/volcano-sh/volcano/releases/tag/v1.13.0 + +## **Acknowledgments** + +Volcano v1.13.0 includes contributions from 36 community members. Sincerely thanks to all contributors: + +| @ElectricFish7 | @philandstuff | @junzebao | +| :------------- | :-------------- | :--------------- | +| @ShuhanYan | @GautamBytes | @coldzerofear | +| @houyuting | @lhlxc | @cyf-2002 | +| @neo502721 | @suyiiyii | @dafu-wu | +| @ditingdapeng | @GoingCharlie | @Wonki4 | +| @zhaoqi612 | @huntersman | @JesseStutler | +| @LY-today | @XbaoWu | @kingeasternsun | +| @Monokaix | @wuyueandrew | @mahdikhashan | +| @bibibox | @archlitchi | @guoqinwill | +| @ouyangshengjia| @Poor12 | @dongjiang1989 | +| @zhifei92 | @halcyon-r | @Xu-Wentao | +| @hajnalmt | @kevin-wangzefeng| @linuxfhy | \ No newline at end of file diff --git a/website-docusaurus/blog/authors.yml b/website-docusaurus/blog/authors.yml new file mode 100644 index 00000000..68945363 --- /dev/null +++ b/website-docusaurus/blog/authors.yml @@ -0,0 +1,31 @@ +volcano: + name: Volcano Team + title: Volcano Contributors + url: https://volcano.sh + image_url: https://volcano.sh/img/volcano_logo.svg + +yangshun: + name: Yangshun Tay + title: Ex-Meta Staff Engineer, Co-founder GreatFrontEnd + url: https://linkedin.com/in/yangshun + image_url: https://github.com/yangshun.png + page: true + socials: + x: yangshunz + linkedin: yangshun + github: yangshun + newsletter: https://www.greatfrontend.com + +slorber: + name: Sébastien Lorber + title: Docusaurus maintainer + url: https://sebastienlorber.com + image_url: https://github.com/slorber.png + page: + # customize the url of the author page at /blog/authors/ + permalink: '/all-sebastien-lorber-articles' + socials: + x: sebastienlorber + linkedin: sebastienlorber + github: slorber + newsletter: https://thisweekinreact.com diff --git a/website-docusaurus/blog/tags.yml b/website-docusaurus/blog/tags.yml new file mode 100644 index 00000000..bfaa778f --- /dev/null +++ b/website-docusaurus/blog/tags.yml @@ -0,0 +1,19 @@ +facebook: + label: Facebook + permalink: /facebook + description: Facebook tag description + +hello: + label: Hello + permalink: /hello + description: Hello tag description + +docusaurus: + label: Docusaurus + permalink: /docusaurus + description: Docusaurus tag description + +hola: + label: Hola + permalink: /hola + description: Hola tag description diff --git a/website-docusaurus/docs/_category_.json b/website-docusaurus/docs/_category_.json new file mode 100644 index 00000000..f1609f29 --- /dev/null +++ b/website-docusaurus/docs/_category_.json @@ -0,0 +1,8 @@ +{ + "label": "Documentation", + "position": 1, + "link": { + "type": "generated-index", + "description": "Volcano documentation" + } +} diff --git a/website-docusaurus/docs/actions.md b/website-docusaurus/docs/actions.md new file mode 100644 index 00000000..f287f035 --- /dev/null +++ b/website-docusaurus/docs/actions.md @@ -0,0 +1,67 @@ +--- +title: "Actions" +sidebar_position: 2 +--- + +### Enqueue + +#### Overview + +The Enqueue action filters qualified jobs into the queue to be scheduled. When the minimum number of resource requests under a Job cannot be met, even if the scheduling action is performed for a pod under a Job, pod will not be able to schedule because the "Gang" constraint is not reached. A state refresh from "Pending" to "Inqueue" can only happen if the minimum resource size of the job is met. 
This state transition is a prerequisite for Pod creation: only after the PodGroup enters the Inqueue state will the vc-controller create Pods for that PodGroup. This mechanism ensures that Pods are only created when resources are available, making Enqueue an essential action in the scheduler configuration.
+
+#### Scenario
+
+Enqueue action is the preparatory stage in the scheduling process. Only when the cluster resources meet the minimum resource request of a job can the job state change from "Pending" to "Inqueue". In this way, the Enqueue action prevents a large number of unschedulable pods from piling up in the cluster and improves scheduler performance in high-load scenarios where cluster resources may be insufficient, such as AI/MPI/HPC.
+
+> Note: There is a conflict between the enqueue action and the preempt/reclaim actions. If both are configured and enqueue determines that a job cannot be queued, no Pending Pods are generated for it, so the preempt/reclaim actions are never triggered.
+
+
+### Allocate
+
+#### Overview
+
+The Allocate action binds `<task, node>` pairs, in two phases: pre-selection and further selection. PredicateFn is used to filter out nodes on which a task cannot be allocated, and NodeOrderFn is used to score the nodes to find the one that fits best. Allocate is an essential step in the scheduling process; it handles the pods in the scheduling list that have explicit resource requests.
+
+The Allocate action follows the commit mechanism. When a pod's scheduling request is satisfied, a binding action is not necessarily performed for that pod. This step also depends on whether the gang constraint of the Job in which the pod resides is satisfied. Only if that gang constraint is satisfied can the pod be scheduled; otherwise, it cannot.
+
+#### Scenario
+
+In a cluster running mixed workloads, the pre-selection phase of Allocate lets specific workloads (AI, big data, HPC, scientific computing) be filtered, sorted, and scheduled by namespace quickly and centrally. In complex computing scenarios such as TensorFlow or MPI, where a single job contains multiple tasks, the Allocate action traverses the allocation options of the tasks under the job to find the most appropriate node for each task.
+
+### Backfill
+
+#### Overview
+
+Backfill action is a backfill step in the scheduling process. It handles the scheduling of BestEffort Pods (pods that do not specify resource requests). Like the Allocate action, Backfill traverses all nodes to find suitable scheduling positions; the main difference is that it handles pods without explicit resource requests.
+
+#### Scenario
+
+In a cluster, besides workloads that require explicit resource requests, there are also workloads with unclear resource demands. These workloads typically run in BestEffort mode, and the Backfill action is responsible for finding suitable scheduling positions for such Pods.
+
+### Preempt
+
+#### Overview
+
+The Preempt action is the preemption step in the scheduling process, used to handle high-priority scheduling demands: it preempts resources between jobs in the same queue, or between tasks under the same job.
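+
+As a reference for how these actions are enabled, the scheduler reads them from the first line of its configuration. The following is a minimal sketch of a `volcano-scheduler.conf` that turns on preemption, assuming the default plugin set (adjust the plugins to your deployment):
+
+```yaml
+# Enabled actions, executed in order in each scheduling cycle
+actions: "enqueue, allocate, preempt, backfill"
+tiers:
+- plugins:
+  - name: priority     # job/task priority comparison, used when picking preemption victims
+  - name: gang         # enforces the "Gang" (minAvailable) constraint
+  - name: conformance
+- plugins:
+  - name: drf
+  - name: predicates
+  - name: proportion
+  - name: nodeorder
+```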
+
+#### Scenario
+
+- Preemption between jobs in the same queue: Multiple departments in a company share a cluster, and each department can be mapped to a Queue. Resources cannot be preempted across departments, which guarantees resource isolation between them. In complex scheduling scenarios, basic resources (CPUs, disks, GPUs, memory, network bandwidth) are allocated by service type: computing-intensive scenarios such as AI and high-performance scientific computing need more computing resources like CPUs, GPUs, and memory, while big data scenarios such as the Spark framework place high demands on disks. Different queues share these resources, and if AI jobs preempted all CPU resources, jobs in the queues of other scenarios would starve. Queue-based resource allocation is therefore used to keep services running.
+- Preemption between tasks in the same job: There are usually multiple tasks in the same Job. For example, in complex AI application scenarios, a TF-job needs a parameter server and multiple workers inside it, and Preempt supports preemption between those workers.
+
+### Reclaim
+
+#### Overview
+
+Reclaim action is a **cross-queue** resource reclamation step in the scheduling process. Unlike Preempt, Reclaim specifically handles resource reclamation between different Queues. When a job in a Queue needs resources and that Queue is not overused, resources can be reclaimed from other reclaimable queues.
+
+#### Scenario
+
+- Cross-queue resource reclamation: In scenarios where multiple departments share a cluster, when a high-priority department's Queue (such as an online business department's) lacks resources, it can reclaim resources from other departments' Queues (such as an offline computing department's). For example, online business Queues can reclaim resources from offline business Queues, but offline business Queues cannot reclaim resources from each other.
+
+- Resource utilization optimization: Through the cross-queue resource reclamation mechanism, the cluster can improve overall resource utilization while ensuring the SLA of high-priority businesses. When a high-priority Queue lacks resources, it can reclaim resources from low-priority Queues to guarantee the resource requirements of critical businesses.
+
+> Note:
+>
+> 1. Reclaim checks multiple conditions during execution: whether the target Queue is reclaimable, whether the task can be reclaimed (Preemptable), whether the job's running requirements can still be met after resource reclamation, etc., to ensure that reclamation is reasonable.
+> 2. To make jobs in a Queue reclaimable by other Queues, the `reclaimable` field in the Queue's spec must be set to true.
diff --git a/website-docusaurus/docs/architecture.md b/website-docusaurus/docs/architecture.md
new file mode 100644
index 00000000..509dd132
--- /dev/null
+++ b/website-docusaurus/docs/architecture.md
@@ -0,0 +1,30 @@
+---
+title: "Architecture"
+sidebar_position: 2
+---
+
+## Overall Architecture
+
+
+Volcano is designed for high-performance workloads running on Kubernetes. It follows the design and mechanisms of Kubernetes.
+
+
+Volcano consists of **scheduler** / **controllermanager** / **admission** / **vcctl**:
+
+##### Scheduler
+Volcano Scheduler schedules jobs to the most suitable node based on actions and plugins. Volcano supplements Kubernetes to support multiple scheduling algorithms for jobs.
+
+##### ControllerManager (CM)
+Volcano CMs manage the lifecycle of Custom Resource Definitions (CRDs). You can use the **Queue CM**, **PodGroup CM**, and **VCJob CM**.
+
+##### Admission
+Volcano Admission is responsible for the CRD API validation.
+
+##### vcctl
+Volcano vcctl is the command line client for Volcano.
diff --git a/website-docusaurus/docs/cli.md b/website-docusaurus/docs/cli.md
new file mode 100644
index 00000000..a52da687
--- /dev/null
+++ b/website-docusaurus/docs/cli.md
@@ -0,0 +1,61 @@
+---
+title: "CLI"
+sidebar_position: 1
+---
+
+## Introduction
+A Command Line Interface (CLI) is provided for you to manage resources.
+
+## Configuration
+
+1. You can obtain the latest executable file by cloning the code from GitHub and running the following command in the root directory of the project:
+```
+# make vcctl
+```
+2. Copy the executable file to $PATH. You can then execute it anywhere.
+
+## Command Line List
+### Listing all jobs
+vcctl job list
+
+```shell
+# vcctl job list
+Name     Creation     Phase     JobType     Replicas     Min     Pending     Running     Succeeded     Failed     Unknown     RetryCount
+job-1    2020-09-01   Running   Batch       1            1       0           1           0             0          0           0
+```
+
+### Deleting a specific job
+vcctl job delete --name job-name [--namespace job-namespace]
+
+```shell
+# vcctl job delete --name job-1 --namespace default
+delete job job-1 successfully
+```
+
+### Suspending a job
+vcctl job suspend --name job-name [--namespace job-namespace]
+
+```shell
+# vcctl job suspend --name job-1 --namespace default
+```
+
+### Resuming a job (opposite to "vcctl job suspend")
+vcctl job resume --name job-name [--namespace job-namespace]
+
+```shell
+# vcctl job resume --name job-1 --namespace default
+```
+
+### Running a job
+vcctl job run --name job-name [--namespace job-namespace]
+
+```shell
+# vcctl job run --name job-1 --namespace default
+```
+
+## Note
+For more information about Volcano command lines, run the following commands:
+
+```shell
+# vcctl -h
+# vcctl [command] -h
+```
diff --git a/website-docusaurus/docs/colocation.md b/website-docusaurus/docs/colocation.md
new file mode 100644
index 00000000..9fe14e53
--- /dev/null
+++ b/website-docusaurus/docs/colocation.md
@@ -0,0 +1,604 @@
+---
+title: "Cloud Native Colocation"
+sidebar_position: 3
+---
+
+## Background
+
+With the rapid development of cloud-native technologies, more and more workloads have gradually migrated to Kubernetes, adopting cloud-native approaches for development and maintenance. This has greatly simplified application deployment, orchestration, and operations. Kubernetes has gradually become the "operating system" of the cloud-native era. However, despite the adoption of cloud-native technologies, resource utilization in data centers remains relatively low. To improve resource utilization while ensuring the Service Level Objectives (SLOs) of high-priority workloads, Volcano has introduced a cloud-native colocation solution. This solution provides end-to-end resource isolation and sharing mechanisms from the application layer to the kernel, maximizing resource utilization.
+
+Cloud-native colocation refers to deploying online and offline workloads in the same cluster using cloud-native methods.
Since online workloads exhibit significant peak and off-peak characteristics, offline workloads can utilize idle resources during off-peak periods. When online workloads reach peak usage, mechanisms such as online job priority control are used to suppress the operation of offline jobs, ensuring resource availability for online jobs. This approach improves overall cluster resource utilization while guaranteeing the SLOs of online workloads. + +Typical online and offline workloads have the following characteristics: + +| | **Online Workload** | **Offline Workload** | +| -------------------- | -------------------------------------------------------- | ------------------------------------------------------------ | +| Typical Applications | Microservices, search, recommendation, advertising, etc. | Batch processing, Big data, AI training, video transcoding, etc. | +| Latency | Sensitive | Insensitive | +| SLO | High | Low | +| Load Model | Time-based | Continuous resource usage | +| Fault Tolerance | Low tolerance, high availability requirements | Allows failure and retries | +| Runtime | Stable and continuous operation | Task-based, short runtime | + +## Advantages + +Many companies and users in the industry have explored and practiced colocation technologies to varying degrees, providing positive and beneficial designs and practices for colocation. However, there are some shortcomings, such as the inability to fully decouple from Kubernetes, rough oversubscription resource calculation methods, inconsistent usage patterns for online and offline jobs, and poor user experience. + +Based on these considerations, Volcano has further enhanced and optimized colocation technologies. Compared to industry colocation solutions, Volcano offers the following advantages: + +- Volcano Scheduler natively supports offline workloads scheduling and management. +- No intrusive modifications to Kubernetes. +- Real-time dynamic calculation of oversubscribed resources, better balancing resource utilization and workload QoS requirements. +- OS-level isolation and QoS guarantees. + +## Architecture + +The cloud-native colocation architecture mainly includes the Volcano Scheduler, Volcano SLO Agent, and Enhanced OS. + +- **Volcano Scheduler**: Responsible for unified scheduling of online and offline workloads, providing abstractions such as queues, groups, job priorities, fair scheduling, and resource reservation to meet the scheduling needs of microservices, big data, AI, and other workloads. +- **Volcano SLO Agent**: Each node in the cluster deploys a Volcano SLO Agent, which dynamically calculates the allocated but unused resources on each node and oversubscribes these resources for offline workloads. It also ensures node QoS by evicting offline workloads when CPU/memory pressure is detected, ensuring the priority of online workloads. +- **Enhanced OS**: The Volcano SLO Agent provides node-level QoS guarantees at the application layer. For more refined and mandatory isolation, the kernel also needs to distinguish QoS types and isolate resources at the CPU, memory, network, and L3 cache levels. The kernel exposes a series of cgroup interfaces, allowing the Volcano SLO Agent to set different cgroups for online and offline workloads, achieving fine-grained kernel-level isolation and enabling online jobs to suppress offline workloads. + +
+*(Figure: cloud-native colocation architecture)*
+
+## Features
+
+### QoS-Based Colocation Model
+
+After colocating online and offline workloads in the same cluster, offline workloads, which are typically CPU- or IO-intensive, can interfere with online workloads, leading to degraded QoS for online workloads. To minimize this interference, it is necessary to implement QoS classification and control for online and offline workloads. By labeling online and offline workloads, a QoS model is defined, ensuring that online workload QoS is prioritized at runtime and reducing interference from offline workloads.
+
+Based on the classification and operational characteristics of online and offline workloads, Volcano abstracts a model and defines different QoS levels. Different types of workloads can set different QoS levels, which are mapped to CPU and memory levels in the kernel. Higher levels gain greater resource usage rights and preemption priority. During scheduling, the QoS levels corresponding to different job types are distinguished, and rich scheduling policies are executed. Meanwhile, the Volcano SLO Agent calls kernel interfaces to set different QoS priorities for online and offline workloads. The QoS model is defined as follows:
+
+| QoS Level | Typical Application Scenarios | CPU Priority | Memory Priority |
+| :----------------------------: | :----------------------------------------------------------: | :----------: | :-------------: |
+| LC (Latency Critical) | Core online workloads with extremely high latency sensitivity, exclusive CPU usage | Exclusive | 0 |
+| HLS (Highly Latency Sensitive) | Online workloads with extremely high latency sensitivity | 2 | 0 |
+| LS (Latency Sensitive) | Nearline workloads with latency sensitivity | 1 | 0 |
+| BE (Best Effort) | Offline AI and big data workloads, tolerant to eviction | -1 | -1 |
+
+Users can set an annotation on the workload's Pod to indicate the workload type. For example, setting `volcano.sh/qos-level="LS"` indicates that the Pod is a latency-sensitive nearline workload, while setting `volcano.sh/qos-level="BE"` indicates that the Pod is an offline workload.
+
+### Unified Scheduling of Online and Offline Workloads
+
+When deploying online and offline workloads in the same cluster, using multiple schedulers for the different workload types can lead to concurrent resource update conflicts, as each scheduler has a global view of resources. To avoid this issue, a unified scheduler is needed to schedule both online and offline workloads.
+
+As the industry's first cloud-native batch computing project, Volcano natively supports the scheduling and management of AI and big data workloads. It also supports multi-tenant queue management and fair scheduling, unifying support for almost all mainstream computing frameworks, including Ray, Kubeflow, Spark, Flink, PyTorch, TensorFlow, MPI, Horovod, MindSpore, PaddlePaddle, MXNet, Argo, and more. It integrates Kubernetes' default scheduling algorithms, supports unified scheduling of batch jobs and microservices, and prioritizes scheduling based on the job's QoS model. It therefore supports unified scheduling of online and offline workloads.
+
+### Dynamic Resource Oversubscription
+
+Kubernetes' existing resource scheduling model is based on Pod requests. However, users often blindly set high request values while actual usage is low, leading to resource waste.
Additionally, online jobs exhibit significant peak and off-peak characteristics, making it ideal to oversubscribe underutilized resources during off-peak periods for offline workloads, thereby improving cluster resource utilization. + +The Volcano SLO Agent dynamically calculates the allocated but unused resources of Pods and oversubscribes these resources for offline workloads, increasing Pod deployment density and improving resource utilization. + +
+*(Figure: dynamic resource oversubscription)*
+ +The increase in oversubscribed resources changes the node's original available resources, and oversubscribed resources are exclusively used by offline workloads. Therefore, different schemes are available for calculating, reporting, and using oversubscribed resources. To better decouple from Kubernetes and support user-defined oversubscribed resource representations, Volcano provides native and extend modes for oversubscribed resource calculation and reporting. The native mode reports oversubscribed resources to the node's `allocatable` field, ensuring consistent usage patterns for online and offline workloads and improving user experience. The extend mode supports reporting oversubscribed resources as extended resources, decoupling from Kubernetes. Users can flexibly choose the reporting and usage methods of oversubscribed resources based on actual needs. + +### QoS Guarantees + +After colocating online and offline workloads, resource competition between offline and online workloads can cause interference with online workloads. Therefore, while improving resource utilization, it is essential to ensure the QoS of online workloads and avoid interference from offline workloads. + +Offline workloads typically use various types of resources, so resource isolation measures need to be implemented for each dimension. Volcano sets resource isolation for CPU, memory, and network through kernel interfaces. When resource competition occurs between online and offline workloads, the resource usage of offline workloads is suppressed to prioritize the QoS of online workloads. + +- **CPU:** The OS provides five levels of CPU QoS, ranging from -2 to 2. Higher QoS levels indicate more CPU time slices and higher preemption priority. By setting the `cpu.qos_level` in the CPU subsystem's cgroup, different CPU QoS levels can be assigned to different workloads. + +- **Memory:** Memory isolation ensures that offline jobs are preferentially OOM killed when the system experiences OOM. By setting the `memory.qos_level` in the memory subsystem's cgroup, different Memory QoS levels can be assigned to different workloads. + +- **Network:** Network isolation ensures egress bandwidth guarantees for online jobs. It is based on the node's total bandwidth and uses cgroup, tc, and eBPF technologies to suppress the egress bandwidth of offline jobs for online workloads. + +
+*(Figure: network isolation solution)*
+ +The figure above shows the technical solution for network isolation. By injecting rate-limiting programs into the kernel using eBPF, packet forwarding is controlled to achieve rate limiting. The cgroup eBPF can label packets of online and offline workloads to distinguish their traffic. The tc eBPF sets three watermarks: online workload watermark, offline workload high watermark, and offline workload low watermark. When online workload traffic exceeds the watermark, the bandwidth of offline workloads is limited, with the upper limit set to the offline workload low watermark, yielding to online traffic. When online workload traffic is below the watermark, the bandwidth limit for offline workloads is lifted, with the upper limit set to the offline workload high watermark, improving resource utilization. Additionally, the packet sending time (EDT) can be calculated based on the bandwidth of offline traffic to implement rate limiting for offline workloads. + +
+*(Figure: bandwidth limitation for online and offline workloads)*
+ +### CPU Burst + +If a container in a Pod has a CPU limit set, the container's CPU usage will be capped at the limit value, resulting in CPU throttling. Frequent CPU throttling can affect workload performance, increasing the tail latency of workload responses, especially for latency-sensitive workloads. + +The CPU Burst capability of the Volcano agent provides an elastic throttling mechanism that allows brief bursts beyond the CPU limit to reduce workload tail latency. The principle is that when a workload has unused CPU quota in a CPU scheduling period, the system accumulates these unused quotas. In subsequent scheduling periods, if the workload needs to exceed the CPU limit, it can use the accumulated CPU quota to achieve a burst beyond the limit. + +When CPU Burst is not enabled, the container's CPU usage is strictly limited to the CPU limit, and bursting is not possible. As shown below: + +
+*(Figure: container CPU usage is capped at the limit when CPU Burst is disabled)*
+ +When CPU Burst is enabled, the container's CPU usage can exceed the limit, enabling bursting. As shown below: + +
+*(Figure: container CPU usage can briefly exceed the limit when CPU Burst is enabled)*
+
+With the CPU Burst capability provided by the Volcano agent, high-priority workloads can avoid throttling at critical moments, ensuring the stability of latency-sensitive workloads.
+
+## Usage Guide
+
+### Installing the Volcano Agent
+
+#### Install via Helm
+
+```shell
+helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
+
+helm repo update
+
+helm install volcano volcano-sh/volcano -n volcano-system --create-namespace --set custom.colocation_enable=true
+```
+#### Install via YAML
+
+Please follow this [document](https://github.com/volcano-sh/volcano?tab=readme-ov-file#quick-start-guide) to install Volcano, and then use the following command to install the Volcano agent.
+
+```shell
+kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-agent-development.yaml
+```
+
+Check that all Volcano components are running successfully.
+
+```shell
+kubectl get po -n volcano-system
+NAME                                   READY   STATUS      RESTARTS   AGE
+volcano-admission-76bd985b56-fnpjg     1/1     Running     0          3d
+volcano-admission-init-wmxc7           0/1     Completed   0          3d
+volcano-agent-q85jn                    1/1     Running     0          3d
+volcano-controllers-7655bb499f-gpg9l   1/1     Running     0          3d
+volcano-scheduler-6bf4759c45-c666z     1/1     Running     0          3d
+```
+
+Enable colocation and oversubscription by labeling nodes.
+
+```shell
+kubectl label node $node volcano.sh/oversubscription=true # replace $node with real node name in your Kubernetes cluster.
+
+kubectl label node $node volcano.sh/colocation=true # replace $node with real node name in your Kubernetes cluster.
+```
+
+### CPU Burst Example
+
+This example demonstrates how to use CPU Burst and its benefits.
+
+#### Enabling CPU Burst
+
+Deploy a Deployment and expose a ClusterIP service. The annotation `volcano.sh/enable-quota-burst: "true"` enables CPU Burst for the Pod.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx
+  namespace: default
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: nginx
+  template:
+    metadata:
+      labels:
+        app: nginx
+      annotations:
+        volcano.sh/enable-quota-burst: "true" # pod enabled CPU Burst
+    spec:
+      containers:
+        - name: container-1
+          image: nginx:latest
+          resources:
+            limits:
+              cpu: "2"
+            requests:
+              cpu: "1"
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: nginx
+  namespace: default
+  labels:
+    app: nginx
+spec:
+  selector:
+    app: nginx
+  ports:
+    - name: http
+      targetPort: 80
+      port: 80
+      protocol: TCP
+  type: ClusterIP
+```
+
+#### Stress Testing
+
+Use the `wrk` tool to apply pressure to the nginx Pod.
+
+```bash
+wrk -H "Accept-Encoding: deflate, gzip" -t 2 -c 8 -d 120 --latency --timeout 2s http://$(kubectl get svc nginx -o jsonpath='{.spec.clusterIP}')
+```
+
+#### Checking CPU Throttling
+
+Check the CPU throttling status of the Pod's container. We can see that `nr_bursts` and `burst_time` are not zero, while `nr_throttled` and `throttled_time` are relatively small, indicating that the Pod has used burst CPU quotas.
+
+```bash
+cat /sys/fs/cgroup/cpu/kubepods/burstable/podd2988e14-83bc-4d3d-931a-59f8a3174396/cpu.stat # replace nginx pod uid in your Kubernetes cluster.
+nr_periods 1210
+nr_throttled 9
+throttled_time 193613865
+nr_bursts 448
+burst_time 6543701690
+```
+
+If we set the Pod's annotation `volcano.sh/enable-quota-burst=false` (disabling CPU Burst) and perform another stress test, `nr_throttled` and `throttled_time` will be relatively large, indicating strict CPU throttling, while `nr_bursts` and `burst_time` will be zero, indicating no CPU Burst occurred.
+ +```bash +cat /sys/fs/cgroup/cpu/kubepods/burstable/podeeb542c6-b667-4da4-9ac9-86ced4e93fbb/cpu.stat #replace nginx pod uid in your Kubernetes cluster. +nr_periods 1210 +nr_throttled 488 +throttled_time 10125826283 +nr_bursts 0 +burst_time 0 +``` + +#### Notes + +CPU Burst relies on Linux kernel functionality. This feature is only effective on hosts with Linux kernel versions >= 5.14 and certain Linux distributions (e.g., OpenEuler 22.03 SP2 or later). + +### Dynamic Resource Oversubscription Example + +This example demonstrates the dynamic resource capabilities on a node and shows the suppression and eviction mechanisms when the node faces resource pressure. The node is configured with 8 CPU cores and 16GB of memory. + +#### Checking Node Oversubscribed Resources + +Oversubscribed resources on a node are calculated by subtracting the actual resource usage from the node's allocatable resources. Oversubscribed resources include CPU and memory, represented by `kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory`, respectively, and reported as extended resources in the node's `Allocatable` field. Online tasks use native resources (`cpu` and `memory`), while offline tasks use oversubscribed resources (`kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory`), increasing Pod deployment density and resource utilization. + +```bash +kubectl describe node $node # replace $node with a real node in your cluster +Allocatable: + cpu: 8 + ephemeral-storage: 33042054704 + hugepages-1Gi: 0 + hugepages-2Mi: 0 + kubernetes.io/batch-cpu: 7937 # CPU oversubscribed resources, in millicores (1 core = 1000 millicores) + kubernetes.io/batch-memory: 14327175770 # Memory oversubscribed resources, in bytes + memory: 15754924Ki + pods: 110 +``` + +#### Deploying Online and Offline Jobs + +Online jobs are identified by setting the annotation `volcano.sh/qos-level: "LC"`, `volcano.sh/qos-level: "HLS"`, or `volcano.sh/qos-level: "LS"`. Offline jobs are identified by setting the annotation `volcano.sh/qos-level: "BE"` and can only use oversubscribed resources (`kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory`). We use an image containing the `stress` tool to simulate the pressure increase of online jobs. If you cannot access this image, you can replace it with another image containing the `stress` tool. 
+ +```yaml +# Online Job +apiVersion: apps/v1 +kind: Deployment +metadata: + name: online-demo + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: online-demo + template: + metadata: + labels: + app: online-demo + annotations: + volcano.sh/qos-level: "HLS" # Identify online jobs + spec: + containers: + - name: container-1 + image: polinux/stress + imagePullPolicy: IfNotPresent + command: ["stress", "--cpu", "7"] # Run stress test + resources: + requests: + cpu: 2 +--- +# Offline Job +apiVersion: apps/v1 +kind: Deployment +metadata: + name: offline-demo + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app: offline-demo + template: + metadata: + labels: + app: offline-demo + annotations: + volcano.sh/qos-level: "BE" # Identify offline jobs + spec: + containers: + - name: container-1 + image: nginx:latest + resources: + requests: + kubernetes.io/batch-cpu: 4000 # 4 CPU cores + kubernetes.io/batch-memory: 10737418240 # 10Gi memory +``` + +#### Ensuring Online and Offline Jobs Run Successfully + +```bash +kubectl get po +NAME READY STATUS RESTARTS AGE +offline-demo-f59758bb-vlbp7 1/1 Running 0 6s +online-demo-9f9bbdb58-fljzs 1/1 Running 0 6s +``` + +#### Eviction Mechanism Under Node Pressure + +When a node experiences pressure and resource utilization reaches the set threshold, the eviction mechanism is triggered. The QoS of online jobs is guaranteed by both the **Volcano Agent** and the **host OS**. The Volcano Agent monitors node resource utilization in real-time. When node resource utilization exceeds the threshold, offline jobs are evicted. For CPU resources, the default threshold is **80%**. We simulate resource pressure by applying **7 CPU cores of pressure** to the online job. After about 1 minute, we can observe the offline job being evicted through event logs. + +```bash +kubectl get event | grep Evicted +69s Warning Evicted pod/offline-demo-785cff7f58-gwqwc Evict offline pod due to CPU resource pressure +``` + +When node resource pressure increases, we can observe that oversubscribed resources (`kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory`) decrease. This is because oversubscribed resources are calculated by subtracting actual resource usage from the node's allocatable resources. When resource usage by online or offline jobs increases, the node's available resources decrease, leading to a reduction in oversubscribed resources. + +```bash +kubectl describe node $node # replace $node with a real node in your cluster +Allocatable: + cpu: 8 + ephemeral-storage: 33042054704 + hugepages-1Gi: 0 + hugepages-2Mi: 0 + kubernetes.io/batch-cpu: 978 # CPU oversubscribed resources decrease + kubernetes.io/batch-memory: 14310391443 + memory: 15754924Ki + pods: 110 +``` + +When eviction occurs, the **Volcano Agent** adds an eviction taint to the current node to prevent new workloads from being scheduled to the node, avoiding additional burden on the already pressured node. We can observe that newly created offline job Pods will remain in the `Pending` state due to this eviction taint. + +```bash +kubectl get po +NAME READY STATUS RESTARTS AGE +offline-demo-f59758bb-kwb54 0/1 Pending 0 58s +online-demo-9f9bbdb58-54fnx 1/1 Running 0 2m1s + +kubectl describe po offline-demo-f59758bb-kwb54 +Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Warning FailedScheduling 69s default-scheduler 0/1 nodes are available: 1 node(s) had taint {volcano.sh/offline-job-evicting: }, that the pod didn't tolerate. 
If the online job is stopped to release node resource pressure, the **Volcano Agent** detects the decrease in node resource utilization and automatically removes the eviction taint. Once the taint is removed, new Pods can be scheduled to the node normally.

#### Notes

The **Volcano Agent** defines a QoS resource model for online and offline jobs and provides application-level guarantees for online jobs (e.g., evicting offline jobs under node resource pressure). Meanwhile, CPU and memory isolation and suppression are guaranteed at the OS level by the **host kernel**. Note that the Volcano Agent currently only supports **openEuler 22.03 SP2** and later versions, so make sure the host OS type and version are correct before using this feature.

### Egress Network Bandwidth Guarantee Example

In the egress network bandwidth isolation mechanism, the bandwidth usage of offline jobs is limited, especially when online jobs require more bandwidth. To achieve finer bandwidth control, three watermark parameters are defined to dynamically adjust the bandwidth allocated to offline jobs:

| Watermark | Description | Default Value |
| --- | --- | --- |
| `onlineBandwidthWatermarkPercent` | The ratio of the online bandwidth watermark value to the node's base bandwidth:<br>`onlineBandwidthWatermark value = node base bandwidth * onlineBandwidthWatermarkPercent / 100` | 80 |
| `offlineHighBandwidthPercent` | The ratio of the offline high bandwidth watermark value to the node's base bandwidth:<br>`offlineHighBandwidth value = node base bandwidth * offlineHighBandwidthPercent / 100`<br>This is the upper limit of bandwidth that offline workloads can use while online workloads consume less than `onlineBandwidthWatermarkPercent` of the base bandwidth.<br>For example: with a node base bandwidth of 100Mbps, `onlineBandwidthWatermarkPercent` = 80 and `offlineHighBandwidthPercent` = 40, when online workloads use less than 100Mbps * 0.8 = 80Mbps, offline workloads can use at most 100Mbps * 0.4 = 40Mbps. | 40 |
| `offlineLowBandwidthPercent` | The ratio of the offline low bandwidth watermark value to the node's base bandwidth:<br>`offlineLowBandwidth value = node base bandwidth * offlineLowBandwidthPercent / 100`<br>This is the upper limit of bandwidth that offline workloads can use while online workloads consume more than `onlineBandwidthWatermarkPercent` of the base bandwidth.<br>For example: with a node base bandwidth of 100Mbps, `onlineBandwidthWatermarkPercent` = 80 and `offlineLowBandwidthPercent` = 10, when online workloads use more than 100Mbps * 0.8 = 80Mbps, offline workloads can use at most 100Mbps * 0.1 = 10Mbps. | 10 |
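To make the watermarks concrete, here is the arithmetic for the 1000 Mbps node configured in the next step, using the default percentages from the table (plain shell arithmetic, purely illustrative):

```bash
base=1000  # node base bandwidth in Mbps, set via the volcano.sh/network-bandwidth-rate annotation below
echo "onlineBandwidthWatermark: $(( base * 80 / 100 )) Mbps"  # onlineBandwidthWatermarkPercent = 80
echo "offlineHighBandwidth:     $(( base * 40 / 100 )) Mbps"  # offlineHighBandwidthPercent = 40
echo "offlineLowBandwidth:      $(( base * 10 / 100 )) Mbps"  # offlineLowBandwidthPercent = 10
```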
#### Setting Node Network Bandwidth

This example demonstrates how online jobs suppress the egress network bandwidth of offline jobs. We will use the `iperf` tool to simulate the egress bandwidth traffic of online and offline jobs.

Add the annotation `volcano.sh/network-bandwidth-rate` to all nodes to specify the network bandwidth rate. The example sets the value to `1000` (Mbps). Set an appropriate value based on your actual environment and replace `$node` with the actual node name.

```bash
kubectl annotate node $node volcano.sh/network-bandwidth-rate=1000
```

#### Deploying Online and Offline Jobs

Deploy an online and an offline Deployment, replacing `$node_ip` with a node IP reachable from the Pods in your environment. Also start the `iperf` server on that node with `iperf -s` so the Pods can reach it.

```yaml
# Online Job
apiVersion: apps/v1
kind: Deployment
metadata:
  name: online-iperf
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: online-iperf
  template:
    metadata:
      labels:
        app: online-iperf
      annotations:
        volcano.sh/qos-level: "HLS" # Identifies an online job
    spec:
      containers:
      - name: container-1
        image: volcanosh/iperf
        command:
        - /bin/sh
        - -c
        - |
          iperf -c $node_ip -i 1 -t 30 -f mb # Simulate bandwidth consumption
          echo finished...
          sleep 1000000
---
# Offline Job
apiVersion: apps/v1
kind: Deployment
metadata:
  name: offline-iperf
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: offline-iperf
  template:
    metadata:
      labels:
        app: offline-iperf
      annotations:
        volcano.sh/qos-level: "BE" # Identifies an offline job
    spec:
      containers:
      - name: container-1
        image: volcanosh/iperf
        command:
        - /bin/sh
        - -c
        - |
          iperf -c $node_ip -i 1 -t 30 -f mb # Simulate bandwidth consumption
          echo finished...
          sleep 1000000
```

#### Viewing Logs

View the logs of the online and offline jobs.

Online job logs:

```bash
Connecting to host 192.168.2.30, port 5201
[ 5] local 192.168.2.115 port 58492 connected to 192.168.2.30 port 5201
[ ID] Interval       Transfer    Bandwidth
[ 5] 0.00-1.00 sec   118 MBytes  990 Mbits/sec
[ 5] 1.00-2.00 sec   106 MBytes  889 Mbits/sec
[ 5] 2.00-3.00 sec   107 MBytes  897 Mbits/sec
[ 5] 3.00-4.00 sec   107 MBytes  903 Mbits/sec
[ 5] 4.00-5.00 sec   107 MBytes  899 Mbits/sec
[ 5] 5.00-6.00 sec   107 MBytes  902 Mbits/sec
[ 5] 6.00-7.00 sec   105 MBytes  884 Mbits/sec
...
```

Offline job logs:

```bash
Connecting to host 192.168.2.30, port 5201
[ 5] local 192.168.2.115 port 44362 connected to 192.168.2.30 port 5201
[ ID] Interval       Transfer    Bandwidth
[ 5] 0.00-1.00 sec    8 MBytes   70 Mbits/sec
[ 5] 1.00-2.00 sec   12 MBytes  102 Mbits/sec
[ 5] 2.00-3.00 sec   11 MBytes   98 Mbits/sec
[ 5] 3.00-4.00 sec   11 MBytes   99 Mbits/sec
[ 5] 4.00-5.00 sec   11 MBytes   99 Mbits/sec
[ 5] 5.00-6.00 sec   11 MBytes   97 Mbits/sec
[ 5] 6.00-7.00 sec   11 MBytes   98 Mbits/sec
...
```

Once the bandwidth usage of online jobs exceeds the node's `onlineBandwidthWatermarkPercent` (default 80), offline jobs can only use about 10% of the base bandwidth: when online jobs push past the watermark, the egress bandwidth of offline jobs is suppressed down to the offline low watermark.
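If the suppression does not kick in, first confirm that the bandwidth annotation set at the beginning of this example is actually present on the node (a quick check; `$node` is a placeholder):

```bash
# Read back the bandwidth annotation (value in Mbps); dots in the key are escaped for jsonpath
kubectl get node $node -o jsonpath='{.metadata.annotations.volcano\.sh/network-bandwidth-rate}'
```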
### Advanced Settings

#### Feature Toggles

The colocation feature has a unified per-node toggle: if a node carries the label `volcano.sh/oversubscription=true` or `volcano.sh/colocation=true`, all colocation features take effect on that node. Remove these labels to disable all colocation features.

- If you only want to use the colocation feature for online and offline jobs without enabling resource oversubscription, set the node label `volcano.sh/colocation="true"`.
- If you want to use both the colocation feature and resource oversubscription, set the node label `volcano.sh/oversubscription=true`.

By default, the `volcano-agent-configuration` ConfigMap in the `volcano-system` namespace holds all Volcano Agent configuration.

Each colocation feature (CPU Burst / Dynamic Resource Oversubscription / Egress Network Bandwidth Guarantee) also has an independent toggle in that ConfigMap:

- Set the `enable` field to `true` to enable the CPU Burst feature, or `false` to disable it.

  ```json
  "cpuBurstConfig": {
    "enable": true
  }
  ```

- Set the `enable` field to `true` to enable the dynamic resource oversubscription feature, or `false` to disable it.

  ```json
  "overSubscriptionConfig": {
    "enable": true
  }
  ```

- Set the `enable` field to `true` to enable the egress network bandwidth guarantee feature, or `false` to disable it.

  ```json
  "networkQosConfig": {
    "enable": true
  }
  ```

#### CPU Burst

Containers in Pods with the CPU Burst feature enabled can temporarily consume CPU beyond the container's CPU limit. If multiple Pods use burst CPU resources simultaneously, CPU contention may occur and affect the CFS (Completely Fair Scheduler) scheduling of the CPU.

You can specify a custom burst quota by setting the Pod annotation `volcano.sh/quota-burst-time`. For example: if a container's CPU limit is 4 cores, the Volcano Agent defaults the container's cgroup `cpu.cfs_burst_us` value to `400000` (the CFS base period is `100000`, so 4 cores correspond to `4 * 100000 = 400000`), which lets the container burst with up to 4 additional CPU cores at a time. If you set `volcano.sh/quota-burst-time=200000`, the container can only burst with up to 2 additional CPU cores at a time.

#### Dynamic Resource Oversubscription

By default, the calculation of oversubscribed resources and the eviction of offline workloads consider only the resource usage of Pods on the node. To also account for the node's own resource utilization, set the Volcano Agent flag `--include-system-usage=true`.

To avoid putting excessive pressure on nodes, the Volcano Agent applies an oversubscription ratio that determines how much idle resource is oversubscribed. You can change it with the `--oversubscription-ratio` flag. The default value is 60, meaning 60% of idle resources will be oversubscribed; with `--oversubscription-ratio=100`, all idle resources are oversubscribed.

When a node is under pressure, the Volcano Agent evicts offline workloads. The eviction thresholds are configured via the `volcano-agent-configuration` ConfigMap: `"evictingCPUHighWatermark": 80` means eviction starts once the node's CPU utilization stays above 80% for a period, during which the node does not schedule new Pods; `"evictingCPULowWatermark": 30` means the node resumes scheduling once its CPU utilization drops below 30%. `evictingMemoryHighWatermark` and `evictingMemoryLowWatermark` have the same meaning but apply to memory resources.

```json
"evictingConfig": {
  "evictingCPUHighWatermark": 80,
  "evictingMemoryHighWatermark": 60,
  "evictingCPULowWatermark": 30,
  "evictingMemoryLowWatermark": 30
}
```
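Since these thresholds all live in the `volcano-agent-configuration` ConfigMap, the quickest way to change them is to edit it in place (assuming the default installation namespace):

```bash
kubectl -n volcano-system edit configmap volcano-agent-configuration
```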
#### Egress Network Bandwidth Guarantee

You can adjust the online and offline bandwidth watermarks by modifying the `volcano-agent-configuration` ConfigMap. `qosCheckInterval` is the interval at which the Volcano Agent monitors the bandwidth watermarks; be cautious when modifying this value.

```json
"networkQosConfig": {
  "enable": true,
  "onlineBandwidthWatermarkPercent": 80,
  "offlineHighBandwidthPercent": 40,
  "offlineLowBandwidthPercent": 10,
  "qosCheckInterval": 10000000
}
```

#### Custom Oversubscription Policy Development

By default, the Volcano Agent uses the [extend](https://github.com/volcano-sh/volcano/tree/master/pkg/agent/oversubscription/policy) policy to report and consume oversubscribed resources, i.e., it reports them to the node as an extended resource type. If you want to customize how oversubscribed resources are reported and used (for example, reporting them as native CPU and memory resources), or to customize how scheduling is suspended and resumed, implement the [policy interface](https://github.com/volcano-sh/volcano/blob/4dea29b334877058786615ac1ed79143601dc600/pkg/agent/oversubscription/policy/policy.go#L48) and set the Volcano Agent startup parameter `oversubscription-policy` to the corresponding policy.
diff --git a/website-docusaurus/docs/contribution.md b/website-docusaurus/docs/contribution.md
new file mode 100644
index 00000000..6a8f676b
--- /dev/null
+++ b/website-docusaurus/docs/contribution.md
---
title: "Contribution"
sidebar_position: 1
---

# Welcome

Welcome to Volcano!

- [Before You Start](#before-you-start)
  - [Code of Conduct](#code-of-conduct)
  - [Community Discussions](#community-discussions)
  - [Community Expectations](#community-expectations)
- [Getting Started](#getting-started)
- [Your First Contribution](#your-first-contribution)
  - [Find Something to Work On](#find-something-to-work-on)
    - [Find a Good Topic](#find-a-good-topic)
    - [Work on an Issue](#work-on-an-issue)
    - [File an Issue](#file-an-issue)
- [Contribution Workflow](#contribution-workflow)
  - [Open a Pull Request](#open-a-pull-request)
  - [Code Review](#code-review)
  - [Commit Message Format](#commit-message-format)
    - [Testing](#testing)

## Before You Start

### Code of Conduct

All Volcano contributors must read and observe the [Code of Conduct](https://github.com/volcano-sh/website/blob/master/CODE_OF_CONDUCT.md).

### Community Discussions

To communicate with developers in the Volcano community:

- Join the CNCF `volcano` channel on [Slack](https://cloud-native.slack.com/archives/C011GJDQS0N) to participate in community discussions.

### Community Expectations

Volcano is an open-source project driven by the Volcano community, which strives to promote a healthy, friendly, and productive environment.
The community is committed to developing a system that helps run high-performance workloads, such as AI, ML, and deep learning applications, on Kubernetes. Building such a system would be impossible without the support of community contributors with similar aspirations.
- For details about the community roles, see [Community Membership](https://github.com/volcano-sh/website/blob/master/content/en/docs/membership.md). If you make significant contributions, you can take on a more advanced role in the community.


## Getting Started

- For more information on building and deployment, see [setup](https://github.com/volcano-sh/website/blob/master/content/en/docs/installation.md).


## Your First Contribution

You can contribute in several areas, including filing issues, developing features, fixing critical bugs, and getting your work reviewed and merged.

If you have any questions about the development process, visit the [Slack Channel](https://cloud-native.slack.com/archives/C011GJDQS0N) ([sign up](https://join.slack.com/t/volcano-sh/shared_invite/enQtNTU5NTU3NDU0MTc4LTgzZTQ2MzViNTFmNDg1ZGUyMzcwNjgxZGQ1ZDdhOGE3Mzg1Y2NkZjk1MDJlZTZhZWU5MDg2MWJhMzI3Mjg3ZTk))
or join our [mailing list](https://groups.google.com/forum/#!forum/volcano-sh).

### Find Something to Work On

You are welcome to open issues about documentation, report bugs, and push changes to the repositories.
Feel free to improve code that does not follow best coding practices, refactor existing code, or write test cases.
The following steps will help you get started.

#### Find a Good Topic

There are [multiple repositories](https://github.com/volcano-sh/) within the Volcano organization, each containing beginner-friendly issues that do not require a deep understanding of the Volcano project.
For example, in [Volcano-Issues](https://github.com/volcano-sh/volcano), you can choose issues labeled [help wanted](https://github.com/volcano-sh/volcano/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22) or [good first issue](https://github.com/volcano-sh/volcano/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
New contributors are welcome to work on these issues.

Another good way to start is to find a document that needs improvement, for example, a document that is missing a link or contains a broken one. For details on the workflow, see [Contribution Workflow](#contribution-workflow).

#### Work on an Issue

When you are ready to work on an issue, reply with `/assign` or `/assign @yourself` on the issue.
The bot will then assign the issue to you, and your name will appear in the `Assignees` list.

#### File an Issue

You are welcome to file issues in the Volcano sub-repositories.

*Example:* You can file an issue for [Volcano](https://github.com/volcano-sh/volcano/issues).

Follow the submission guidelines when you open an issue.

## Contribution Workflow

All contributors are welcome to open issues and create pull requests.

The contribution workflow is as follows (a minimal command sequence is sketched below):

- Create a topic branch from the existing branch (usually the master branch).
- Edit and commit the code.
- Make sure the [commit message format](#commit-message-format) is followed.
- Push the changes on the topic branch to your personal fork of the repository.
- Submit a pull request (PR) to [Volcano](https://github.com/volcano-sh/volcano). The PR must receive approval from at least two community maintainers before it can be merged.

### Open a Pull Request

Volcano follows the standard [GitHub pull request](https://help.github.com/articles/about-pull-requests/) process.
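The workflow above translates to roughly the following commands (the fork URL and branch name are illustrative placeholders):

```bash
# Fork volcano-sh/volcano on GitHub first, then:
git clone https://github.com/<your-username>/volcano.git
cd volcano
git remote add upstream https://github.com/volcano-sh/volcano.git

git checkout -b my-topic-branch master   # topic branch off master
# ...edit and commit, following the commit message format below...
git push origin my-topic-branch          # push to your fork, then open a PR against volcano-sh/volcano
```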
The Volcano bot will apply structured labels to your PRs.

It also suggests `/command` options that you can add as comments on your PRs to trigger auto-labeling and notifications and to facilitate review. For more information, see the [command reference documentation](https://go.k8s.io/bot-commands).

### Code Review

To make it easier for your PRs to receive reviews:

* Follow [good coding guidelines](https://github.com/golang/go/wiki/CodeReviewComments).
* Write [good commit messages](https://chris.beams.io/posts/git-commit/).
* Break large changes into smaller units that are logically independent and easy to understand.
* Label your PRs properly so that they are routed to the appropriate reviewers. The bot will guide you through the entire PR submission process.



### Commit Message Format

In the subject line, summarize the change you have made; in the message body, explain why the change was made.

```shell
scripts: add test code for metamanager

Unit test code is added to improve code coverage for metamanager.

Fixes #12
```

A more formal format is as follows:

```shell
<subsystem>: <what changed>
<BLANK LINE>
<why this change was made>
<BLANK LINE>
<footer>
```