@@ -7,7 +7,7 @@ date = 2024-03-08
lastmod = 2024-03-08
datemonth = "Mar"
dateyear = "2024"
- dateday = 08
+ dateday = "08"

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
2 changes: 1 addition & 1 deletion content/en/blog/Volcano-1.11.0-release.md
@@ -7,7 +7,7 @@ date = 2025-02-07
lastmod = 2025-02-07
datemonth = "Feb"
dateyear = "2025"
- dateday = 07
+ dateday = "07"

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
@@ -7,7 +7,7 @@ date = 2025-04-01
lastmod = 2025-04-01
datemonth = "Apr"
dateyear = "2025"
- dateday = 01
+ dateday = "01"

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
105 changes: 105 additions & 0 deletions content/en/docs/binpack.md
@@ -0,0 +1,105 @@
+++
title = "Binpack"

date = 2021-05-13
lastmod = 2025-11-11

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "docs" # Do not modify.

# Add menu entry to sidebar.
linktitle = "Binpack"
[menu.docs]
parent = "plugins"
weight = 5
+++

## Overview

The goal of the Binpack scheduling algorithm is to fill existing nodes as much as possible, avoiding allocation to empty nodes where it can. Concretely, the algorithm scores each node that can accommodate the task, with a higher score indicating higher resource utilization after placement. By packing workloads onto as few nodes as possible, Binpack consolidates application load, which works well with the node auto-scaling functionality of Kubernetes clusters.

## How It Works

The Binpack algorithm is injected into the Volcano Scheduler as a plugin and takes effect during the node-selection stage for Pods. When calculating a node's Binpack score, the Volcano Scheduler considers each resource the Pod requests and computes a weighted average using the weight configured for each resource.

Key characteristics:

- **Resource Weight**: Each resource type (CPU, Memory, GPU, etc.) can carry a different weight in the scoring calculation, set by the administrator.
- **Plugin Weight**: Each plugin's contribution to the overall node score is itself weighted, so the Binpack plugin's score is scaled by its configured plugin weight.
- **NodeOrderFn**: The plugin implements NodeOrderFn to score nodes by how highly utilized they would be after placing the task, as sketched below.
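
To make the weighted average concrete, here is a minimal Go sketch of this style of scoring. It is illustrative only, not the Volcano implementation: the `ResourceInfo` fields, function names, and the 0-to-`pluginWeight` score range are assumptions made for the example.

```go
package main

import "fmt"

// ResourceInfo describes one resource type on a candidate node
// (illustrative fields, not Volcano's types).
type ResourceInfo struct {
	Requested   float64 // amount the task requests
	Used        float64 // amount already in use on the node
	Allocatable float64 // node capacity for this resource
}

// binpackScore computes a weighted average of per-resource utilization
// after hypothetically placing the task, scaled by the plugin weight.
// Higher scores mean a fuller node, which Binpack prefers.
func binpackScore(node map[string]ResourceInfo, weights map[string]float64, pluginWeight float64) float64 {
	var weighted, weightSum float64
	for name, w := range weights {
		r, ok := node[name]
		if !ok || r.Allocatable == 0 {
			continue
		}
		util := (r.Used + r.Requested) / r.Allocatable
		if util > 1 {
			return 0 // task does not fit; real schedulers filter such nodes earlier
		}
		weighted += util * w
		weightSum += w
	}
	if weightSum == 0 {
		return 0
	}
	return pluginWeight * weighted / weightSum
}

func main() {
	node := map[string]ResourceInfo{
		"cpu":    {Requested: 2, Used: 6, Allocatable: 16},  // 50% utilized after placement
		"memory": {Requested: 8, Used: 16, Allocatable: 64}, // 37.5% utilized after placement
	}
	weights := map[string]float64{"cpu": 1, "memory": 1}
	fmt.Printf("binpack score: %.3f\n", binpackScore(node, weights, 10)) // 4.375
}
```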

## Scenario

The Binpack algorithm benefits workloads composed of small jobs, packing them so that each node is filled as fully as possible:

### Big Data Scenarios

Single query jobs in big data processing benefit from Binpack by consolidating workloads and maximizing resource utilization on active nodes.

### E-commerce High Concurrency

Order generation in e-commerce flash sale scenarios can leverage Binpack to efficiently use available resources during peak loads.

### AI Inference

Single identification jobs in AI inference scenarios benefit from consolidated scheduling, reducing resource fragmentation.

### Internet Services

High-concurrency Internet service scenarios benefit from Binpack because it reduces resource fragmentation within nodes and keeps idle machines free for Pods with larger resource requests, maximizing the utilization of idle resources in the cluster.

## Configuration

The Binpack plugin is configured in the scheduler ConfigMap with optional weight parameters:

```yaml
tiers:
- plugins:
  - name: binpack
    arguments:
      binpack.weight: 10
      binpack.cpu: 1
      binpack.memory: 1
      binpack.resources: nvidia.com/gpu
      binpack.resources.nvidia.com/gpu: 2
```

### Configuration Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `binpack.weight` | Overall weight of the Binpack plugin score | 1 |
| `binpack.cpu` | Weight for CPU resource in scoring | 1 |
| `binpack.memory` | Weight for Memory resource in scoring | 1 |
| `binpack.resources` | Additional resources to consider | - |
| `binpack.resources.<resource>` | Weight for specific resource type | 1 |
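
For example, with the configuration shown above, `nvidia.com/gpu` (weight 2) counts twice as heavily as CPU or memory (weight 1 each) in each node's utilization score, and the resulting score is scaled by the plugin weight of 10 relative to other node-ordering plugins.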

## Example

Here's an example scheduler configuration that uses Binpack to prioritize node filling:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
    - plugins:
      - name: predicates
      - name: nodeorder
      - name: binpack
        arguments:
          binpack.weight: 10
          binpack.cpu: 2
          binpack.memory: 1
```

In this configuration, the Binpack plugin is given a weight of 10, and CPU is weighted twice as much as memory in the scoring calculation.
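For instance, assuming the weighted-average scoring sketched earlier, a node that would be 60% CPU-utilized and 30% memory-utilized after placement scores proportionally to (2 × 0.6 + 1 × 0.3) / (2 + 1) = 0.5, while an emptier node at 20% CPU and 10% memory scores (2 × 0.2 + 1 × 0.1) / 3 ≈ 0.17, so the fuller node wins.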
119 changes: 119 additions & 0 deletions content/en/docs/drf.md
@@ -0,0 +1,119 @@
+++
title = "DRF"

date = 2021-05-13
lastmod = 2025-11-11

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "docs" # Do not modify.

# Add menu entry to sidebar.
linktitle = "DRF"
[menu.docs]
parent = "plugins"
weight = 7
+++

{{<figure library="1" src="drfjob.png" title="DRF Plugin">}}

## Overview

The full name of the DRF scheduling algorithm is **Dominant Resource Fairness**. It is a scheduling algorithm based on a container group's dominant resource: the resource whose requested share of the total cluster resources is the largest among all resources the group requires.

The DRF algorithm selects the container group with the smallest Dominant Resource share for priority scheduling. This approach can accommodate more jobs without allowing a single resource-heavy job to starve a large number of smaller jobs. The DRF scheduling algorithm ensures that in an environment where many types of resources coexist, the fair allocation principle is satisfied as much as possible.

## How It Works

The DRF plugin:

1. **Observes Dominant Resource**: For each job, it identifies which resource (CPU, Memory, GPU, etc.) represents the largest share of cluster resources
2. **Calculates Share Value**: Computes each job's share value based on its dominant resource usage
3. **Prioritizes Lower Share**: Jobs with lower share values (using less of their dominant resource) get higher scheduling priority

Key functions implemented:

- **JobOrderFn**: Orders jobs based on their dominant resource share, giving priority to jobs with smaller shares
- **PreemptableFn**: Determines if a job can be preempted based on resource fairness calculations

During preemption, the plugin compares the total resources allocated to the preemptor and to the candidate preemptees, and triggers preemption only when the preemptor holds fewer resources.
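
As a minimal sketch of the share calculation (illustrative, not the Volcano implementation; the function and map names are invented for the example), the dominant share is the maximum, over all resource types, of a job's allocation divided by the cluster total, and jobs with smaller shares schedule first:

```go
package main

import (
	"fmt"
	"sort"
)

// dominantShare returns the largest fraction of the cluster that the
// job's allocation occupies across all resource types.
func dominantShare(allocated, clusterTotal map[string]float64) float64 {
	share := 0.0
	for name, total := range clusterTotal {
		if total == 0 {
			continue
		}
		if s := allocated[name] / total; s > share {
			share = s
		}
	}
	return share
}

func main() {
	cluster := map[string]float64{"cpu": 100, "memory": 400}
	jobs := map[string]map[string]float64{
		"jobA": {"cpu": 2, "memory": 8},  // dominant share: 2% (CPU = Memory)
		"jobB": {"cpu": 1, "memory": 32}, // dominant share: 8% (Memory)
	}

	names := []string{"jobA", "jobB"}
	// DRF ordering: the job with the smaller dominant share goes first.
	sort.Slice(names, func(i, j int) bool {
		return dominantShare(jobs[names[i]], cluster) < dominantShare(jobs[names[j]], cluster)
	})
	fmt.Println("scheduling order:", names) // [jobA jobB]
}
```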

## Scenario

The DRF scheduling algorithm prioritizes overall job throughput in the cluster and is suitable for batch processing scenarios:

### AI Training

Single AI training jobs benefit from DRF as it ensures fair resource allocation across multiple training workloads.

### Big Data Processing

Single big data calculation and query jobs can share resources fairly with other workloads in the cluster.

### Mixed Resource Workloads

In environments with diverse resource requirements (CPU-intensive, Memory-intensive, GPU-intensive jobs), DRF ensures fair allocation across all resource dimensions.

## Configuration

The DRF plugin is configured in the scheduler ConfigMap:

```yaml
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: drf
  - name: predicates
  - name: proportion
```

## Example

Consider a cluster with the following resources:
- 100 CPUs
- 400 GB Memory

And two jobs:
- **Job A**: Each task requires 2 CPUs and 8 GB Memory
- **Job B**: Each task requires 1 CPU and 32 GB Memory

For Job A:
- CPU share per task: 2/100 = 2%
- Memory share per task: 8/400 = 2%
- Dominant resource: CPU and Memory are equal (2%)

For Job B:
- CPU share per task: 1/100 = 1%
- Memory share per task: 32/400 = 8%
- Dominant resource: Memory (8%)

With DRF, Job A would be scheduled first because its dominant resource share (2%) is smaller than Job B's (8%). This ensures that neither job can monopolize the cluster by requesting large amounts of a single resource.

### VolcanoJob Example

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: drf-example-job
spec:
  schedulerName: volcano
  minAvailable: 2
  tasks:
  - replicas: 2
    name: worker
    template:
      spec:
        containers:
        - name: worker
          image: busybox
          resources:
            requests:
              cpu: "2"
              memory: "8Gi"
            limits:
              cpu: "2"
              memory: "8Gi"
```
91 changes: 91 additions & 0 deletions content/en/docs/gang.md
@@ -0,0 +1,91 @@
+++
title = "Gang"

date = 2021-05-13
lastmod = 2025-11-11

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "docs" # Do not modify.

# Add menu entry to sidebar.
linktitle = "Gang"
[menu.docs]
parent = "plugins"
weight = 4
+++

{{<figure library="1" src="gang.png" title="Gang Plugin">}}

## Overview

The Gang scheduling strategy is one of the core scheduling algorithms of the Volcano Scheduler. It satisfies the "All or nothing" requirement in scheduling, avoiding the waste of cluster resources that arbitrary, partial scheduling of Pods would cause. The Gang scheduling algorithm checks whether the number of scheduled Pods under a Job has reached the Job's minimum required count; scheduling is executed for the Job's Pods only when that minimum is met, and not otherwise.

## How It Works

The Gang plugin gives higher priority to jobs whose tasks have not yet reached the `Ready` state (a task counts as ready once it is Binding, Bound, Running, Allocated, Succeeded, or Pipelined). It then checks whether the resources allocated to the queue, after evicting some pods and reclaiming their resources if necessary, can meet what the job needs to run `minAvailable` pods; if so, the Gang plugin proceeds with scheduling.

Key functions implemented by the Gang plugin:

- **JobReadyFn**: Checks whether a job has enough ready tasks to meet its `minAvailable` requirement
- **JobPipelinedFn**: Checks whether a job can be pipelined
- **JobValidFn**: Validates whether a job's Gang constraint is satisfied (a minimal readiness check is sketched below)
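
The readiness check can be sketched in Go as follows. This is a minimal illustration under assumed types and names, not Volcano's actual `JobReadyFn` signature:

```go
package main

import "fmt"

// TaskStatus enumerates task states for this sketch (assumed names,
// mirroring the states listed above).
type TaskStatus int

const (
	Pending TaskStatus = iota
	Binding
	Bound
	Running
	Allocated
	Succeeded
	Pipelined
)

// isReady reports whether a task counts toward gang readiness.
func isReady(s TaskStatus) bool {
	switch s {
	case Binding, Bound, Running, Allocated, Succeeded, Pipelined:
		return true
	}
	return false
}

// jobReady sketches a gang-style check: the job may proceed only if
// at least minAvailable of its tasks count as ready.
func jobReady(tasks []TaskStatus, minAvailable int) bool {
	ready := 0
	for _, s := range tasks {
		if isReady(s) {
			ready++
		}
	}
	return ready >= minAvailable
}

func main() {
	tasks := []TaskStatus{Allocated, Allocated, Pending}
	fmt.Println(jobReady(tasks, 3)) // false: only 2 of the 3 required tasks are ready
}
```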

## Scenario

The Gang scheduling algorithm, based on the container group concept, is well suited to scenarios that require multi-process collaboration:

### AI and Deep Learning

AI scenarios often contain complex pipelines including data ingestion, data analysis, data splitting, training, serving, and logging. These stages require a group of containers to work together, which suits the container-group-based Gang scheduling strategy.

### MPI and HPC

Multi-threaded parallel computing under the MPI framework, where master and worker processes must run together, is also a good fit for Gang scheduling. The containers in such a group are tightly coupled and may contend for resources; scheduling and allocating them as a whole effectively avoids deadlock.

### Resource Efficiency

When cluster resources are insufficient, the Gang scheduling strategy significantly improves overall utilization by preventing partial job allocations whose Pods would otherwise hold resources while waiting for the rest of the job.

## Configuration

The Gang plugin is typically enabled by default and configured in the scheduler ConfigMap:

```yaml
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: conformance
```

## Example

Here's an example of a VolcanoJob that uses Gang scheduling:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: tensorflow-job
spec:
  minAvailable: 3  # Gang constraint: at least 3 pods must be schedulable
  schedulerName: volcano
  tasks:
  - replicas: 1
    name: ps
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest
  - replicas: 2
    name: worker
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest
```

In this example, the job will only be scheduled if all 3 pods (1 ps + 2 workers) can be allocated resources simultaneously.
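Assuming Volcano is installed in the cluster, submitting this manifest with `kubectl apply -f tensorflow-job.yaml` illustrates the behavior: if only two of the three pods can be placed, none are bound, and all remain Pending until resources for the full gang become available.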