27 changes: 22 additions & 5 deletions .gitignore
@@ -1,6 +1,23 @@
.idea/*
resources*
public
.vscode/*
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.hugo_build.lock
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Legacy backups
hugo_backup/
44 changes: 16 additions & 28 deletions content/en/blog/1.4 release-en.md → blog/1.4-release-en.md
@@ -1,28 +1,16 @@
+++
title = "Volcano v1.4 (Beta) Release Note"
description = "Volcano v1.4 (Beta) Release Includes New Features Such as NUMA-Aware"
subtitle = ""

date = 2021-08-31
lastmod = 2021-09-13
datemonth = "Sep"
dateyear = "2021"
dateday = 13

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "posts" # Do not modify.
authors = ["Thor-wl"]

tags = ["Tutorials"]
summary = "Volcano v1.4 (Beta) Release Includes New Features Such as NUMA-Aware"

# Add menu entry to sidebar.
linktitle = "Volcano v1.4 (Beta) Release Note"
[menu.posts]
parent = "tutorials"
weight = 12
+++
---
title: "Volcano v1.4 (Beta) Release Note"
description: "Volcano v1.4 (Beta) Release Includes New Features Such as NUMA-Aware"
subtitle: ""
date: 2021-08-31
draft: false
authors:
- Thor-wl
tags:
- Tutorials
summary: "Volcano v1.4 (Beta) Release Includes New Features Such as NUMA-Aware"
---

>This article was first released on `Container Cube` on September 6, 2021; see [Volcano v1.4.0-Beta Released, Supporting NUMA-Aware and Other Important Features](https://mp.weixin.qq.com/s/S5JAQI0uLoTEx0lvYDXM4Q) (in Chinese).

@@ -36,16 +24,16 @@ Now with resource ratio-based partitions, you can set a dominant resource (usual

In this way, GPU-consuming jobs that meet the ratio requirement can be scheduled to the node at any time, preventing GPU waste. Compared with other solutions in the industry, this more flexible method improves node resource utilization.

For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/proportional.md.
For details about the feature design and usage, you can visit [/docs/queue_resource_management](https://volcano.sh/docs/queue_resource_management).
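For a rough picture of how such a ratio can be expressed, here is a minimal sketch of a volcano-scheduler ConfigMap that declares a dominant GPU resource with CPU and memory ratios. The `predicate.resources*` argument keys and the ratio values are assumptions based on the proportional design proposal, so verify the exact names against the documentation referenced above.

```yaml
# Sketch only: the predicates-plugin argument keys and ratio values are
# assumptions from the proportional design proposal, not verified syntax.
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
    - plugins:
      - name: drf
      - name: predicates
        arguments:
          predicate.resources: nvidia.com/gpu                 # dominant resource
          predicate.resources.nvidia.com/gpu.cpu.ratio: 4     # reserve 4 CPUs per GPU
          predicate.resources.nvidia.com/gpu.memory.ratio: 8  # reserve 8 GiB memory per GPU
      - name: proportion
      - name: nodeorder
```

With a configuration along these lines, CPUs and memory beyond the declared ratio remain available to non-GPU jobs, while enough of both is kept free for GPU-consuming jobs that match the ratio.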


__CPU NUMA-aware__ is another important feature of this version. For computing-intensive jobs such as AI and big data jobs, enabling NUMA will significantly improve computing efficiency. With CPU NUMA-aware scheduling, you can configure the NUMA policy to determine whether to enable NUMA for workloads. The scheduler will select a node that meets the NUMA requirements.

For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md.
For details about the feature design and usage, you can visit [/docs/plugins/numa-aware](https://volcano.sh/docs/plugins/numa-aware).
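As a sketch of how a workload might opt in, the example below sets a NUMA policy on a Volcano Job task. The task-level `topologyPolicy` field and its values are assumptions taken from the NUMA-aware design proposal, so confirm the exact field name and accepted values in the documentation linked above.

```yaml
# Sketch only: topologyPolicy and its values are assumptions from the
# NUMA-aware design proposal; integer CPU requests are generally required
# for CPU pinning to take effect.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: numa-demo               # hypothetical job name
spec:
  schedulerName: volcano
  minAvailable: 1
  tasks:
  - name: worker
    replicas: 1
    topologyPolicy: best-effort # e.g. best-effort / restricted / single-numa-node
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: main
          image: nginx          # placeholder image
          resources:
            requests:
              cpu: "4"
            limits:
              cpu: "4"
```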

You can now __deploy different types of schedulers__ in a Kubernetes cluster to properly schedule resources. The most common use case is deploying default-scheduler and Volcano together. Native Kubernetes resource objects, such as Deployments and StatefulSets, can be scheduled by default-scheduler, and high-performance computing workloads, such as Volcano Jobs, TensorFlow Jobs, and Spark Jobs, can be scheduled by Volcano. This solution makes the best possible use of each type of scheduler and reduces the concurrency pressure on a single scheduler.

For details about the feature design and usage, you can visit https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-scheduler.md.
For details about the feature design and usage, you can visit [/docs/multi_cluster_scheduling](https://volcano.sh/docs/multi_cluster_scheduling).
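As an illustration of this split, the sketch below leaves a stateless Deployment to default-scheduler (by omitting `schedulerName`) and routes a batch workload to Volcano by setting it explicitly; the names and images are placeholders.

```yaml
# Scheduled by default-scheduler: no schedulerName is set.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
---
# Scheduled by Volcano: schedulerName is set to volcano.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: batch-demo             # hypothetical batch job
spec:
  schedulerName: volcano
  minAvailable: 1
  tasks:
  - name: worker
    replicas: 1
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: main
          image: busybox
          command: ["sh", "-c", "echo hello && sleep 10"]
```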

In addition to the preceding features, Volcano v1.4 (Beta) adds a stress testing automation framework and fixes bugs related to the robustness of the resource comparison function.

63 changes: 25 additions & 38 deletions content/en/blog/ING_case-en.md → blog/ING_case-en.md
@@ -1,30 +1,17 @@
+++
title = "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
description = "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
subtitle = ""

date = 2022-12-28
lastmod = 2022-12-28
datemonth = "Dec"
dateyear = "2022"
dateday = 28

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "posts" # Do not modify.
authors = ["volcano"]

tags = ["Practice"]
summary = "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"

# Add menu entry to sidebar.
linktitle = "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
[menu.posts]
parent = "tutorials"
weight = 6
+++

>2On October 26, 2022, Krzysztof Adamski and Tinco Boekestijn from ING Group delivered a keynote speech "Efficient Scheduling Of High Performance Batch Computing For Analytics Workloads With Volcano" at KubeCon North America. The speech focused on how Volcano, a cloud native batch computing project, supports high-performance scheduling for big data analytics jobs on ING's data management platform.
---
title: "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
description: "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
subtitle: ""
date: 2022-12-28
draft: false
authors:
- volcano
tags:
- Practice
summary: "ING Bank: How Volcano Empowers Their Big Data Analytics Platform"
---

>On October 26, 2022, Krzysztof Adamski and Tinco Boekestijn from ING Group delivered a keynote speech "Efficient Scheduling Of High Performance Batch Computing For Analytics Workloads With Volcano" at KubeCon North America. The speech focused on how Volcano, a cloud native batch computing project, supports high-performance scheduling for big data analytics jobs on ING's data management platform.
More details: [KubeCon + CloudNativeCon North America](https://events.linuxfoundation.org/archive/2022/kubecon-cloudnativecon-north-america/program/schedule/)

## Introduction to ING
@@ -38,7 +25,7 @@ ING provides services in more than 40 countries around the world. Core businesse

Regulations and restrictions on banking vary depending on the country/region. Data silos, data security, and compliance requirements can be really challenging. It is not easy to introduce new technologies. Therefore, ING builds their Data Analytics Platform (DAP) to provide secure, self-service functionality for employees to manage services throughout the entire process.

{{<figure library="1" src="ing-1.png">}}
![](/img/ing-1.png)

In 2013, they conceptualized the data platform. In 2018, ING introduced cloud native technologies to upgrade their infrastructure platform. Since then, more and more employees and departments have turned to the platform, and by now there are more than 400 projects on the data index platform.

Expand All @@ -51,7 +38,7 @@ They aim to meet all analytics needs in a highly secure, self-service platform t


## Challenges and Solutions
{{<figure library="1" src="ing-2.png">}}
![](/img/ing-2.png)

ING is shifting from Hadoop to Kubernetes. They met some challenges in job management and multi-framework support. For example:

@@ -70,20 +57,20 @@ ING is shifting from Hadoop to Kubernetes. They met some challenges in job manag
Managing applications (stateless and even stateful ones) with Kubernetes would be a perfect choice if Kubernetes were as user-friendly as Yarn in scheduling and managing batch computing jobs. Yarn, in turn, provides only limited support for frameworks such as TensorFlow and PyTorch. Therefore, ING looked for better solutions.

__Kubernetes + Hadoop__
{{<figure library="1" src="ing-3.png">}}
![](/img/ing-3.png)
When managing clusters, ING once kept Hadoop and Kubernetes separate. They ran almost all Spark jobs in Hadoop clusters, and other tasks and algorithms in Kubernetes clusters. They wanted to run all jobs in Kubernetes clusters to simplify management.

{{<figure library="1" src="ing-4.png">}}
![](/img/ing-4.png)
When Kubernetes and Yarn work together, Kubernetes and Hadoop resources are statically divided. During office hours, Hadoop applications and Kubernetes each use their own resources, so Spark tasks under heavy pressure cannot be allocated extra resources. At night, only batch processing tasks run in the clusters; all Kubernetes resources sit idle yet cannot be allocated to Hadoop. In this case, resources are not fully used.


__Kubernetes with Volcano__
{{<figure library="1" src="ing-5.png">}}
![](/img/ing-5.png)
When managing clusters with Kubernetes and scheduling Spark tasks with Volcano, resources do not need to be statically divided. Cluster resources can be dynamically re-allocated based on the priorities and resource pressure of pods, batch tasks, and interactive tasks, which greatly improves the overall utilization of cluster resources.

For example, during office hours, idle resources of common service applications can be used by batch and interactive applications temporarily. In holidays or nights, batch applications can use all cluster resources for data computing.

{{<figure library="1" src="ing-6.png">}}
![](/img/ing-6.png)
Volcano is a batch scheduling engine developed for Kubernetes with the following capabilities:

- Job queues with weighted priority
@@ -96,10 +83,10 @@ Volcano supplements Kubernetes in batch scheduling. Since Apache Spark 3.3, Volc

## Highlighted Features
__Redundancy and Local Affinity__
{{<figure library="1" src="ing-7.png">}}
![](/img/ing-7.png)
Volcano retains the affinity and anti-affinity policies for pods in Kubernetes, and adds those for tasks.

{{<figure library="1" src="ing-8.png">}}
![](/img/ing-8.png)
The idea of DRF is that in a multi-resource environment, resource allocation should be determined by the dominant share of an entity (user or queue). The volcano-scheduler observes the dominant resource requested by each job and uses it as a measure of cluster resource usage. Based on this dominant resource, the volcano-scheduler calculates the share of the job. The job with a lower share has a higher scheduling priority.

For example, a cluster has 18 CPUs and 72 GB memory in total. User1 and User2 are each allocated one queue. Any submitted job will get its scheduling priority based on the dominant resource.
@@ -112,15 +99,15 @@ Under a DRF policy, the job with a lower share will be first scheduled, that is,
Queue resources in a cluster can be divided by configuring weights. However, overcommitted tasks in a queue can use the idle resources in other queues. In this example, after using up the CPUs of its own queue, User2 can use the idle CPUs of User1. When User1 commits a new task, it triggers resource preemption and reclaims the resources occupied by other queues.
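A minimal sketch of two such queues is shown below; the names and the 1:1 weights are illustrative rather than taken from ING's setup. Because both queues are reclaimable, capacity borrowed from an idle queue is handed back once that queue's owner submits new tasks and the reclaim action runs.

```yaml
# Illustrative only: two equally weighted queues whose borrowed
# resources can be reclaimed when the owning queue needs them again.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: user1
spec:
  weight: 1
  reclaimable: true
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: user2
spec:
  weight: 1
  reclaimable: true
```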

__Resource Reservation__
{{<figure library="1" src="ing-9.png">}}
![](/img/ing-9.png)
Batch computing tasks and other services may preempt resources and cause conflicts. Assume there are two available nodes in a cluster and we need to deploy a unified service layer, such as Presto or a cache service like Alluxio, to provide services externally. Batch computing tasks may have already taken all the resources, so we cannot deploy or upgrade that service layer. Therefore, ING's platform now allows users to reserve some resources for other services.

__DRF Dashboard__
{{<figure library="1" src="ing-10.png">}}
![](/img/ing-10.png)
ING built a DRF scheduling dashboard based on the monitoring data from Volcano to obtain scheduling data at different layers. In the service cluster, ING keeps the tasks of interactive users in one queue, and the computing tasks of all key projects running on the data platform in another queue. ING can take certain resources from other queues and give them to the key project queue, but that does not benefit the tasks of interactive users.

ING is considering displaying the peak hours of cluster use to provide users with more information. With this, users can decide when to start their tasks based on the cluster resource readiness, improving computing performance without complex configurations in the background.
{{<figure library="1" src="ing-11.png">}}
![](/img/ing-11.png)

## Summary
Volcano abstracts batch task scheduling, allowing Kubernetes to better serve ING in task scheduling. ING will contribute their developed functions to the community, such as the DRF dashboard, idle resource reservation on each node, auto queue management, new Prometheus monitoring metrics, Grafana dashboard updates, kube-state-metrics update, and cluster role restrictions.
@@ -1,29 +1,15 @@
+++
title = "Quick Start Guide for Volcano"
description = "Bring up the Volcano in any K8s Cluster within few mins"
subtitle =""

date = 2019-03-28
lastmod = 2019-03-29
datemonth = "Mar"
dateyear = "2019"
dateday = 28

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "posts" # Do not modify.
authors = ["Volcano"]
authors_img = "/img/icon_user.svg"

tags = ["Tutorials"]
summary = "Quick Start Guide for Volcano"

# Add menu entry to sidebar.
linktitle = "Quick Start Guide for Volcano"
[menu.posts]
parent = "tutorials"
weight = 1
+++
---
title: "Quick Start Guide for Volcano"
description: "Bring up the Volcano in any K8s Cluster within few mins"
subtitle: ""
date: 2019-03-28
draft: false
authors:
- Volcano
tags:
- Tutorials
summary: "Quick Start Guide for Volcano"
---
# Quick Start Guide
The easiest way to deploy Volcano is using Helm charts.
### Preparation
@@ -45,7 +31,7 @@ volcanosh/vk-admission latest a83338506638 8 seconds ago
volcanosh/vk-scheduler latest faa3c2a25ac3 9 seconds ago 49.6MB
volcanosh/vk-controllers latest 7b11606ebfb8 10 seconds ago 44.2MB
```
**NOTE**: Ensure that the images are correctly loaded to your Kubernetes cluster. For example, if you are using [kind luster](https://github.com/kubernetes-sigs/kind), run the ```kind load docker-image <image-name>:<tag> ``` command for each image.
**NOTE**: Ensure that the images are correctly loaded to your Kubernetes cluster. For example, if you are using [kind cluster](https://github.com/kubernetes-sigs/kind), run the ```kind load docker-image <image-name>:<tag> ``` command for each image.
### 2. Helm Charts

Install Helm charts.
@@ -1,28 +1,34 @@
+++
title = "Volcano v1.10.0 Available Now"
description = "New features: Support Queue Priority Scheduling Strategy, Enable Fine-Grained GPU Resource Sharing and Reclaim, Introduce Pod Scheduling Readiness Support, Add Sidecar Container Scheduling Capabilities, Enhance Vcctl Command Line Tool, Ensure Compatibility with Kubernetes v1.30, Strengthen Volcano Security Measures, Optimize Volcano for Large-Scale Performance, Improve GPU Monitoring Function, Optimize Helm Chart Installation And Upgrade Processes, etc."
subtitle = ""

date = 2024-09-29
lastmod = 2024-09-29
datemonth = "Sep"
dateyear = "2024"
dateday = 29

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "posts" # Do not modify.
authors = ["volcano"]

tags = ["Practice"]
summary = "New features: Support Queue Priority Scheduling Strategy, Enable Fine-Grained GPU Resource Sharing and Reclaim, Introduce Pod Scheduling Readiness Support, Add Sidecar Container Scheduling Capabilities, Enhance Vcctl Command Line Tool, Ensure Compatibility with Kubernetes v1.30, Strengthen Volcano Security Measures, Optimize Volcano for Large-Scale Performance, Improve GPU Monitoring Function, Optimize Helm Chart Installation And Upgrade Processes, etc."

# Add menu entry to sidebar.
linktitle = "Volcano v1.10.0 Available Now"
[menu.posts]
parent = "tutorials"
weight = 6
+++
---
title: Volcano v1.10.0 Available Now
description: 'New features: Support Queue Priority Scheduling Strategy, Enable Fine-Grained
GPU Resource Sharing and Reclaim, Introduce Pod Scheduling Readiness Support, Add
Sidecar Container Scheduling Capabilities, Enhance Vcctl Command Line Tool, Ensure
Compatibility with Kubernetes v1.30, Strengthen Volcano Security Measures, Optimize
Volcano for Large-Scale Performance, Improve GPU Monitoring Function, Optimize Helm
Chart Installation And Upgrade Processes, etc.'
subtitle: ''
date: 2024-09-29
lastmod: '2024-09-29'
datemonth: Sep
dateyear: '2024'
dateday: '29'
draft: false
toc: true
type: posts
authors:
- volcano
tags:
- Practice
summary: 'New features: Support Queue Priority Scheduling Strategy, Enable Fine-Grained
GPU Resource Sharing and Reclaim, Introduce Pod Scheduling Readiness Support, Add
Sidecar Container Scheduling Capabilities, Enhance Vcctl Command Line Tool, Ensure
Compatibility with Kubernetes v1.30, Strengthen Volcano Security Measures, Optimize
Volcano for Large-Scale Performance, Improve GPU Monitoring Function, Optimize Helm
Chart Installation And Upgrade Processes, etc.'
linktitle: Volcano v1.10.0 Available Now
parent: tutorials
weight: '6'
---


On Sep 19, 2024, UTC+8, Volcano version v1.10.0 was officially released. This version introduced the following new features:
@@ -47,7 +53,7 @@ On Sep 19, 2024, UTC+8, Volcano version v1.10.0 was officially released. This ve

- **Optimize Helm Chart Installation And Upgrade Processes**

{{<figure library="1" src="volcano_logo.png" width="50%">}}
![](/img/volcano_logo.png)
Volcano is the industry's first cloud native batch computing project. Open-sourced at KubeCon Shanghai in June 2019, it became an official CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. By now, more than 600 developers worldwide have committed code to the project, and the community keeps gaining popularity among developers, partners, and users.

## Key Features
@@ -94,7 +100,7 @@ spec:
reclaimable: true
deserved: # set the deserved field.
cpu: 64
memeory: 128Gi
memory: 128Gi
nvidia.com/a100: 40
nvidia.com/v100: 80
```
@@ -115,11 +121,11 @@ For more details on the proportion plugin, please visit: [Proportion Plugin](htt

### Introduce Pod Scheduling Readiness Support

Once a Pod is created, it is considered ready for scheduling. In Kube-scheduler, it will try its best to find a suitable node to place all pending Pods. However, in reality, some Pods may be in a "lack of necessary resources" state for a long time. These Pods actually interfere with the decision-making and operation of the scheduler (and downstream components such as Cluster AutoScaler) in an unnecessary way, causing problems such as resource waste. Pod Scheduling Readiness is a new feature of Kube-sheduler. In Kubernetes v.1.30 GA, it has become a stable feature. It controls the scheduling timing of Pods by setting the schedulingGates field of the Pod.
Once a Pod is created, it is considered ready for scheduling, and Kube-scheduler tries its best to find a suitable node for every pending Pod. In reality, however, some Pods may stay in a "lacking necessary resources" state for a long time. These Pods unnecessarily interfere with the decision-making and operation of the scheduler (and of downstream components such as Cluster Autoscaler), causing problems such as resource waste. Pod Scheduling Readiness is a Kube-scheduler feature that became stable (GA) in Kubernetes v1.30. It controls when a Pod becomes schedulable through the schedulingGates field of the Pod.

<div style="text-align: center;"> {{<figure library="1" src="./v1.10.0/podSchedulingGates.svg">}}
![](/img/v1.10.0/podSchedulingGates.svg)
Pod SchedulingGates
</div>


In previous versions, Volcano had already integrated all the algorithms of the Kubernetes default scheduler, fully covering the native scheduling functions of Kube-scheduler. Therefore, Volcano can completely replace Kube-scheduler as a unified scheduler on cloud native platforms, supporting unified scheduling of microservices and AI/big data workloads. In the latest version, v1.10, Volcano introduces Pod Scheduling Readiness to further meet users' scheduling needs in diverse scenarios.
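As a quick illustration, the sketch below shows a Pod that stays gated (not considered for scheduling) until its `schedulingGates` entry is removed, for example by an external controller; the gate name is arbitrary and the Pod is handed to Volcano via `schedulerName`.

```yaml
# Illustrative Pod: it remains in SchedulingGated state until the gate
# below is removed from spec.schedulingGates.
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod
spec:
  schedulerName: volcano               # scheduled by Volcano once the gate is lifted
  schedulingGates:
  - name: example.com/wait-for-quota   # arbitrary gate name
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
```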
