From ae027f330f37c393526e6a8bf274b0c441da03a5 Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang
Date: Fri, 15 Aug 2025 01:21:56 -0700
Subject: [PATCH 1/2] docs: Correct EC pipeline calculation and limits

Discovered and corrected the documentation for how the number of EC pipelines is calculated. The previous analysis was incorrect.

- `ErasureCoding.md` is updated to describe the two new properties `ozone.scm.ec.pipeline.minimum` and `ozone.scm.ec.pipeline.per.volume.factor` and the `max()` logic used to determine the target number of pipelines.
- `ProductionDeployment.md` is updated to reference the correct and existing configuration property for tuning EC pipelines.

Change-Id: I393dc60d8745da2b2bb7899530665a108956446d
---
 .../docs/content/feature/ErasureCoding.md | 35 +++++++++++++++
 .../content/feature/multi-raft-support.md | 45 +++++++++++++++++++
 .../content/start/ProductionDeployment.md | 2 +-
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/hadoop-hdds/docs/content/feature/ErasureCoding.md b/hadoop-hdds/docs/content/feature/ErasureCoding.md
index 9e60c3a923a2..b7899010a8c8 100644
--- a/hadoop-hdds/docs/content/feature/ErasureCoding.md
+++ b/hadoop-hdds/docs/content/feature/ErasureCoding.md
@@ -228,6 +228,41 @@ When using ofs/o3fs, we can pass the EC Replication Config by setting the config
 
 In the case bucket already has default EC Replication Config, there is no need of passing EC Replication Config while creating key.
 
+#### Calculating EC Pipeline Limits
+
+The target number of open EC pipelines SCM aims to maintain is calculated dynamically for each EC replication configuration (e.g., RS-6-3, RS-3-2). The calculation is based on the following two properties, with the final target being the greater of the two resulting values.
+
+* `ozone.scm.ec.pipeline.minimum`
+  * **Description**: The guaranteed minimum number of open pipelines to maintain for each EC configuration, regardless of other factors.
+  * **Default Value**: `5`
+
+* `ozone.scm.ec.pipeline.per.volume.factor`
+  * **Description**: A factor used to calculate a target number of pipelines based on the total number of healthy volumes across all datanodes in the cluster.
+  * **Default Value**: `1.0`
+
+**Calculation Logic:**
+
+SCM first calculates a volume-based target using the formula:
+`(ozone.scm.ec.pipeline.per.volume.factor * total_healthy_volumes) / required_nodes`
+
+The final target number of pipelines is then determined by:
+`max(volume_based_target, ozone.scm.ec.pipeline.minimum)`
+
+**Example:**
+
+Consider a cluster with **200 total healthy volumes** across all datanodes and an EC policy of **RS-6-3** (which requires 9 nodes).
+* `ozone.scm.ec.pipeline.minimum` = **5** (default)
+* `ozone.scm.ec.pipeline.per.volume.factor` = **1.0** (default)
+
+1. The volume-based target is: `(1.0 * 200) / 9 = 22`
+2. The final target is: `max(22, 5) = 22`
+
+SCM will attempt to create and maintain approximately **22** open, RS-6-3 EC pipelines.
+
+**Production Recommendation:**
+
+The default values are a good starting point for most clusters. If you have a very high number of volumes and a write-heavy EC workload, you might consider slightly increasing the `pipeline.per.volume.factor`. Conversely, for read-heavy workloads, the default minimum of 5 pipelines is often sufficient.
+
 ### Enable Intel ISA-L
 
 Intel Intelligent Storage Acceleration Library (ISA-L) is an open-source collection of optimized low-level functions used for
diff --git a/hadoop-hdds/docs/content/feature/multi-raft-support.md b/hadoop-hdds/docs/content/feature/multi-raft-support.md
index c0cd1b4c0433..3f8d6858cca8 100644
--- a/hadoop-hdds/docs/content/feature/multi-raft-support.md
+++ b/hadoop-hdds/docs/content/feature/multi-raft-support.md
@@ -69,6 +69,51 @@ Ratis handles concurrent logs per node.
 This property is effective only when the previous property is set to 0.
 The value of this property must be greater than 0.
 
+### Calculating Ratis Pipeline Limits
+
+The target number of open, FACTOR_THREE Ratis pipelines is controlled by three properties that define the maximum number of pipelines in the cluster at a cluster-wide level, datanode level, and metadata disk level, respectively. SCM will create pipelines until the most restrictive limit is met.
+
+1. **Cluster-wide Limit (`ozone.scm.ratis.pipeline.limit`)**
+    * **Description**: An absolute, global limit for the total number of open, FACTOR_THREE Ratis pipelines across the entire cluster. This acts as a final cap on the total number of pipelines.
+    * **Default Value**: `0` (which means no global limit is enforced by default).
+
+2. **Datanode-level Fixed Limit (`ozone.scm.datanode.pipeline.limit`)**
+    * **Description**: When set to a positive number, this property defines a fixed maximum number of pipelines for every datanode. This is one of two ways to calculate a cluster-wide target.
+    * **Default Value**: `2`
+    * **Calculation**: If this is set, the target is `(ozone.scm.datanode.pipeline.limit * healthy_datanodes) / 3`.
+
+3. **Datanode-level Dynamic Limit (`ozone.scm.pipeline.per.metadata.disk`)**
+    * **Description**: This property is used only when `ozone.scm.datanode.pipeline.limit` is explicitly set to `0`. It calculates a dynamic limit for each datanode based on its available metadata disks.
+    * **Default Value**: `2`
+    * **Calculation**: The limit for each datanode is `(ozone.scm.pipeline.per.metadata.disk * metadata_disks_on_datanode)`. The total cluster-wide target is the sum of all individual datanode limits, divided by 3.
+
+#### How Limits are Applied
+
+SCM first calculates a target number of pipelines based on either the **Datanode-level Fixed Limit** or the **Datanode-level Dynamic Limit**. It then compares this calculated target to the **Cluster-wide Limit**. The **lowest value** is used as the final target for the number of open pipelines.
+
+**Example (Dynamic Limit):**
+
+Consider a cluster with **10 healthy datanodes**.
+* **8 datanodes** have 4 metadata disks each.
+* **2 datanodes** have 2 metadata disks each.
+
+And the configuration is:
+* `ozone.scm.ratis.pipeline.limit` = **30** (A global cap is set)
+* `ozone.scm.datanode.pipeline.limit` = **0** (Use dynamic calculation)
+* `ozone.scm.pipeline.per.metadata.disk` = **2** (Default)
+
+**Calculation Steps:**
+1. Calculate the limit for the first group of datanodes: `8 datanodes * (2 pipelines/disk * 4 disks/datanode) = 64 pipelines`
+2. Calculate the limit for the second group of datanodes: `2 datanodes * (2 pipelines/disk * 2 disks/datanode) = 8 pipelines`
+3. Calculate the total raw target from the dynamic limit: `(64 + 8) / 3 = 24`
+4. Compare with the global limit: `min(24, 30) = 24`
+
+SCM will attempt to create and maintain approximately **24** open, FACTOR_THREE Ratis pipelines.
+
+**Production Recommendation:**
+
+For most production deployments, using the dynamic per-disk limit (`ozone.scm.datanode.pipeline.limit=0`) is recommended, as it allows the cluster to scale pipeline capacity naturally with its resources. You can use the global limit (`ozone.scm.ratis.pipeline.limit`) as a safety cap if needed. A good starting value for `ozone.scm.pipeline.per.metadata.disk` is **2**. Monitor the `NumOpenPipelines` metric in SCM to see if the actual number of pipelines aligns with your configured targets.
+
 ## How to Use
 1. Configure Datanode metadata directories:
 ```xml
diff --git a/hadoop-hdds/docs/content/start/ProductionDeployment.md b/hadoop-hdds/docs/content/start/ProductionDeployment.md
index ed24a7b26710..e3c16b060e6b 100644
--- a/hadoop-hdds/docs/content/start/ProductionDeployment.md
+++ b/hadoop-hdds/docs/content/start/ProductionDeployment.md
@@ -85,5 +85,5 @@ A typical production Ozone cluster includes the following services:
 ### Ozone Configuration
 
 * **Monitoring**: Install Prometheus and Grafana for monitoring the Ozone cluster. For audit logs, consider using a log ingestion framework such as the ELK Stack (Elasticsearch, Logstash, and Kibana) with FileBeat, or other similar frameworks. Alternatively, you can use Apache Ranger to manage audit logs.
-* **Pipeline Limits**: Increase the number of allowed write pipelines to better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit` and `ozone.scm.ec.pipeline.minimum`.
+* **Pipeline Limits**: Increase the number of allowed write pipelines to better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit` (for Ratis) and `ozone.scm.ec.pipeline.minimum` (for EC).
 * **Heap Sizes**: Configure sufficient heap sizes for Ozone Manager (OM), Storage Container Manager (SCM), Recon, DataNode, S3 Gateway (S3G), and HttpFs services to ensure stability.

From fe2f629afc0e40c9be8d3fc5e53e1c348160204d Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang
Date: Wed, 17 Dec 2025 16:54:27 -0800
Subject: [PATCH 2/2] Remove ratis change; leave just EC content

Change-Id: I920ec968502caf3e57473f92f1bfba645cdef50c
---
 .../content/feature/multi-raft-support.md | 45 -------------------
 1 file changed, 45 deletions(-)

diff --git a/hadoop-hdds/docs/content/feature/multi-raft-support.md b/hadoop-hdds/docs/content/feature/multi-raft-support.md
index 3f8d6858cca8..c0cd1b4c0433 100644
--- a/hadoop-hdds/docs/content/feature/multi-raft-support.md
+++ b/hadoop-hdds/docs/content/feature/multi-raft-support.md
@@ -69,51 +69,6 @@ Ratis handles concurrent logs per node.
 This property is effective only when the previous property is set to 0.
 The value of this property must be greater than 0.
 
-### Calculating Ratis Pipeline Limits
-
-The target number of open, FACTOR_THREE Ratis pipelines is controlled by three properties that define the maximum number of pipelines in the cluster at a cluster-wide level, datanode level, and metadata disk level, respectively. SCM will create pipelines until the most restrictive limit is met.
-
-1. **Cluster-wide Limit (`ozone.scm.ratis.pipeline.limit`)**
-    * **Description**: An absolute, global limit for the total number of open, FACTOR_THREE Ratis pipelines across the entire cluster. This acts as a final cap on the total number of pipelines.
-    * **Default Value**: `0` (which means no global limit is enforced by default).
-
-2. **Datanode-level Fixed Limit (`ozone.scm.datanode.pipeline.limit`)**
-    * **Description**: When set to a positive number, this property defines a fixed maximum number of pipelines for every datanode. This is one of two ways to calculate a cluster-wide target.
-    * **Default Value**: `2`
-    * **Calculation**: If this is set, the target is `(ozone.scm.datanode.pipeline.limit * healthy_datanodes) / 3`.
-
-3. **Datanode-level Dynamic Limit (`ozone.scm.pipeline.per.metadata.disk`)**
-    * **Description**: This property is used only when `ozone.scm.datanode.pipeline.limit` is explicitly set to `0`. It calculates a dynamic limit for each datanode based on its available metadata disks.
-    * **Default Value**: `2`
-    * **Calculation**: The limit for each datanode is `(ozone.scm.pipeline.per.metadata.disk * metadata_disks_on_datanode)`. The total cluster-wide target is the sum of all individual datanode limits, divided by 3.
-
-#### How Limits are Applied
-
-SCM first calculates a target number of pipelines based on either the **Datanode-level Fixed Limit** or the **Datanode-level Dynamic Limit**. It then compares this calculated target to the **Cluster-wide Limit**. The **lowest value** is used as the final target for the number of open pipelines.
-
-**Example (Dynamic Limit):**
-
-Consider a cluster with **10 healthy datanodes**.
-* **8 datanodes** have 4 metadata disks each.
-* **2 datanodes** have 2 metadata disks each.
-
-And the configuration is:
-* `ozone.scm.ratis.pipeline.limit` = **30** (A global cap is set)
-* `ozone.scm.datanode.pipeline.limit` = **0** (Use dynamic calculation)
-* `ozone.scm.pipeline.per.metadata.disk` = **2** (Default)
-
-**Calculation Steps:**
-1. Calculate the limit for the first group of datanodes: `8 datanodes * (2 pipelines/disk * 4 disks/datanode) = 64 pipelines`
-2. Calculate the limit for the second group of datanodes: `2 datanodes * (2 pipelines/disk * 2 disks/datanode) = 8 pipelines`
-3. Calculate the total raw target from the dynamic limit: `(64 + 8) / 3 = 24`
-4. Compare with the global limit: `min(24, 30) = 24`
-
-SCM will attempt to create and maintain approximately **24** open, FACTOR_THREE Ratis pipelines.
-
-**Production Recommendation:**
-
-For most production deployments, using the dynamic per-disk limit (`ozone.scm.datanode.pipeline.limit=0`) is recommended, as it allows the cluster to scale pipeline capacity naturally with its resources. You can use the global limit (`ozone.scm.ratis.pipeline.limit`) as a safety cap if needed. A good starting value for `ozone.scm.pipeline.per.metadata.disk` is **2**. Monitor the `NumOpenPipelines` metric in SCM to see if the actual number of pipelines aligns with your configured targets.
-
 ## How to Use
 1. Configure Datanode metadata directories:
 ```xml