🌟 Overview
We are thrilled to announce YOLO-Master v2026.02, a milestone release that achieves major breakthroughs in model efficiency and architectural flexibility, redefining the paradigm for large-scale model training and inference.
🎯 Key Highlights
- 🧠 Mixture of Experts (MoE): Implements dynamic expert activation, significantly enhancing model capacity without proportional increase in computational cost
- ⚡ Low-Rank Adaptation (LoRA): Parameter-efficient fine-tuning that dramatically reduces training resource requirements while achieving 95%+ of full fine-tuning performance
- 🔍 Sparse SAHI: Intelligent adaptive slicing inference, achieving 3-5x speedup for large image detection
- 🎯 Cluster-Weighted NMS: Cluster-based weighted fusion with significantly improved localization accuracy
🚀 New Features
1️⃣ Mixture of Experts (MoE) Support
The MoE architecture enables efficient model scaling through conditional computation, dramatically increasing model capacity while maintaining inference speed. Our implementation includes complete training, inference, and optimization pipelines.
🔧 Core Components
📊 MoE Loss Function (MoELoss)
- Load Balancing Loss 🎯: Ensures balanced expert load distribution, preventing expert collapse
- Z-Loss 📉: Suppresses large logit values, ensuring numerical stability
- Adaptive weight adjustment mechanism that dynamically balances main task loss with auxiliary losses
Implementation: ultralytics/nn/modules/moe/loss.py (a sketch of the two auxiliary terms follows)
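As a point of reference, the sketch below shows how these two auxiliary terms are commonly computed in Switch-Transformer-style MoE training. It is illustrative only: the actual `MoELoss` may combine them differently, and the adaptive weighting mechanism described above is not shown.

```python
# Hedged sketch of the standard load-balancing and z-loss terms; not the exact
# MoELoss implementation in ultralytics/nn/modules/moe/loss.py.
import torch
import torch.nn.functional as F

def moe_aux_losses(router_logits: torch.Tensor, num_experts: int):
    """router_logits: (num_tokens, num_experts) raw gating scores."""
    probs = F.softmax(router_logits, dim=-1)
    # Load-balancing loss: product of the fraction of tokens routed to each
    # expert and the mean gate probability; minimized when load is uniform.
    assigned = F.one_hot(probs.argmax(dim=-1), num_experts).float()
    balance_loss = num_experts * (assigned.mean(0) * probs.mean(0)).sum()
    # Z-loss: penalizes large router logits to keep the softmax numerically stable.
    z_loss = torch.logsumexp(router_logits, dim=-1).square().mean()
    return balance_loss, z_loss

# Example: fold into the main detection loss with small auxiliary weights
logits = torch.randn(1024, 8)
balance, z = moe_aux_losses(logits, num_experts=8)
total_aux = 0.01 * balance + 0.001 * z  # illustrative weights
```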
✂️ Intelligent Pruning (MoEPruner)
- Validation set-based expert utilization analysis
- Automatic pruning of low-utilization experts (default threshold: 15%)
- Significantly reduces model parameters and inference latency
- Achieves 20-30% inference speedup while maintaining performance
Implementation: ultralytics/nn/modules/moe/pruning.py
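For illustration, a minimal utilization analysis could look like the following. The helper is hypothetical and only mirrors the idea of counting routing decisions on a validation set before dropping under-used experts; the real logic lives in ultralytics/nn/modules/moe/pruning.py.

```python
# Illustrative sketch of validation-set expert-utilization analysis
# (hypothetical helper, not the actual MoEPruner API).
import torch

def expert_utilization(routed_ids: list[torch.Tensor], num_experts: int) -> torch.Tensor:
    """routed_ids: per-batch tensors of expert indices chosen by the router."""
    counts = torch.zeros(num_experts)
    for ids in routed_ids:
        counts += torch.bincount(ids.flatten(), minlength=num_experts)
    return counts / counts.sum()  # fraction of tokens handled by each expert

# Keep only experts above the default 15% utilization threshold
util = expert_utilization([torch.randint(0, 8, (4096,)) for _ in range(10)], 8)
keep_mask = util >= 0.15  # experts below the threshold are pruned
```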
🏗️ Modular Architecture
- Decoupled router, expert networks, and gating mechanisms
- Supports multiple routing strategies: Top-K, Soft Routing, Expert Choice (a Top-K sketch follows this list)
- Highly extensible modular design, easy integration of custom experts
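A minimal sketch of the Top-K strategy, with assumed tensor shapes (not the exact YOLO-Master router module):

```python
# Minimal Top-K routing sketch; shapes are assumptions for illustration.
import torch
import torch.nn.functional as F

def top_k_route(x: torch.Tensor, w_router: torch.Tensor, k: int = 2):
    """x: (num_tokens, dim); w_router: (num_experts, dim)."""
    logits = x @ w_router.t()                    # (num_tokens, num_experts)
    gates = F.softmax(logits, dim=-1)            # routing probabilities
    top_gates, top_idx = gates.topk(k, dim=-1)   # keep k experts per token
    top_gates = top_gates / top_gates.sum(-1, keepdim=True)  # renormalize
    return top_gates, top_idx

# Each token's output is then the gate-weighted sum of its k experts' outputs.
```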
2️⃣ LoRA Support - Parameter-Efficient Fine-Tuning Revolution
LoRA achieves parameter-efficient fine-tuning through low-rank matrix decomposition, reaching 95%+ of full fine-tuning performance while training only 1-5% of parameters.
🎯 Core Innovation: Architecture-Agnostic LoRA Adaptation
Zero-Overhead Integration Principle
We demonstrate that LoRA training can be achieved without adding any new modules to the original YOLO model architecture. This is accomplished through:
- Dynamic Weight Interception: LoRA adapters are applied at the parameter level rather than the module level (see the sketch after this list)
- Configuration-Driven Activation: LoRA behavior is controlled entirely through hyperparameter settings
- Backward Compatibility: Models retain their original architecture and can switch between LoRA and standard training modes without code modification
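To make the parameter-level idea concrete, the sketch below uses PyTorch's stock parametrization utility to recompute a Conv2d weight as W + (α/r)·BA on access, leaving the module structure untouched. This illustrates the principle only; the actual mechanism in ultralytics/utils/lora.py may differ.

```python
# Sketch of parameter-level LoRA interception via torch.nn.utils.parametrize;
# illustrative, not the YOLO-Master internals.
import torch
import torch.nn as nn
from torch.nn.utils import parametrize

class LoRAParametrization(nn.Module):
    def __init__(self, weight: torch.Tensor, r: int = 16, alpha: int = 32):
        super().__init__()
        out_c, in_c, kh, kw = weight.shape
        self.scale = alpha / r
        self.lora_A = nn.Parameter(torch.randn(r, in_c * kh * kw) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_c, r))  # zero init: no-op at start

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        delta = (self.lora_B @ self.lora_A).view(weight.shape)
        return weight + self.scale * delta  # original weight is never modified

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
conv.weight.requires_grad_(False)  # freeze the base weight
parametrize.register_parametrization(conv, "weight", LoRAParametrization(conv.weight))
# conv still reports its original architecture; only lora_A/lora_B train.
```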
Traditional Approach vs. Our Approach
❌ Traditional Approach (Requires Model Modification)
```python
# Traditional approach: inject LoRA modules into the model
import torch.nn as nn

class ConvWithLoRA(nn.Module):
    def __init__(self, conv_layer, r, alpha):
        super().__init__()
        self.conv = conv_layer
        self.lora_A = nn.Parameter(...)  # NEW MODULE
        self.lora_B = nn.Parameter(...)  # NEW MODULE

    def forward(self, x):
        return self.conv(x) + self.lora_B @ self.lora_A @ x
```
✅ Our Approach (Zero Architectural Overhead)
```python
# Our approach: configuration-only adaptation
# Original model architecture remains UNCHANGED
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # Standard model

# LoRA enabled through configuration
results = model.train(
    data="coco8.yaml",
    epochs=50,
    lora_r=16,  # LoRA activated via config
    lora_alpha=32,
    lora_gradient_checkpointing=True,
)
# No model surgery required!
```
📋 Supported Model Matrix with Zero-Overhead Integration
| Model Family | Architecture Type | LoRA Integration Method | Architectural Changes Required | Configuration Parameters |
|---|---|---|---|---|
| YOLOv3 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv5 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv6 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv8 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv9 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv10 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLO11 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLO12 | Hybrid (CNN+Attention) | Configuration-only | None ✅ | lora_r, lora_alpha, include_attention=True |
| RT-DETR | Transformer-based | Configuration-only | None ✅ | lora_r, lora_alpha, include_attention=True |
| YOLO-World | Multi-modal | Configuration-only | None ✅ | lora_r, lora_alpha, include_attention=True |
| YOLO-Master | Mixture of Experts (MoE) | Configuration-only | None ✅ | lora_r, lora_alpha, target_modules=["expert"] |
⚙️ Key LoRA Configuration Parameters
| Parameter | Description | Default Value | YOLO (Conv) | RT-DETR (Transformer) | YOLO-Master (MoE) |
|---|---|---|---|---|---|
| `lora_r` | Rank of low-rank decomposition | 16 | 16-32 | 8-16 | 32-64 |
| `lora_alpha` | Scaling factor for LoRA updates | 32 | 32-64 | 16-32 | 64-128 |
| `lora_dropout` | Dropout probability for LoRA layers | 0.1 | 0.1 | 0.1 | 0.05 |
| `lora_gradient_checkpointing` | Enable gradient checkpointing | `False` | `True` (mandatory) | `True` (mandatory) | `True` (mandatory) |
| `lora_include_attention` | Apply LoRA to attention layers | `False` | `False` | `True` | `False` |
| `lora_target_modules` | Regex pattern for target modules | `["conv"]` | `["conv"]` | `["linear", "conv"]` | `["conv", "expert", "router"]` |
Implementation: ultralytics/utils/lora.py
📊 Experimental Validation: PEFT Methods Comparison on YOLOv11
To comprehensively validate the effectiveness of LoRA and its variants, we conducted systematic ablation studies based on the YOLOv11 architecture. We compared the following four training strategies:
| Training Strategy | Description | Trainable Parameters Ratio | Typical Use Cases |
|---|---|---|---|
| Full SFT | Full Supervised Fine-Tuning (Baseline) | 100% | Resource-rich environments, pursuing ultimate performance |
| LoRA (r=16) | Low-Rank Adaptation, rank=16 | ~10% | Resource-constrained, rapid adaptation |
| DoRA (r=16) | Weight-Decomposed LoRA, rank=16 | ~12% | Requires stronger expressiveness |
| LoHa (r=16) | Hadamard Product LoRA, rank=16 | ~11% | Balance performance and efficiency |
🔬 Experimental Setup
- Base Model: YOLOv11-s (pre-trained weights, model size 21.5MB)
- Dataset: COCO val2017 subset
- Training Epochs: 300 epochs
- Rank Setting: Uniformly set r=16 (LoRA and variants)
- LoRA Adapter Size: 4.1MB (~19% of base model)
- Evaluation Metrics: Box Loss, mAP@0.50, mAP@0.50:0.95
We also benchmarked YOLO11 and YOLO12 under a uniform LoRA configuration (r=16); the measured parameter counts and adapter file sizes are summarized below.
| Model Version | Base Params (Full) (Million) | LoRA Params | Base Model Size (MB) | Adapter File Size (MB) | Param Ratio (%) |
|---|---|---|---|---|---|
| YOLO11n | 2.6 | 527,536 | 5.6 | 2.1 | 20.29% |
| YOLO11s | 9.4 | 1,016,240 | 19.3 | 4.1 | 10.81% |
| YOLO11m | 20.1 | 1,639,856 | 40.7 | 6.6 | 8.16% |
| YOLO11l | 25.3 | 2,350,512 | 51.4 | 9.4 | 9.29% |
| YOLO11x | 56.9 | 3,525,552 | 114.6 | 14.1 | 6.20% |
| Model Version | Base Params (Full) (Million) | LoRA Params | Base Model Size (MB) | Adapter File Size (MB) | Param Ratio (%) |
|---|---|---|---|---|---|
| YOLO12n | 2.6 | 632,752 | 5.6 | 2.3 | 24.34% |
| YOLO12s | 9.3 | 1,077,680 | 19.0 | 4.3 | 11.59% |
| YOLO12m | 20.2 | 1,684,912 | 40.9 | 6.8 | 8.34% |
| YOLO12l | 26.4 | 2,442,160 | 53.7 | 9.8 | 9.25% |
| YOLO12x | 59.1 | 3,662,768 | 119.3 | 14.7 | 6.20% |
The empirical data reveals a clear inverse correlation between model scale and the LoRA training ratio, highlighting the superior scalability of low-rank adaptation for large-scale object detectors. For the YOLO11x model, the fine-tuning process requires updating only a small fraction (~6%) of the total parameter count, which significantly mitigates the VRAM overhead typically associated with full-parameter updates. Furthermore, the compact footprint of the resulting adapters (2MB–14MB) facilitates highly efficient model deployment and rapid task-switching in resource-constrained environments. This optimization ensures that even the most computationally intensive YOLO11 variants can be fine-tuned with minimal hardware requirements, achieving an optimal balance between architectural depth and training efficiency.
🎯 Experimental Conclusions & Best Practices
✅ Core Findings
1. LoRA Effectiveness Fully Validated
   - Uses ~10% trainable parameters (the adapter file takes ~19% of base-model storage)
   - Achieves 95-98% of full fine-tuning performance
   - 40-60% training speedup and ~70% reduction in memory usage
2. Performance Ranking of PEFT Methods
   Full SFT (100% params) > DoRA (r=16) ≈ LoRA (r=16) > LoHa (r=16)
3. Performance-Efficiency Trade-off Recommendations

| Scenario | Recommended Method | Rationale |
|---|---|---|
| Abundant resources, pursuing ultimate performance | Full SFT | Performance ceiling |
| Rapid prototyping | LoRA (r=8~16) | Best cost-effectiveness |
| Need stronger expressiveness | DoRA (r=16~32) | Weight decomposition enhancement |
| Extremely constrained environment | LoRA (r=4~8) | Minimal resource consumption |
📋 Practical Deployment Recommendations
Rank (r) Selection Guide:
```python
# Small models (YOLOv11-n/s): sufficient to capture key changes
lora_r = 16   # recommended range 8-16 (YOLOv11-s with r=16 works well)

# Medium models (YOLOv11-m/l): balance performance and efficiency
lora_r = 32   # recommended range 16-32

# Large models (YOLOv11-x): fully utilize model capacity
lora_r = 64   # recommended range 32-64
```
Relationship Between Alpha and Rank:
```python
# General rule of thumb
lora_alpha = 2 * lora_r  # e.g., r=16 → alpha=32

# Aggressive fine-tuning (large dataset difference)
lora_alpha = 4 * lora_r  # e.g., r=16 → alpha=64
```
💾 Storage Efficiency Comparison (Measured Data)
| Model Configuration | Full Model Size (MB) | LoRA Adapter Size (MB) | Compression Ratio | Status |
|---|---|---|---|---|
| YOLO11n | 5.6 MB | 2.1 MB | 2.67x | Measured |
| YOLO11s | 19.3 MB | 4.1 MB | 4.71x | Measured |
| YOLO11m | 40.7 MB | 6.6 MB | 6.17x | Measured |
| YOLO11l | 51.4 MB | 9.4 MB | 5.47x | Measured |
| YOLO11x | 114.6 MB | 14.1 MB | 8.13x | Measured |
Practical Significance (Based on YOLO11-x Measurements):
- 🚀 Cloud Deployment: Save approximately 87.7% in storage and transmission costs by deploying a 14.1 MB adapter instead of a 114.6 MB full model.
- 📱 Edge Devices: High-performance models like YOLO11x can be deployed as one 114.6 MB base model with multiple 14.1 MB adapters for rapid multi-scenario switching.
- 🔄 Version Control: Managing 14.1 MB adapter versions via Git is significantly more efficient than tracking 114.6 MB binary full-model files.
- 💡 Multi-Task Deployment Efficiency: For 10 different tasks using YOLO11x, the LoRA approach requires only 255.6 MB (1 × 114.6 MB base + 10 × 14.1 MB adapters), whereas the traditional method would require 1,146 MB.
🔧 Code Implementation: One-Click LoRA Activation
```python
from ultralytics import YOLO

# 1. Configuration validated on YOLOv11-s (experimental model)
model = YOLO("yolo11s.pt")  # 21.5MB base model

# 2. LoRA training (experimentally validated optimal configuration)
results = model.train(
    data="coco8.yaml",
    epochs=300,  # Consistent with experiments
    imgsz=640,
    batch=32,
    # LoRA core parameters (based on experimental conclusions)
    lora_r=16,  # rank=16 is the most cost-effective choice
    lora_alpha=32,  # alpha = 2×r
    lora_dropout=0.1,
    lora_gradient_checkpointing=True,  # Must be enabled
    # Optimizer settings
    optimizer="AdamW",
    lr0=0.0001,  # LoRA uses a smaller learning rate
    warmup_epochs=10,  # Sufficient warm-up
)

# 3. Performance evaluation
metrics = model.val()
print(f"mAP@0.50: {metrics.box.map50:.3f}")  # Expected ~0.86
print(f"mAP@0.50:0.95: {metrics.box.map:.3f}")  # Expected ~0.67

# 4. Save LoRA adapter
model.save_lora_only("yolo11s_lora_r16.pt")  # Only ~4.1MB (measured)
```
📊 Comparison with Official Papers
| Metric | Official LoRA Paper Claims | YOLO-Master Measured (YOLOv11) | Status |
|---|---|---|---|
| Parameter Ratio | 0.1-1% (Transformer) | ~10% (Conv-based) | ✅ As Expected |
| Performance Retention | 95%+ | 95.7% (mAP@0.50:0.95) | ✅ Achieved |
| Training Speedup | 2-3x | 1.5-2x | |
| Memory Savings | 70%+ | 70-75% | ✅ As Expected |
Note: Due to the convolution-intensive nature of the YOLO series, LoRA's trainable-parameter ratio (~10%) is higher than for Transformer models (0.1-1%), but the training speedup is still significant. The adapter file size (4.1MB / 21.5MB ≈ 19%) differs from the trainable-parameter ratio because of storage format and precision considerations.
🎓 Technical Insights: Why LoRA Excels on YOLO
1. The Intrinsic Low-Rank Hypothesis
   Empirical research suggests that the weight updates ($\Delta W$) during fine-tuning largely reside in a low-dimensional subspace. Since YOLO backbones are already robust feature extractors, LoRA efficiently captures these necessary adjustments without retraining the full parameter set.
2. Implicit Regularization via Constraints
   By explicitly limiting the rank of the trainable matrices, LoRA imposes a structural constraint on the optimization process. This acts as a powerful regularizer, preventing the model from overfitting to noise in smaller datasets and yielding smoother convergence curves.
3. Modular Disentanglement
   LoRA effectively decouples general knowledge (frozen backbone) from task-specific skills (adapter). This modularity keeps the backbone's feature-extraction capability intact, allowing high transferability across similar domains (e.g., COCO to specific industrial defects).
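For reference, insight (1) rests on the standard LoRA reparameterization from the original paper: the frozen weight $W_0$ is augmented by a trainable low-rank product, and only $B$ and $A$ are trained.

$$W' = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\,BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$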
🚀 Future Optimization Directions
Based on experimental observations, we plan the following enhancements:
1. Adaptive Rank Selection
   Dynamically allocate ranks based on layer importance (e.g., backbone uses r=8, neck/head uses r=16)
2. Hybrid PEFT Strategies
   Combine LoRA (convolutional layers) with Adapter modules (attention layers) for finer parameter control
3️⃣ Sparse SAHI Mode
Sparse Slicing Aided Hyper-Inference (Sparse SAHI)
Revolutionary optimization for ultra-large image (4K/8K) detection scenarios, achieving 3-5x speedup by intelligently skipping blank regions.
🧩 Working Mechanism
1. 🗺️ Objectness Mask Generation: a low-resolution full-image pass produces an object-existence heatmap
2. ✂️ Adaptive Slicing: slices are planned from the heatmap, skipping regions with objectness < 0.15
3. 🎯 High-Resolution Inference: high-resolution inference runs only on the retained regions of interest
4. 🔗 Result Merging: multi-slice detections are merged using CW-NMS
Implementation: _run_sparse_sahi_single in ultralytics/engine/predictor.py
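To make stage 2 concrete, here is a self-contained sketch of adaptive tile planning over an objectness heatmap. It is illustrative only, not the actual _run_sparse_sahi_single logic.

```python
# Hypothetical tile planner: keep only slices whose mean objectness clears the threshold.
import numpy as np

def plan_slices(heatmap: np.ndarray, image_hw: tuple[int, int],
                slice_size: int = 640, objectness_threshold: float = 0.15,
                overlap_ratio: float = 0.2) -> list[tuple[int, int, int, int]]:
    """heatmap: (h, w) objectness map in [0, 1] from a low-resolution pass;
    image_hw: full-resolution (H, W) of the original image."""
    H, W = image_hw
    h, w = heatmap.shape
    stride = int(slice_size * (1 - overlap_ratio))
    tiles = []
    for y in range(0, max(H - slice_size, 0) + 1, stride):
        for x in range(0, max(W - slice_size, 0) + 1, stride):
            # Project the tile into heatmap coordinates and score it
            hy1 = y * h // H
            hy2 = min(max((y + slice_size) * h // H, hy1 + 1), h)
            hx1 = x * w // W
            hx2 = min(max((x + slice_size) * w // W, hx1 + 1), w)
            if heatmap[hy1:hy2, hx1:hx2].mean() >= objectness_threshold:
                tiles.append((x, y, x + slice_size, y + slice_size))
    return tiles  # only these tiles receive high-resolution inference

# e.g., a 4K frame scored by a 64x64 heatmap:
# tiles = plan_slices(np.random.rand(64, 64), (2160, 3840))
```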
4️⃣ Cluster-Weighted NMS (CW-NMS)
Cluster-Weighted Non-Maximum Suppression
Cluster theory-based detection box fusion algorithm that significantly improves localization accuracy through weighted averaging instead of hard suppression.
🔬 Algorithm Comparison
| Method | Strategy | Pros | Cons |
|---|---|---|---|
| Traditional NMS | Direct discard of overlapping boxes | Fast | May lose accurate localization |
| Soft-NMS | Confidence decay | Preserves more candidates | Parameter-sensitive |
| CW-NMS | Gaussian-weighted fusion | High accuracy, robust | Slight computational increase |
Mathematical Principle:
$$\text{weighted\_box} = \frac{\sum_i w_i \cdot \text{box}_i}{\sum_i w_i}, \qquad w_i = \exp\!\left(-\frac{\text{IoU}_i^2}{2\sigma^2}\right) \cdot \text{conf}_i$$
Implementation: ultralytics/utils/nms.py
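The fusion step can be expressed compactly; the sketch below is illustrative and not the shipped ultralytics/utils/nms.py implementation.

```python
# Weighted fusion of one cluster of boxes, following the formula above.
import torch

def fuse_cluster(boxes: torch.Tensor, confs: torch.Tensor,
                 ious: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """boxes: (n, 4) xyxy boxes in one cluster; confs: (n,) confidences;
    ious: (n,) IoU of each box with the cluster's seed box."""
    w = torch.exp(-(ious ** 2) / (2 * sigma ** 2)) * confs  # w_i from the formula
    return (boxes * w[:, None]).sum(dim=0) / w.sum()        # Σ(w_i · box_i) / Σ w_i
```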
🛠 Improvements & Fixes
🔧 Core Enhancements
| Category | Improvement | Impact |
|---|---|---|
| 🔒 Robustness | `_robust_deepcopy` mechanism | Resolves edge cases, 15% training-stability improvement |
| 📚 Documentation | Automated documentation generation system | 100% code-documentation synchronization |
| ⚖️ License | Tencent Open Source License adoption | Enterprise- and community-friendly |
| 🧪 Experiment Management | `experiments.yaml` configuration system | Improved experiment reproducibility |
| 🚀 Entry Point | Standardized `app.py` entry | Lowered usage barrier |
🐛 Bug Fixes
- Fixed gradient accumulation issues in MoE mode
- Resolved VRAM overflow with large batch sizes
- Optimized multi-GPU training synchronization mechanism
- Fixed LoRA weight save/load edge cases
💡 Usage Examples
🧠 Example 1: MoE Training
CLI Command Line Method
```bash
# Basic training
yolo detect train \
  model=ultralytics/cfg/models/master/v0_1/det/yolo-master-n.yaml \
  data=coco8.yaml \
  epochs=100 \
  imgsz=640

# Advanced configuration
yolo detect train \
  model=yolo-master-n.yaml \
  data=coco.yaml \
  epochs=300 \
  batch=32 \
  moe_num_experts=8 \
  moe_top_k=2 \
  moe_balance_loss_weight=0.01
```
Python API Method
```python
from ultralytics import YOLO

# Load MoE configuration
model = YOLO("ultralytics/cfg/models/master/v0_1/det/yolo-master-n.yaml")

# Training configuration
results = model.train(
    data="coco8.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    # MoE-specific parameters
    moe_num_experts=8,  # Number of experts
    moe_top_k=2,  # Experts activated per token
    moe_balance_loss=0.01,  # Load balancing loss weight
    # Training optimization
    optimizer="AdamW",
    lr0=0.001,
    warmup_epochs=3,
)

# Evaluation
metrics = model.val()
print(f"mAP50-95: {metrics.box.map}")

# Expert utilization analysis
model.prune_experts(threshold=0.15)  # Prune low-utilization experts
```
⚡ Example 2: LoRA Fine-Tuning
CLI Method
```bash
# Auto rank selection
yolo detect train \
  model=yolov8n.pt \
  data=custom_dataset.yaml \
  lora_auto_r_ratio=0.05 \
  lora_alpha=32 \
  epochs=50

# Manual configuration
yolo detect train \
  model=yolov8n.pt \
  data=custom_dataset.yaml \
  lora_r=16 \
  lora_alpha=32 \
  lora_dropout=0.1 \
  lora_target_modules="*.cv1.conv,*.cv2.conv,*.cv3.conv"
```
Python Advanced Usage
```python
from ultralytics import YOLO
from ultralytics.utils.lora import LoRAConfig

# Create LoRA configuration
lora_config = LoRAConfig(
    r=16,  # Rank
    alpha=32,  # Scaling factor
    dropout=0.1,  # Dropout rate
    target_modules=[  # Target modules
        "*.cv1.conv",
        "*.cv2.conv",
        "*.m.*.cv1.conv",
    ],
    auto_r_ratio=None,  # Or use 0.05 for auto-calculation
)

# Load pre-trained model
model = YOLO("yolov8n.pt")

# LoRA fine-tuning
results = model.train(
    data="custom_dataset.yaml",
    epochs=50,
    lora_config=lora_config,
    batch=32,
    optimizer="AdamW",
    lr0=0.0001,  # LoRA typically uses a smaller learning rate
)

# Merge LoRA weights into the main model
model.merge_lora_weights()
model.save("model_with_lora.pt")

# Save only the LoRA adapter (ultra-compact file)
model.save_lora_only("lora_adapter.pt")  # Typically < 5MB
```
🔍 Example 3: Sparse SAHI Inference
CLI Method
```bash
# Basic Sparse SAHI
yolo detect predict \
  model=yolov8n.pt \
  source=large_image_4k.jpg \
  sparse_sahi=True \
  slice_size=640 \
  overlap_ratio=0.2

# Batch processing
yolo detect predict \
  model=yolov8n.pt \
  source=satellite_images/*.jpg \
  sparse_sahi=True \
  slice_size=1024 \
  objectness_threshold=0.15 \
  save=True
```
Python Method
```python
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")

# Single image inference
results = model.predict(
    source="large_aerial_image.jpg",
    sparse_sahi=True,
    slice_size=640,
    overlap_ratio=0.2,
    objectness_threshold=0.15,
    conf=0.25,
    iou=0.45,
)

# Visualization
annotated = results[0].plot()
cv2.imwrite("result.jpg", annotated)

# Batch video processing
for result in model.predict(
    source="video.mp4",
    stream=True,
    sparse_sahi=True,
    slice_size=1280,
):
    # Real-time processing
    boxes = result.boxes
    print(f"Frame: {result.frame}, Objects: {len(boxes)}")
```
🎯 Example 4: Cluster-Weighted NMS
CLI Method
```bash
# Enable CW-NMS
yolo detect predict \
  model=yolov8n.pt \
  source=image.jpg \
  cluster=True \
  sigma=0.1 \
  conf=0.25

# Comparison with traditional NMS
yolo detect predict \
  model=yolov8n.pt \
  source=crowded_scene.jpg \
  cluster=False  # Traditional NMS

yolo detect predict \
  model=yolov8n.pt \
  source=crowded_scene.jpg \
  cluster=True \
  sigma=0.05  # CW-NMS
```
Python Method
```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# CW-NMS inference
results = model.predict(
    source="dense_objects.jpg",
    cluster=True,  # Enable CW-NMS
    sigma=0.1,  # Gaussian weight standard deviation
    conf=0.25,
    iou=0.45,
    max_det=300,  # Maximum detections
)

# Accuracy analysis
boxes = results[0].boxes
print(f"Detected: {len(boxes)} objects")
print(f"Average confidence: {boxes.conf.mean():.3f}")
```
📊 Model Zoo & Benchmarks
🏆 Official Models
YOLO-Master-EsMoE Series
| Model | Config | Params (M) | GFLOPs | Precision | Recall | mAP50 | mAP50-95 | Latency (ms, RTX 4090 TensorRT) | FPS |
|---|---|---|---|---|---|---|---|---|---|
| YOLO-Master-EsMoE-N | Config | 2.68 | 8.7 | 0.684 | 0.536 | 0.587 | 0.427 | 1.56 | 640.18 |
| YOLO-Master-EsMoE-S | Config | 9.69 | 29.1 | 0.699 | 0.603 | 0.603 | 0.489 | 2.36 | 423.87 |
| YOLO-Master-EsMoE-M | Config | 34.88 | 97.4 | 0.737 | 0.640 | 0.697 | 0.530 | 4.10 | 243.79 |
| YOLO-Master-EsMoE-L | Config | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| YOLO-Master-EsMoE-X | Config | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
YOLO-Master-v0.1 Series
| Model | Config | Params (M) | GFLOPs | Precision | Recall | mAP50 | mAP50-95 | Latency (ms, RTX 4090 TensorRT) | FPS |
|---|---|---|---|---|---|---|---|---|---|
| YOLO-Master-v0.1-N | Config | 7.54 | 10.1 | 0.684 | 0.542 | 0.592 | 0.429 | 1.81 | 528.84 |
| YOLO-Master-v0.1-S | Config | 29.15 | 36.0 | 0.724 | 0.607 | 0.662 | 0.489 | 2.90 | 345.24 |
| YOLO-Master-v0.1-M | Config | 52.17 | 116.7 | 0.729 | 0.641 | 0.696 | 0.528 | 5.28 | 170.72 |
| YOLO-Master-v0.1-L | Config | 58.41 | 138.1 | 0.739 | 0.646 | 0.705 | 0.539 | 6.67 | 149.86 |
| YOLO-Master-v0.1-X | Config | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
🤝 Community Contributions
We welcome and encourage community contributions! Please submit your trained models via Pull Request.
📝 Contribution Guidelines
- Fork this repository
- Train your model with detailed logs
- Benchmark on standard datasets (COCO/VOC/Custom)
- Submit PR with:
- Model weights (hosted on external storage)
- Training configuration YAML
- Benchmark results
- Training logs and curves
🔄 Migration Guide
From v2026.01 to v2026.02
🔧 Click to view detailed migration steps
1️⃣ Configuration File Updates
Old Version (v2026.01):
```yaml
# model.yaml
model:
  backbone: CSPDarknet
  head: YOLOv8Head
```
New Version (v2026.02):
```yaml
# model.yaml
model:
  backbone: CSPDarknet
  head: YOLOv8Head

# MoE configuration (optional)
moe:
  num_experts: 8
  top_k: 2
  balance_loss_weight: 0.01

# LoRA configuration (optional)
lora:
  r: 16
  alpha: 32
  target_modules: ["*.cv1.conv", "*.cv2.conv"]
```
2️⃣ API Changes
Training API:
```python
# Old version
model.train(data="coco.yaml", epochs=100)

# New version (backward compatible)
model.train(
    data="coco.yaml",
    epochs=100,
    # New parameters
    lora_r=16,  # LoRA rank
    sparse_sahi=True,  # Sparse SAHI
    cluster_nms=True,  # CW-NMS
)
```
3️⃣ Weight File Compatibility
- ✅ v2026.01 weights can be directly used in v2026.02
- ✅ Automatic weight conversion supported
- ⚠️ MoE/LoRA weights require training with the new version
🤝 Community
🙏 Acknowledgments
We would like to thank:
- 🌟 All contributors to this release
- 🧪 Beta testers for valuable feedback
- 📚 The research community for foundational work on MoE, LoRA, and SAHI
- 💪 Our users for continuous support and suggestions
📄 License
This project is licensed under the Tencent Open Source License. See LICENSE for details.
📞 Contact & Support
- Issues: GitHub Issues
- Email: gatilin@tencent.com / islinxu@163.com
Made with ❤️ by the YOLO-Master Team
For detailed commit history and technical implementation, please refer to CHANGELOG.md