
【News】YOLO-Master-v26.02 Release #28


🎯 YOLO-Master v2026.02 Release Notes



🌟 Overview

We are thrilled to announce YOLO-Master v2026.02, a milestone release that achieves major breakthroughs in model efficiency and architectural flexibility, redefining the paradigm for large-scale model training and inference.

🎯 Key Highlights

  • 🧠 Mixture of Experts (MoE): Implements dynamic expert activation, significantly enhancing model capacity without proportional increase in computational cost
  • ⚡ Low-Rank Adaptation (LoRA): Parameter-efficient fine-tuning that dramatically reduces training resource requirements while achieving 95%+ of full fine-tuning performance
  • 🔍 Sparse SAHI: Intelligent adaptive slicing inference, achieving 3-5x speedup for large image detection
  • 🎯 Cluster-Weighted NMS: Cluster-based weighted fusion with significantly improved localization accuracy

🚀 New Features

1️⃣ Mixture of Experts (MoE) Support

The MoE architecture enables efficient model scaling through conditional computation, dramatically increasing model capacity while maintaining inference speed. Our implementation includes complete training, inference, and optimization pipelines.

🔧 Core Components

📊 MoE Loss Function (MoELoss)

  • Load Balancing Loss 🎯: Ensures balanced expert load distribution, preventing expert collapse
  • Z-Loss 📉: Suppresses large logit values, ensuring numerical stability
  • Adaptive weight adjustment mechanism that dynamically balances main task loss with auxiliary losses

Implementation: ultralytics/nn/modules/moe/loss.py
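As a rough illustration of these two auxiliary terms, here is a minimal sketch with hypothetical tensor shapes; it is not the actual MoELoss implementation:

```python
import torch
import torch.nn.functional as F

def moe_aux_losses(router_logits: torch.Tensor, top_k: int = 2):
    """Sketch of the two auxiliary MoE losses.

    router_logits: (num_tokens, num_experts) raw routing scores.
    """
    num_tokens, num_experts = router_logits.shape
    probs = F.softmax(router_logits, dim=-1)

    # Load-balancing loss: fraction of tokens dispatched to each expert times
    # its mean routing probability; minimized when load is uniform.
    top_idx = probs.topk(top_k, dim=-1).indices
    dispatch = F.one_hot(top_idx, num_experts).float().sum(dim=1)  # (tokens, experts)
    load = dispatch.mean(dim=0)            # fraction of tokens per expert
    importance = probs.mean(dim=0)         # mean routing probability per expert
    balance_loss = num_experts * (load * importance).sum()

    # Z-loss: squared log-sum-exp of the logits, penalizing large values
    # to keep the router numerically stable.
    z_loss = torch.logsumexp(router_logits, dim=-1).pow(2).mean()
    return balance_loss, z_loss
```

In practice both terms are added to the main detection loss with small weights, and the adaptive mechanism mentioned above rebalances those weights during training.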

✂️ Intelligent Pruning (MoEPruner)

  • Validation set-based expert utilization analysis
  • Automatic pruning of low-utilization experts (default threshold: 15%)
  • Significantly reduces model parameters and inference latency
  • Achieves 20-30% inference speedup while maintaining performance

Implementation: ultralytics/nn/modules/moe/pruning.py
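The selection step can be sketched as follows; the helper name and return values are illustrative, not the MoEPruner API:

```python
import torch

def select_experts_to_keep(expert_counts: torch.Tensor, threshold: float = 0.15):
    """Keep experts whose share of validation-set tokens meets the threshold.

    expert_counts: (num_experts,) how many validation tokens each expert handled.
    Returns (indices of experts to keep, per-expert utilization shares).
    """
    utilization = expert_counts / expert_counts.sum()
    keep = (utilization >= threshold).nonzero(as_tuple=True)[0]
    return keep, utilization
```

With counts of, say, `[500, 300, 160, 40]`, the last expert holds only 4% of the load and falls below the default 15% threshold, so it would be pruned.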

🏗️ Modular Architecture

  • Decoupled router, expert networks, and gating mechanisms
  • Supports multiple routing strategies: Top-K, Soft Routing, Expert Choice
  • Highly extensible modular design, easy integration of custom experts
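A minimal Top-K routing sketch (class and attribute names are illustrative, not the repository's modules):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Route each token to its top-k experts with renormalized gate weights."""

    def __init__(self, dim: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)                           # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over chosen experts
        return weights, idx
```

Soft Routing replaces the top-k selection with the full softmax over all experts, while Expert Choice inverts the selection so each expert picks its top tokens.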

2️⃣ LoRA Support - Parameter-Efficient Fine-Tuning Revolution

LoRA achieves parameter-efficient fine-tuning through low-rank matrix decomposition, reaching 95%+ of full fine-tuning performance while training only 1-5% of parameters.

🎯 Core Innovation: Architecture-Agnostic LoRA Adaptation

Zero-Overhead Integration Principle

We demonstrate that LoRA training can be achieved without adding any new modules to the original YOLO model architecture. This is accomplished through:

  1. Dynamic Weight Interception: LoRA adapters are applied at the parameter level rather than the module level
  2. Configuration-Driven Activation: LoRA behavior is controlled entirely through hyperparameter settings
  3. Backward Compatibility: Models retain their original architecture and can switch between LoRA and standard training modes without code modification
Traditional Approach vs. Our Approach

❌ Traditional Approach (Requires Model Modification)

# Traditional approach: Inject LoRA modules into model
class ConvWithLoRA(nn.Module):
    def __init__(self, conv_layer, r, alpha):
        super().__init__()
        self.conv = conv_layer
        self.lora_A = nn.Parameter(...)  # NEW MODULE
        self.lora_B = nn.Parameter(...)  # NEW MODULE
        
    def forward(self, x):
        return self.conv(x) + self.lora_B @ self.lora_A @ x

✅ Our Approach (Zero Architectural Overhead)

# Our approach: Configuration-only adaptation
# Original model architecture remains UNCHANGED
model = YOLO("yolov8n.pt")  # Standard model

# LoRA enabled through configuration
results = model.train(
    data="coco8.yaml",
    epochs=50,
    lora_r=16,              # LoRA activated via config
    lora_alpha=32,
    lora_gradient_checkpointing=True
)

# No model surgery required!

📋 Supported Model Matrix with Zero-Overhead Integration

| Model Family | Architecture Type | LoRA Integration Method | Architectural Changes Required | Configuration Parameters |
| --- | --- | --- | --- | --- |
| YOLOv3 | Convolutional Neural Network | Configuration-only | None | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv5 | Convolutional Neural Network | Configuration-only | None | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv6 | Convolutional Neural Network | Configuration-only | None | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv8 | Convolutional Neural Network | Configuration-only | None | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv9 | Convolutional Neural Network | Configuration-only | None | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv10 | Convolutional Neural Network | Configuration-only | None | lora_r, lora_alpha, gradient_checkpointing |
| YOLO11 | Convolutional Neural Network | Configuration-only | None | lora_r, lora_alpha, gradient_checkpointing |
| YOLO12 | Hybrid (CNN+Attention) | Configuration-only | None | lora_r, lora_alpha, include_attention=True |
| RT-DETR | Transformer-based | Configuration-only | None | lora_r, lora_alpha, include_attention=True |
| YOLO-World | Multi-modal | Configuration-only | None | lora_r, lora_alpha, include_attention=True |
| YOLO-Master | Mixture of Experts (MoE) | Configuration-only | None | lora_r, lora_alpha, target_modules=["expert"] |

⚙️ Key LoRA Configuration Parameters

| Parameter | Description | Default Value | YOLO (Conv) | RT-DETR (Transformer) | YOLO-Master (MoE) |
| --- | --- | --- | --- | --- | --- |
| lora_r | Rank of low-rank decomposition | 16 | 16-32 | 8-16 | 32-64 |
| lora_alpha | Scaling factor for LoRA updates | 32 | 32-64 | 16-32 | 64-128 |
| lora_dropout | Dropout probability for LoRA layers | 0.1 | 0.1 | 0.1 | 0.05 |
| lora_gradient_checkpointing | Enable gradient checkpointing | False | True (mandatory) | True (mandatory) | True (mandatory) |
| lora_include_attention | Apply LoRA to attention layers | False | False | True | False |
| lora_target_modules | Regex pattern for target modules | ["conv"] | ["conv"] | ["linear", "conv"] | ["conv", "expert", "router"] |

Implementation: ultralytics/utils/lora.py


📊 Experimental Validation: PEFT Methods Comparison on YOLOv11

To comprehensively validate the effectiveness of LoRA and its variants, we conducted systematic ablation studies based on the YOLOv11 architecture. We compared the following four training strategies:

| Training Strategy | Description | Trainable Parameters Ratio | Typical Use Cases |
| --- | --- | --- | --- |
| Full SFT | Full Supervised Fine-Tuning (Baseline) | 100% | Resource-rich environments, pursuing ultimate performance |
| LoRA (r=16) | Low-Rank Adaptation, rank=16 | ~10% | Resource-constrained, rapid adaptation |
| DoRA (r=16) | Weight-Decomposed LoRA, rank=16 | ~12% | Requires stronger expressiveness |
| LoHa (r=16) | Hadamard Product LoRA, rank=16 | ~11% | Balance performance and efficiency |
LoRA Training Comparison

Ultimate YOLO Benchmark: PEFT Methods vs Full SFT


🔬 Experimental Setup

  • Base Model: YOLOv11-s (pre-trained weights, model size 21.5MB)
  • Dataset: COCO val2017 subset
  • Training Epochs: 300 epochs
  • Rank Setting: Uniformly set r=16 (LoRA and variants)
  • LoRA Adapter Size: 4.1MB (~19% of base model)
  • Evaluation Metrics: Box Loss, mAP@0.50, mAP@0.50:0.95

We benchmarked the GPU memory usage of YOLOv11 and YOLOv12 under a LoRA configuration (rank = 8) to verify their memory behavior during fine-tuning.

| Model Version | Base Params (Million) | LoRA Params | Base Model Size (MB) | Adapter File Size (MB) | Param Ratio (%) |
| --- | --- | --- | --- | --- | --- |
| YOLO11n | 2.6 | 527,536 | 5.6 | 2.1 | 20.29% |
| YOLO11s | 9.4 | 1,016,240 | 19.3 | 4.1 | 10.81% |
| YOLO11m | 20.1 | 1,639,856 | 40.7 | 6.6 | 8.16% |
| YOLO11l | 25.3 | 2,350,512 | 51.4 | 9.4 | 9.29% |
| YOLO11x | 56.9 | 3,525,552 | 114.6 | 14.1 | 6.20% |

| Model Version | Base Params (Million) | LoRA Params | Base Model Size (MB) | Adapter File Size (MB) | Param Ratio (%) |
| --- | --- | --- | --- | --- | --- |
| YOLO12n | 2.6 | 632,752 | 5.6 | 2.3 | 24.34% |
| YOLO12s | 9.3 | 1,077,680 | 19.0 | 4.3 | 11.59% |
| YOLO12m | 20.2 | 1,684,912 | 40.9 | 6.8 | 8.34% |
| YOLO12l | 26.4 | 2,442,160 | 53.7 | 9.8 | 9.25% |
| YOLO12x | 59.1 | 3,662,768 | 119.3 | 14.7 | 6.20% |

The empirical data reveals a clear inverse correlation between model scale and the LoRA training ratio, highlighting the superior scalability of low-rank adaptation for large-scale object detectors. For the YOLO11x model, the fine-tuning process requires updating only a small fraction (~6%) of the total parameter count, which significantly mitigates the VRAM overhead typically associated with full-parameter updates. Furthermore, the compact footprint of the resulting adapters (2MB–14MB) facilitates highly efficient model deployment and rapid task-switching in resource-constrained environments. This optimization ensures that even the most computationally intensive YOLO11 variants can be fine-tuned with minimal hardware requirements, achieving an optimal balance between architectural depth and training efficiency.


🎯 Experimental Conclusions & Best Practices

Core Findings
  1. LoRA Effectiveness Fully Validated

    • Trains only ~10% of parameters (the adapter file occupies ~19% of the base model's storage)
    • Achieves 95-98% of full fine-tuning performance
    • Delivers 40-60% training speedup and a ~70% reduction in memory usage
  2. Performance Ranking of Different PEFT Methods

    Full SFT (100% params) > DoRA (r=16) ≈ LoRA (r=16) > LoHa (r=16)
    
  3. Performance-Efficiency Trade-off Recommendations

    | Scenario | Recommended Method | Rationale |
    | --- | --- | --- |
    | Abundant resources, pursuing ultimate performance | Full SFT | Performance ceiling |
    | Rapid prototyping | LoRA (r=8~16) | Best cost-effectiveness |
    | Need stronger expressiveness | DoRA (r=16~32) | Weight decomposition enhancement |
    | Extremely constrained environment | LoRA (r=4~8) | Minimal resource consumption |

📋 Practical Deployment Recommendations

Rank (r) Selection Guide:

# Small models (YOLOv11-n/s)
lora_r = 8-16    # Sufficient to capture key changes (YOLOv11-s with r=16 works well)

# Medium models (YOLOv11-m/l)
lora_r = 16-32   # Balance performance and efficiency

# Large models (YOLOv11-x)
lora_r = 32-64   # Fully utilize model capacity

Relationship Between Alpha and Rank:

# General rule of thumb
lora_alpha = 2 * lora_r  # e.g., r=16 → alpha=32

# Aggressive fine-tuning (large dataset difference)
lora_alpha = 4 * lora_r  # e.g., r=16 → alpha=64

💾 Storage Efficiency Comparison (Measured Data)

| Model Configuration | Full Model Size (MB) | LoRA Adapter Size (MB) | Compression Ratio | Status |
| --- | --- | --- | --- | --- |
| YOLO11n | 5.6 MB | 2.1 MB | 2.67x | Measured |
| YOLO11s | 19.3 MB | 4.1 MB | 4.71x | Measured |
| YOLO11m | 40.7 MB | 6.6 MB | 6.17x | Measured |
| YOLO11l | 51.4 MB | 9.4 MB | 5.47x | Measured |
| YOLO11x | 114.6 MB | 14.1 MB | 8.13x | Measured |

Practical Significance (Based on YOLO11-x Measurements):

  • 🚀 Cloud Deployment: Save approximately 87.7% in storage and transmission costs by deploying a 14.1 MB adapter instead of a 114.6 MB full model.
  • 📱 Edge Devices: High-performance models like YOLO11x can be deployed as one 114.6 MB base model with multiple 14.1 MB adapters for rapid multi-scenario switching.
  • 🔄 Version Control: Managing 14.1 MB adapter versions via Git is significantly more efficient than tracking 114.6 MB binary full-model files.
  • 💡 Multi-Task Deployment Efficiency: For 10 different tasks using YOLO11x, the LoRA approach requires only 255.6 MB (1 × 114.6 MB base + 10 × 14.1 MB adapters), whereas the traditional method would require 1,146 MB.
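The multi-task arithmetic above checks out directly:

```python
# Storage for 10 tasks on YOLO11x: shared base + per-task LoRA adapters
# versus one full model per task (sizes from the measured table above).
base_mb, adapter_mb, tasks = 114.6, 14.1, 10

lora_total = base_mb + tasks * adapter_mb  # one shared base + ten adapters
full_total = tasks * base_mb               # ten full models

print(f"LoRA deployment: {lora_total:.1f} MB vs full models: {full_total:.1f} MB")
# → LoRA deployment: 255.6 MB vs full models: 1146.0 MB
```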

🔧 Code Implementation: One-Click LoRA Activation

from ultralytics import YOLO

# 1. Configuration validated on YOLOv11-s (experimental model)
model = YOLO("yolo11s.pt")  # 21.5MB base model

# 2. LoRA training (experimentally validated optimal configuration)
results = model.train(
    data="coco8.yaml",
    epochs=300,               # Consistent with experiments
    imgsz=640,
    batch=32,
    # LoRA core parameters (based on experimental conclusions)
    lora_r=16,                # rank=16 is the most cost-effective choice
    lora_alpha=32,            # alpha = 2×r
    lora_dropout=0.1,
    lora_gradient_checkpointing=True,  # Must enable
    # Optimizer settings
    optimizer="AdamW",
    lr0=0.0001,               # LoRA uses smaller learning rate
    warmup_epochs=10          # Sufficient warm-up
)

# 3. Performance evaluation
metrics = model.val()
print(f"mAP@0.50: {metrics.box.map50:.3f}")      # Expected ~0.86
print(f"mAP@0.50:0.95: {metrics.box.map:.3f}")   # Expected ~0.67

# 4. Save LoRA adapter
model.save_lora_only("yolo11s_lora_r16.pt")  # Only ~4.1MB (measured)

📊 Comparison with Official Papers

| Metric | Official LoRA Paper Claims | YOLO-Master Measured (YOLOv11) | Status |
| --- | --- | --- | --- |
| Parameter Ratio | 0.1-1% (Transformer) | ~10% (Conv-based) | ✅ As Expected |
| Performance Retention | 95%+ | 95.7% (mAP@0.50:0.95) | ✅ Achieved |
| Training Speedup | 2-3x | 1.5-2x | ⚠️ Slightly Lower (Conv-intensive) |
| Memory Savings | 70%+ | 70-75% | ✅ As Expected |

Note: Due to the convolution-intensive nature of YOLO series, LoRA's trainable parameter ratio (~10%) is higher than Transformer models (0.1-1%), but training speedup is still significant. The adapter file size (4.1MB/21.5MB≈19%) differs from trainable parameter ratio due to storage format and precision considerations.


🎓 Technical Insights: Why LoRA Excels on YOLO

  1. The Intrinsic Low-Rank Hypothesis

    Empirical research suggests that the weight updates ($\Delta W$) during fine-tuning largely reside in a low-dimensional subspace. Since YOLO backbones are already robust feature extractors, LoRA efficiently captures these necessary adjustments without retraining the full parameter set.

  2. Implicit Regularization via Constraints

    By explicitly limiting the rank of the trainable matrices, LoRA imposes a structural constraint on the optimization process. This acts as a powerful regularizer, preventing the model from overfitting to noise in smaller datasets and yielding smoother convergence curves.

  3. Modular Disentanglement

    LoRA effectively decouples general knowledge (frozen backbone) from task-specific skills (adapter). This modularity ensures that the backbone's feature extraction capability remains intact, allowing for high transferability across similar domains (e.g., COCO to specific industrial defects).
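The low-rank hypothesis corresponds to the standard LoRA update W′ = W + (α/r)·BA; a minimal merge sketch (illustrative, not the ultralytics/utils/lora.py implementation):

```python
import torch

def lora_delta(weight: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float, r: int) -> torch.Tensor:
    """Effective weight after merging a LoRA update: W' = W + (alpha/r) * B @ A.

    weight: (d_out, d_in) frozen base weight.
    A: (r, d_in) and B: (d_out, r) trainable low-rank factors.
    """
    return weight + (alpha / r) * (B @ A)

# Parameter-count intuition: for a 256x256 weight, full fine-tuning trains
# 65,536 values, while rank-16 LoRA trains only 16 * (256 + 256) = 8,192.
```

Because the merge is a plain addition, the adapter can be folded into the base weights at export time with zero inference overhead.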


🚀 Future Optimization Directions

Based on experimental observations, we plan the following enhancements:

  1. Adaptive Rank Selection
    Dynamically allocate ranks based on layer importance (e.g., backbone uses r=8, neck/head uses r=16)

  2. Hybrid PEFT Strategies
    Combine LoRA (convolutional layers) + Adapter (attention layers) for finer parameter control


LoRA Mechanism

LoRA Working Principle: Efficient parameter updates through low-rank matrices


3️⃣ Sparse SAHI Mode

Sparse Slicing Aided Hyper-Inference (Sparse SAHI)

Revolutionary optimization for ultra-large image (4K/8K) detection scenarios, achieving 3-5x speedup by intelligently skipping blank regions.

🧩 Working Mechanism

  1. 🗺️ Objectness Mask Generation
    Low-resolution full-image inference generates object existence heatmap

  2. ✂️ Adaptive Slicing
    Adaptive slicing based on heatmap, skipping regions with objectness < 0.15

  3. 🎯 High-Resolution Inference
    High-resolution inference only on regions of interest

  4. 🔗 Result Merging
    Merge multi-slice detection results using CW-NMS

Implementation: _run_sparse_sahi_single in ultralytics/engine/predictor.py
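The adaptive-slicing step can be sketched as a tile-selection routine; the function name, coordinate mapping, and defaults are illustrative, not the `_run_sparse_sahi_single` implementation:

```python
import numpy as np

def select_tiles(objectness: np.ndarray, image_hw, slice_size=640,
                 overlap_ratio=0.2, threshold=0.15):
    """Keep only tiles whose mean objectness (from a low-resolution
    heatmap) meets the threshold; skipped tiles are never run at high res.

    objectness: (h, w) heatmap in [0, 1] at reduced resolution.
    image_hw: (H, W) of the full-resolution image.
    """
    H, W = image_hw
    h, w = objectness.shape
    stride = int(slice_size * (1 - overlap_ratio))
    tiles = []
    for y in range(0, max(H - slice_size, 0) + 1, stride):
        for x in range(0, max(W - slice_size, 0) + 1, stride):
            # Map the tile's image coordinates back into heatmap coordinates.
            y0, y1 = y * h // H, min((y + slice_size) * h // H + 1, h)
            x0, x1 = x * w // W, min((x + slice_size) * w // W + 1, w)
            if objectness[y0:y1, x0:x1].mean() >= threshold:
                tiles.append((x, y, x + slice_size, y + slice_size))
    return tiles
```

On sparse scenes such as aerial imagery, most tiles fall below the threshold, which is where the reported 3-5x speedup comes from.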

📸 Visual Demonstrations

Sparse SAHI Pipeline

Performance Comparison

Skip Ratio Analysis

Real-world Example


4️⃣ Cluster-Weighted NMS (CW-NMS)

Cluster-Weighted Non-Maximum Suppression

Cluster theory-based detection box fusion algorithm that significantly improves localization accuracy through weighted averaging instead of hard suppression.

🔬 Algorithm Comparison

| Method | Strategy | Pros | Cons |
| --- | --- | --- | --- |
| Traditional NMS | Direct discard of overlapping boxes | Fast | May lose accurate localization |
| Soft-NMS | Confidence decay | Preserves more candidates | Parameter-sensitive |
| CW-NMS | Gaussian-weighted fusion | High accuracy, robust | Slight computational increase |

Mathematical Principle:

weighted_box = Σ(box_i × w_i) / Σ(w_i)
where w_i = exp(-IoU_i² / (2σ²)) × conf_i

Implementation: ultralytics/utils/nms.py
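The fusion formula above, applied to a single cluster of overlapping boxes (a sketch, not the ultralytics/utils/nms.py implementation):

```python
import numpy as np

def fuse_cluster(boxes: np.ndarray, confs: np.ndarray,
                 ious: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Weighted fusion of one cluster of detections.

    boxes: (n, 4) xyxy boxes in the cluster.
    confs: (n,) confidence scores.
    ious:  (n,) IoU of each box with the cluster's reference box.
    Implements weighted_box = sum(box_i * w_i) / sum(w_i),
    with w_i = exp(-IoU_i**2 / (2 * sigma**2)) * conf_i.
    """
    w = np.exp(-(ious ** 2) / (2 * sigma ** 2)) * confs
    return (boxes * w[:, None]).sum(axis=0) / w.sum()
```

Instead of discarding all but one box, the cluster collapses to a confidence- and IoU-weighted average, which is where the localization gain over hard suppression comes from.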


🛠 Improvements & Fixes

🔧 Core Enhancements

| Category | Improvement | Impact |
| --- | --- | --- |
| 🔒 Robustness | _robust_deepcopy mechanism | Resolves edge cases, 15% training stability improvement |
| 📚 Documentation | Automated documentation generation system | 100% code-documentation synchronization |
| ⚖️ License | Tencent Open Source License adoption | Enterprise-friendly, community-friendly |
| 🧪 Experiment Management | experiments.yaml configuration system | Improved experiment reproducibility |
| 🚀 Entry Point | Standardized app.py entry | Lowered usage barrier |

🐛 Bug Fixes

  • Fixed gradient accumulation issues in MoE mode
  • Resolved VRAM overflow with large batch sizes
  • Optimized multi-GPU training synchronization mechanism
  • Fixed LoRA weight save/load edge cases

💡 Usage Examples

🧠 Example 1: MoE Training

🖱️ Click to expand full example

CLI Command Line Method

# Basic training
yolo detect train \
  model=ultralytics/cfg/models/master/v0_1/det/yolo-master-n.yaml \
  data=coco8.yaml \
  epochs=100 \
  imgsz=640

# Advanced configuration
yolo detect train \
  model=yolo-master-n.yaml \
  data=coco.yaml \
  epochs=300 \
  batch=32 \
  moe_num_experts=8 \
  moe_top_k=2 \
  moe_balance_loss_weight=0.01

Python API Method

from ultralytics import YOLO

# Load MoE configuration
model = YOLO("ultralytics/cfg/models/master/v0_1/det/yolo-master-n.yaml")

# Training configuration
results = model.train(
    data="coco8.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    # MoE-specific parameters
    moe_num_experts=8,      # Number of experts
    moe_top_k=2,            # Experts activated per token
    moe_balance_loss=0.01,  # Load balancing loss weight
    # Training optimization
    optimizer="AdamW",
    lr0=0.001,
    warmup_epochs=3
)

# Evaluation
metrics = model.val()
print(f"mAP50-95: {metrics.box.map}")

# Expert utilization analysis
model.prune_experts(threshold=0.15)  # Prune low-utilization experts

⚡ Example 2: LoRA Fine-Tuning

🖱️ Click to expand full example

CLI Method

# Auto rank selection
yolo detect train \
  model=yolov8n.pt \
  data=custom_dataset.yaml \
  lora_auto_r_ratio=0.05 \
  lora_alpha=32 \
  epochs=50

# Manual configuration
yolo detect train \
  model=yolov8n.pt \
  data=custom_dataset.yaml \
  lora_r=16 \
  lora_alpha=32 \
  lora_dropout=0.1 \
  lora_target_modules="*.cv1.conv,*.cv2.conv,*.cv3.conv"

Python Advanced Usage

from ultralytics import YOLO
from ultralytics.utils.lora import LoRAConfig

# Create LoRA configuration
lora_config = LoRAConfig(
    r=16,                    # Rank
    alpha=32,                # Scaling factor
    dropout=0.1,             # Dropout rate
    target_modules=[         # Target modules
        "*.cv1.conv",
        "*.cv2.conv", 
        "*.m.*.cv1.conv"
    ],
    auto_r_ratio=None        # Or use 0.05 for auto-calculation
)

# Load pre-trained model
model = YOLO("yolov8n.pt")

# LoRA fine-tuning
results = model.train(
    data="custom_dataset.yaml",
    epochs=50,
    lora_config=lora_config,
    batch=32,
    optimizer="AdamW",
    lr0=0.0001  # LoRA typically uses smaller learning rate
)

# Merge LoRA weights into main model
model.merge_lora_weights()
model.save("model_with_lora.pt")

# Save only LoRA adapter (ultra-compact file)
model.save_lora_only("lora_adapter.pt")  # Typically < 5MB

🔍 Example 3: Sparse SAHI Inference

🖱️ Click to expand full example

CLI Method

# Basic Sparse SAHI
yolo detect predict \
  model=yolov8n.pt \
  source=large_image_4k.jpg \
  sparse_sahi=True \
  slice_size=640 \
  overlap_ratio=0.2

# Batch processing
yolo detect predict \
  model=yolov8n.pt \
  source=satellite_images/*.jpg \
  sparse_sahi=True \
  slice_size=1024 \
  objectness_threshold=0.15 \
  save=True

Python Method

from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")

# Single image inference
results = model.predict(
    source="large_aerial_image.jpg",
    sparse_sahi=True,
    slice_size=640,
    overlap_ratio=0.2,
    objectness_threshold=0.15,
    conf=0.25,
    iou=0.45
)

# Visualization
annotated = results[0].plot()
cv2.imwrite("result.jpg", annotated)

# Batch video processing
for i, result in enumerate(model.predict(
    source="video.mp4",
    stream=True,
    sparse_sahi=True,
    slice_size=1280
)):
    # Real-time processing
    boxes = result.boxes
    print(f"Frame {i}: {len(boxes)} objects")

🎯 Example 4: Cluster-Weighted NMS

🖱️ Click to expand full example

CLI Method

# Enable CW-NMS
yolo detect predict \
  model=yolov8n.pt \
  source=image.jpg \
  cluster=True \
  sigma=0.1 \
  conf=0.25

# Comparison with traditional NMS
yolo detect predict \
  model=yolov8n.pt \
  source=crowded_scene.jpg \
  cluster=False   # Traditional NMS

yolo detect predict \
  model=yolov8n.pt \
  source=crowded_scene.jpg \
  cluster=True \
  sigma=0.05      # CW-NMS

Python Method

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# CW-NMS inference
results = model.predict(
    source="dense_objects.jpg",
    cluster=True,          # Enable CW-NMS
    sigma=0.1,             # Gaussian weight standard deviation
    conf=0.25,
    iou=0.45,
    max_det=300            # Maximum detections
)

# Accuracy analysis
boxes = results[0].boxes
print(f"Detected: {len(boxes)} objects")
print(f"Average confidence: {boxes.conf.mean():.3f}")

📊 Model Zoo & Benchmarks


🏆 Official Models

YOLO-Master-EsMoE Series

| Model | Config | Params (M) | GFLOPs | Box(P) | R | mAP50 | mAP50-95 | Speed (ms, 4090 TensorRT) | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO-Master-EsMoE-N | Config | 2.68 | 8.7 | 0.684 | 0.536 | 0.587 | 0.427 | 1.56 | 640.18 |
| YOLO-Master-EsMoE-S | Config | 9.69 | 29.1 | 0.699 | 0.603 | 0.603 | 0.489 | 2.36 | 423.87 |
| YOLO-Master-EsMoE-M | Config | 34.88 | 97.4 | 0.737 | 0.64 | 0.697 | 0.5301 | 4.1 | 243.79 |
| YOLO-Master-EsMoE-L | Config | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| YOLO-Master-EsMoE-X | Config | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD | TBD |

YOLO-Master-v0.1 Series

| Model | Config | Params (M) | GFLOPs | Box(P) | R | mAP50 | mAP50-95 | Speed (ms, 4090 TensorRT) | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO-Master-v0.1-N | Config | 7.54 | 10.1 | 0.684 | 0.542 | 0.592 | 0.429 | 1.81 | 528.84 |
| YOLO-Master-v0.1-S | Config | 29.15 | 36 | 0.724 | 0.607 | 0.662 | 0.489 | 2.9 | 345.24 |
| YOLO-Master-v0.1-M | Config | 52.17 | 116.7 | 0.729 | 0.641 | 0.696 | 0.528 | 5.28 | 170.72 |
| YOLO-Master-v0.1-L | Config | 58.41 | 138.1 | 0.739 | 0.646 | 0.705 | 0.539 | 6.67 | 149.86 |
| YOLO-Master-v0.1-X | Config | 🔥 training | TBD | TBD | TBD | TBD | TBD | TBD | TBD |

🤝 Community Contributions

We welcome and encourage community contributions! Please submit your trained models via Pull Request.

📝 Contribution Guidelines

  1. Fork this repository
  2. Train your model with detailed logs
  3. Benchmark on standard datasets (COCO/VOC/Custom)
  4. Submit PR with:
    • Model weights (hosted on external storage)
    • Training configuration YAML
    • Benchmark results
    • Training logs and curves

🔄 Migration Guide

From v2026.01 to v2026.02

🔧 Click to view detailed migration steps

1️⃣ Configuration File Updates

Old Version (v2026.01):

# model.yaml
model:
  backbone: CSPDarknet
  head: YOLOv8Head

New Version (v2026.02):

# model.yaml
model:
  backbone: CSPDarknet
  head: YOLOv8Head
  
# MoE configuration (optional)
moe:
  num_experts: 8
  top_k: 2
  balance_loss_weight: 0.01

# LoRA configuration (optional)
lora:
  r: 16
  alpha: 32
  target_modules: ["*.cv1.conv", "*.cv2.conv"]

2️⃣ API Changes

Training API:

# Old version
model.train(data="coco.yaml", epochs=100)

# New version (backward compatible)
model.train(
    data="coco.yaml", 
    epochs=100,
    # New parameters
    lora_r=16,              # LoRA rank
    sparse_sahi=True,       # Sparse SAHI
    cluster_nms=True        # CW-NMS
)

3️⃣ Weight File Compatibility

  • ✅ v2026.01 weights can be directly used in v2026.02
  • ✅ Automatic weight conversion supported
  • ⚠️ MoE/LoRA weights require training with new version

🤝 Community


🙏 Acknowledgments

We would like to thank:

  • 🌟 All contributors to this release
  • 🧪 Beta testers for valuable feedback
  • 📚 The research community for foundational work on MoE, LoRA, and SAHI
  • 💪 Our users for continuous support and suggestions

📄 License

This project is licensed under the Tencent Open Source License. See LICENSE for details.


📞 Contact & Support


🌟 Star History

Star History Chart


Made with ❤️ by the YOLO-Master Team

For detailed commit history and technical implementation, please refer to CHANGELOG.md
