Conversation


@michaelmohamed commented Dec 30, 2025

Description

For technical implementation details, see the docs.

This PR adds optional keypoint/pose estimation support to RF-DETR, following YOLOv11's approach for pose estimation. The implementation outputs (x, y, visibility) triplets per keypoint per detection, with full support for COCO-style 17-keypoint annotations.

Key features:

  • Fully configurable number of keypoints, names, and skeleton connections
  • Each keypoint outputs (x, y, visibility) like YOLOv11
  • Supports COCO-style keypoint annotations for training
  • Optional/configurable (like existing segmentation head) - no impact on existing detection/segmentation workflows
  • Uses detection weights (rf-detr-medium.pth) as default starting point for training
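The (x, y, visibility) output layout described above can be sketched with NumPy. This is a shape illustration only; the array contents are invented for the example, not actual model output:

```python
import numpy as np

# One detection (N = 1) with K = 17 COCO keypoints, each an
# (x, y, visibility) triplet, giving a [N, K, 3] array.
K = 17
keypoints = np.zeros((1, K, 3))
keypoints[0, 0] = [320.5, 140.2, 2.0]  # e.g. "nose": x, y, visibility (2 = visible)

xy = keypoints[..., :2]   # pixel coordinates, shape [N, K, 2]
vis = keypoints[..., 2]   # COCO visibility flags, shape [N, K]
print(xy.shape, vis.shape)  # (1, 17, 2) (1, 17)
```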

Related issues:

Addresses community requests for pose estimation support in RF-DETR.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

29 unit tests covering both training and inference pipelines:

```shell
pytest tests/test_keypoint_head.py -v
```

29 passed in 2.28s

Test coverage includes:

  • KeypointHead module (forward pass, gradient flow, reference boxes, custom keypoints)
  • COCO constants validation (names, skeleton, sigmas, flip pairs)
  • Config classes (RFDETRPoseConfig, KeypointTrainConfig)
  • Training data pipeline (keypoint extraction, hflip/crop transforms)
  • Loss functions (L1, BCE visibility, OKS)
  • Inference pipeline (predict() returns keypoints, PostProcess, coordinate scaling, visibility sigmoid)

Any specific deployment considerations

  • No pretrained pose weights yet - Users must fine-tune on a keypoint dataset (e.g., COCO-Pose). The model loads detection weights by default and the keypoint head is learned during training.
  • Fully optional - The KeypointHead import is conditional; existing detection/segmentation workflows are unaffected.
  • Memory - Adds minimal overhead (~1-2% parameters) when keypoint_head=True.

Docs

  • Added docs/learn/run/pose.md - Complete usage guide for pose estimation
  • Added docs/reference/pose.md - API reference for RFDETRPose
  • Updated docs/learn/train/index.md - Added pose tabs to all training examples
  • Updated mkdocs.yaml - Added navigation entries for pose documentation


Files Changed

| File | Change |
| --- | --- |
| `rfdetr/models/keypoint_head.py` | New - `KeypointHead` class with coordinate/visibility MLPs, COCO constants |
| `rfdetr/config.py` | Added `RFDETRPoseConfig`, `KeypointTrainConfig`, keypoint fields to `ModelConfig` |
| `rfdetr/models/lwdetr.py` | Integrated keypoint head, added `loss_keypoints()`, updated `PostProcess` |
| `rfdetr/datasets/coco.py` | Added keypoint annotation parsing in `ConvertCoco` |
| `rfdetr/datasets/transforms.py` | Updated `crop()` and `hflip()` for keypoint transformations |
| `rfdetr/detr.py` | Added `RFDETRPose` class, updated `predict()` for keypoints |
| `rfdetr/__init__.py` | Exported `RFDETRPose` |
| `rfdetr/engine.py` | Added COCO keypoint evaluation support |
| `tests/test_keypoint_head.py` | New - 29 tests for training and inference |
| `tests/conftest.py` | New - pytest configuration |
| `pyproject.toml` | Added pytest configuration |
| `docs/learn/run/pose.md` | New - pose usage documentation |
| `docs/reference/pose.md` | New - `RFDETRPose` API reference |
| `docs/learn/train/index.md` | Added pose training examples |
| `mkdocs.yaml` | Added pose navigation entries |

Usage / Import Example

```python
from rfdetr import RFDETRPose
```

Training

```python
# Select a model variant
model = RFDETRPose()
model = RFDETRPoseNano()
model = RFDETRPoseSmall()
model = RFDETRPoseMedium()
model = RFDETRPoseLarge()

model.train(dataset_dir="path/to/coco-pose", epochs=50)
```
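For reference, the COCO-style keypoint annotations the training pipeline parses follow the standard COCO format. A trimmed sketch (showing 2 of the usual 17 keypoints, with illustrative values):

```python
# Minimal COCO-keypoints annotation entry (standard COCO format; values
# are illustrative). "keypoints" is a flat [x1, y1, v1, x2, y2, v2, ...]
# list with v in {0: not labeled, 1: labeled but occluded, 2: visible}.
# A full COCO-Pose entry would carry 17 keypoints (51 numbers).
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 1,                 # "person" in COCO
    "bbox": [100.0, 50.0, 80.0, 200.0],
    "num_keypoints": 2,
    "keypoints": [120.0, 60.0, 2,     # nose: visible
                  130.0, 80.0, 1],    # left_eye: labeled but occluded
}
assert len(annotation["keypoints"]) % 3 == 0
```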

Inference

```python
model = RFDETRPose(pretrain_weights="output/checkpoint_best_total.pth")
detections = model.predict("image.jpg", threshold=0.5)

# Access keypoints: [N, K, 3] where K = num_keypoints, 3 = (x, y, visibility)
keypoints = detections.data["keypoints"]

# Visibility follows COCO format: 0 = not visible, 2 = visible
# For raw confidence scores (0.0-1.0):
confidence = detections.data["keypoints_confidence"]
```
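A common follow-up step is to suppress low-confidence keypoints before drawing or exporting them. A minimal NumPy sketch, assuming the `[N, K, 3]` layout above; the `filter_keypoints` helper and the 0.5 threshold are illustrative choices, not part of the PR's API:

```python
import numpy as np

def filter_keypoints(keypoints, confidence, thresh=0.5):
    """Zero out keypoints whose confidence is below thresh.

    keypoints:  [N, K, 3] array of (x, y, visibility) triplets
    confidence: [N, K] array of raw scores in [0, 1]
    """
    out = keypoints.copy()
    low = confidence < thresh   # [N, K] boolean mask
    out[low] = 0.0              # drop x, y, and visibility together
    return out

# Toy input: one detection with two keypoints, the second low-confidence.
kps = np.array([[[10.0, 20.0, 2.0], [30.0, 40.0, 2.0]]])  # [1, 2, 3]
conf = np.array([[0.9, 0.1]])
filtered = filter_keypoints(kps, conf)
print(filtered[0, 1])  # [0. 0. 0.]
```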

Fix: Category ID Mapping for COCO Datasets

Problem

#330, #349, #413

When training on Roboflow or custom COCO datasets where category_id starts at 1 (or has gaps), the model would crash with a CUDA index out of bounds error. This happened because:

  • Roboflow exports use 1-indexed category IDs (e.g., [1, 2, 3])
  • The model expects 0-indexed class labels (e.g., [0, 1, 2])
  • With num_classes=3 and category_id=3, accessing index 3 in a size-3 tensor fails

Additionally, COCO evaluation was returning near-zero mAP scores because predictions used 0-indexed labels but COCO evaluation expected original category IDs.
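The failure mode is easy to reproduce outside the model. A minimal sketch with NumPy (on CPU, NumPy raises an `IndexError` where CUDA reports an index-out-of-bounds assertion):

```python
import numpy as np

# num_classes = 3 gives valid class indices 0..2, but a 1-indexed
# Roboflow category_id of 3 falls outside that range.
num_classes = 3
class_scores = np.zeros(num_classes)

try:
    class_scores[3]            # raw category_id = 3 -> out of bounds
except IndexError as e:
    print("IndexError:", e)

# After remapping {1: 0, 2: 1, 3: 2}, the same category indexes safely.
print(class_scores[2])
```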

Solution

Added automatic bidirectional category ID mapping:

Training (coco.py):

```python
cat_ids = sorted(self.coco.getCatIds())
cat_id_to_continuous = {cat_id: i for i, cat_id in enumerate(cat_ids)}
# [1, 2, 3] → {1: 0, 2: 1, 3: 2}
```

Evaluation (coco_eval.py):

```python
continuous_to_cat_id = {i: cat_id for i, cat_id in enumerate(cat_ids)}
# {0: 1, 1: 2, 2: 3} → converts predictions back for COCO metrics
```

How It Works

| Dataset `category_id`s | Training mapping | Eval reverse mapping |
| --- | --- | --- |
| `[0, 1, 2]` | `{0:0, 1:1, 2:2}` (identity) | `{0:0, 1:1, 2:2}` (identity) |
| `[1, 2, 3]` | `{1:0, 2:1, 3:2}` | `{0:1, 1:2, 2:3}` |
| `[1, 5, 10]` | `{1:0, 5:1, 10:2}` | `{0:1, 1:5, 2:10}` |
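The table rows can be verified with the same dict comprehensions the fix uses in `coco.py` and `coco_eval.py`. A self-contained round-trip check for the sparse `[1, 5, 10]` case:

```python
# Bidirectional category ID mapping for a sparse, 1-indexed dataset.
cat_ids = sorted([10, 1, 5])  # sorted() makes the mapping deterministic
cat_id_to_continuous = {cat_id: i for i, cat_id in enumerate(cat_ids)}
continuous_to_cat_id = {i: cat_id for i, cat_id in enumerate(cat_ids)}

print(cat_id_to_continuous)  # {1: 0, 5: 1, 10: 2}
print(continuous_to_cat_id)  # {0: 1, 1: 5, 2: 10}

# Round trip: every original category_id survives training + eval remapping.
assert all(continuous_to_cat_id[cat_id_to_continuous[c]] == c for c in cat_ids)
```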

Backwards Compatibility

  • 0-indexed datasets: Identity mapping, behaves exactly as before
  • 1-indexed datasets: Now works correctly instead of crashing
  • Prediction: Returns 0-indexed labels for direct class_names indexing

Files Changed

  • rfdetr/datasets/coco.py - Added cat_id_to_continuous mapping in data loading
  • rfdetr/datasets/coco_eval.py - Added continuous_to_cat_id reverse mapping for evaluation
  • docs/learn/train/index.md - Added documentation for category ID handling

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Michael Mohamed does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Already signed the CLA but the status is still pending? Let us recheck it.

@michaelmohamed michaelmohamed marked this pull request as draft December 31, 2025 15:48
@michaelmohamed michaelmohamed marked this pull request as ready for review January 2, 2026 05:57