Conversation

@timnielen timnielen commented Sep 12, 2025

Description

TL;DR
The current framework handles the number of classes in a dataset incorrectly, as discussed in the following issues:
#330
#51

In Detail:
The rf-detr framework derives num_classes from the number of category entries in the dataset and reinitializes the detection head with this number whenever it doesn't match the model's num_classes.
There are several issues with this:

First of all, the head should have size num_classes+1 to account for the fact that COCO datasets are 1-indexed. In the original DETR the head even had size 92, because they add yet another +1 for the "no-object" class at the very end. This has been discussed in facebookresearch/detr#108, where they note that this extra slot is in theory not necessary, because the logit at index 0 could also act as the "no-object" class. This is exactly why RF-DETR's head has a size of 91: the class at index 0 acts as the "no-object" class. (As far as I understand this is not a problem, because since Deformable DETR they stopped using a special weight in the cross_entropy loss for the "no-object" class, which for DETR was at index -1.)
So why does this not cause any problems for datasets downloaded from Roboflow?
In Roboflow datasets, indexing starts at 0, and the 0th class is not an actual class; it is the supercategory of all other classes.
Here is an example from the soccer players.v2 dataset:

```json
[
    {
        "id": 0,
        "name": "soccer-players",
        "supercategory": "none"
    },
    {
        "id": 1,
        "name": "football",
        "supercategory": "soccer-players"
    },
    {
        "id": 2,
        "name": "player",
        "supercategory": "soccer-players"
    },
    {
        "id": 3,
        "name": "referee",
        "supercategory": "soccer-players"
    }
]
```

If you inspect it on Roboflow (https://universe.roboflow.com/roboflow-100/soccer-players-5fuqs), it indeed has only 3 classes and not 4, so the first class is really just a dummy class. This is why the current implementation works by accident: len(anns["categories"]) = 4 = number_of_actual_classes + 1.
However, this will not work for any 1-indexed dataset, such as the original COCO dataset, because it doesn't have this dummy class at the beginning.

Another issue is that num_classes should not be the number of classes in the dataset but rather the max_id, because there might be some "holes". This is in fact the case for the original COCO dataset: it has only 80 classes in total, not 90, which is its max_id, because for some reason some indices are missing.
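
For illustration, here is a minimal sketch of the difference between counting categories and taking the maximum id (assuming a standard COCO-style annotation file with a top-level "categories" list; the file name is just an example):

```python
import json

# Example path; any COCO-style annotation file with a "categories" list works.
with open("instances_val2017.json") as f:
    anns = json.load(f)

category_ids = [cat["id"] for cat in anns["categories"]]

num_categories = len(category_ids)  # 80 for original COCO
max_id = max(category_ids)          # 90 for original COCO (the ids have holes)

print(f"len(categories) = {num_categories}, max id = {max_id}")
```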

I can only suspect that the original RF-DETR models weren't trained with this pipeline, because it would have caused the head to be reinitialized with 80 instead of 91 logits, as that is len(anns["categories"]) in the case of COCO.

The fix accounts for both issues and still works with Roboflow datasets. I set num_classes to the max_id, which is 3 in the above example. Then I reinitialize the head with size num_classes+1 to account for both the "no-object" logit at index 0 and the fact that labels range from 1 to num_classes.
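
In pseudocode, the intended logic looks roughly like this (function and variable names are illustrative, not the exact ones used in the rf-detr codebase):

```python
def infer_num_classes(anns: dict) -> int:
    # num_classes is the maximum category id, not the category count, so it also
    # covers 1-indexed datasets and ids with holes (e.g. the original COCO dataset).
    return max(cat["id"] for cat in anns["categories"])

num_classes = infer_num_classes(anns)

# One logit per possible id 1..num_classes, plus index 0 acting as "no-object".
head_out_features = num_classes + 1
```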

Edit: I just noticed this is the same fix as proposed by #330

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Here is how I verified the problem:

I downloaded the soccer players.v2 dataset from roboflow as mentioned above.

As the first category is only a supercategory, there are only 3 actual categories.
Roboflow also states that there are 3 categories: https://universe.roboflow.com/roboflow-100/soccer-players-5fuqs

When I train the model on the dataset as is:

```python
model = RFDETRNano()

model.train(
    dataset_dir="soccer players.v2-release.coco"
)
```

It reinitializes the head to size 4, which is correct since 4 = 3 + 1 = num_classes + 1, but only because of the dummy class in the dataset.

Now, to test what happens with a 1-indexed dataset, I removed the dummy class. The head is then reinitialized with only 3 logits, and I get an "index out of bounds" error as expected, because the "referee" category with id 3 corresponds to the 3rd index, which doesn't exist.
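
For reference, here is a rough sketch of how the dummy supercategory can be stripped from the Roboflow export to simulate such a 1-indexed dataset (the path follows the usual Roboflow COCO export layout and is only an assumption):

```python
import json

# Assumed Roboflow COCO export layout; repeat for the valid/test splits as needed.
path = "soccer players.v2-release.coco/train/_annotations.coco.json"

with open(path) as f:
    anns = json.load(f)

# Drop the id-0 dummy supercategory; the remaining ids 1..3 are left untouched,
# so the dataset becomes 1-indexed like the original COCO dataset.
anns["categories"] = [cat for cat in anns["categories"] if cat["id"] != 0]

with open(path, "w") as f:
    json.dump(anns, f, indent=4)
```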

After the changes this is handled correctly: the head gets initialized with 4 logits and the correct number of classes (3) is printed in the reinitialization warning.

CLAassistant commented Sep 12, 2025

CLA assistant check
All committers have signed the CLA.

timnielen (Author)

I have read the CLA Document and I sign the CLA.

SkalskiP (Collaborator) commented Jan 6, 2026

I have to admit I am a bit confused by this solution.

For example, in the case below, num_classes is set to 4.

```json
[
    {
        "id": 0,
        "name": "soccer-players",
        "supercategory": "none"
    },
    {
        "id": 1,
        "name": "football",
        "supercategory": "soccer-players"
    },
    {
        "id": 2,
        "name": "player",
        "supercategory": "soccer-players"
    },
    {
        "id": 3,
        "name": "referee",
        "supercategory": "soccer-players"
    }
]
```

But in the case below, num_classes is set to 100.

```json
[
    {
        "id": 0,
        "name": "soccer-players",
        "supercategory": "none"
    },
    {
        "id": 1,
        "name": "football",
        "supercategory": "soccer-players"
    },
    {
        "id": 2,
        "name": "player",
        "supercategory": "soccer-players"
    },
    {
        "id": 99,
        "name": "referee",
        "supercategory": "soccer-players"
    }
]
```

But in reality, the number of classes in both cases is 3. For me, as a user of similar models, this feels irrational.

timnielen (Author) commented Jan 6, 2026

@SkalskiP Actually, num_classes is 3 and 99 respectively, i.e. the max_id, and I add one extra logit for the "no-object" class. This distinction is necessary because it allows for datasets that don't have this unused supercategory at the beginning. As for why it isn't 3 in both cases, this is how the COCO dataset should be handled, as explained here:

> Another issue is that num_classes should not be the number of classes in the dataset but rather the max_id, because there might be some "holes". This is in fact the case for the original COCO dataset: it has only 80 classes in total, not 90, which is its max_id, because for some reason some indices are missing.

I agree with you that it would be more efficient to have only three classes in both cases, but this would break the whole training logic, because samples are labeled by their id in the loss function. Take your second example, where the referee has id 99, and assume the model only has 3+1 heads and correctly predicts the last logit (index 3), namely the referee. I believe it would still get the wrong loss, because the "label" is set to 99 instead of index 3; as far as I know, the COCO API uses ids and not indices (correct me if I'm wrong). So if you wanted to optimise this, you would have to rewrite the training logic. I believe prior authors didn't bother, which is why all of the models have 90+ detection heads even though COCO only has 80 classes.
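
To make this concrete, here is a tiny illustration, assuming a plain cross-entropy classification loss indexed directly by category id (RF-DETR's actual loss differs in its details, so this is only a sketch of the indexing problem):

```python
import torch
import torch.nn.functional as F

# A "compact" head with only 3 + 1 logits (indices 0..3).
logits = torch.randn(1, 4)

# The target carries the raw COCO category id (99), not a remapped index 3.
target = torch.tensor([99])

try:
    F.cross_entropy(logits, target)
except (IndexError, RuntimeError) as err:
    # Fails with a "target out of bounds" error: the loss indexes the logits by
    # the raw id, so the head needs max_id + 1 logits unless ids are remapped.
    print(err)
```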
