Fix: incorrect num_classes used in RFDETR.train_from_config #356
base: develop
Conversation
I have read the CLA Document and I sign the CLA.
I have to admit I am a bit confused by this solution. For example, in our case below: [json example] … and: [json example] … But in reality, the number of classes in both cases is 3. For me, as a user of similar models, this feels irrational.
@SkalskiP Actually, the
I agree with you that it would be more efficient to have only three classes in both cases, but this would screw up the whole training logic, as samples are labeled by their id in the loss function. In the case of your second example with the referee having id 99, let's assume your model only has 3+1 heads and it correctly predicts the last logit (index 3), namely the referee. I believe it would get the wrong loss, as the "label" is set to 99 instead of index 3, because as far as I know the COCO API uses ids and not indices (correct me if I'm wrong). So if you wanted to optimise this whole thing you would have to rewrite the training logic, but I believe prior authors didn't bother, which is why all of the models have 90+ detection heads even though COCO only has 80 classes.
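The id-vs-index mismatch described above can be sketched with plain Python (the numbers are hypothetical, taken from the comment, not from the actual rf-detr loss code):

```python
# A model with 3 classes + 1 "no-object" logit per query, but the dataset
# labels the referee class with id 99 rather than index 3.
logits = [0.1, 0.2, 0.3, 2.5]  # index 3 (referee) has the highest score

label = 99  # COCO-style category id, not a contiguous index
try:
    # Cross-entropy-style losses select the logit at the target's index,
    # so an id of 99 overruns a 4-logit head.
    target_logit = logits[label]
except IndexError:
    print("label 99 is out of range for a 4-logit head")
```

This is why the head has to cover the full id range rather than just the number of distinct classes.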
Description
TL;DR
The current framework handles the number of classes in a dataset incorrectly as discussed in the following issues:
#330
#51
In Detail:
The rf-detr framework sets `num_classes` to the number of categories listed in the dataset and reinitializes the head with this number if it doesn't match the model's `num_classes`.
There are several issues with this:
First of all, the head should be of size `num_classes + 1` to account for the fact that COCO datasets are 1-indexed. In the original DETR the head was even of size 92, because they add another +1 for the "no-object" class at the very end. This has been discussed in facebookresearch/detr#108, where they note that in theory this extra class is not necessary, because the logit at index 0 could also act as the "no-object" class. This is exactly why RF-DETR's head has a size of 91: the class at index 0 acts as the "no-object" class. (As far as I understand this causes no problem, since starting with Deformable DETR they stopped using a special weight in the cross-entropy loss for the "no-object" class, which was at index -1 for DETR.)

So why does this not cause any problems for datasets downloaded from Roboflow?
In Roboflow datasets indexing starts at 0, and the 0th class is in fact not an actual class: it is the superclass of all other classes.
Here is an example from the soccer players.v2 dataset:
If you inspect it on Roboflow (https://universe.roboflow.com/roboflow-100/soccer-players-5fuqs) it indeed has only 3 classes, not 4, so the first class is really just a dummy class. This is why, by accident, the current implementation works: `len(anns["categories"]) = 4 = number_of_actual_classes + 1`.

However, this will not work for any 1-indexed dataset such as the original COCO dataset, as it doesn't have this dummy class at the beginning.
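The accidental off-by-one can be illustrated with two hypothetical category lists (the names mirror the soccer players.v2 example; this is not the actual dataset JSON):

```python
# Roboflow-style export: 0-indexed, with a dummy superclass at id 0.
roboflow_categories = [
    {"id": 0, "name": "soccer-players"},  # dummy superclass, never annotated
    {"id": 1, "name": "ball"},
    {"id": 2, "name": "player"},
    {"id": 3, "name": "referee"},
]

# COCO-style dataset: 1-indexed, no dummy class.
coco_style_categories = [
    {"id": 1, "name": "ball"},
    {"id": 2, "name": "player"},
    {"id": 3, "name": "referee"},
]

# len(categories) only happens to equal num_classes + 1 for the Roboflow layout.
print(len(roboflow_categories))    # 4 == 3 actual classes + 1 dummy -> head fits
print(len(coco_style_categories))  # 3 -> head too small, id 3 gets no logit
```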
Another issue is that `num_classes` should not be the number of classes in the dataset but rather the `max_id`, because there might be some "holes". This is in fact the case for the original COCO dataset: it has only 80 classes in total, not 90, which is its `max_id`, because for some reason some indices are missing. I can only suspect the original RF-DETR models haven't been trained with this pipeline, because this would have caused the head to be reinitialized with 80 instead of 91 logits, as 80 is `len(anns["categories"])` in the case of COCO.

The fix accounts for both issues and will still work with Roboflow datasets. I set `num_classes` to the `max_id`, which is 3 in the above example. Then I reinitialize the head with size `num_classes + 1` to account for both the "no-object" logit at index 0 and the fact that labels range from 1 to `num_classes`.

Edit: I just noticed this is the same fix as proposed by #330
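The fix above can be sketched as follows (the helper name is made up for illustration; it is not the actual rf-detr code):

```python
# Derive the head size from the maximum category id, so "holes" in the id
# space are covered, then add 1 for the "no-object" logit at index 0.
def head_size_from_annotations(anns: dict) -> int:
    max_id = max(cat["id"] for cat in anns["categories"])
    num_classes = max_id  # labels range from 1..max_id; index 0 = no-object
    return num_classes + 1

# 1-indexed dataset with a hole in the ids (1, 2, 99): the head must have
# 100 logits, even though there are only 3 actual classes.
anns = {"categories": [{"id": 1, "name": "ball"},
                       {"id": 2, "name": "player"},
                       {"id": 99, "name": "referee"}]}
print(head_size_from_annotations(anns))  # 100, not len(categories) + 1 == 4
```

For the soccer players example this gives `max_id = 3` and a head of 4 logits; for the original COCO annotations it gives `max_id = 90` and a head of 91 logits, matching the pretrained checkpoints.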
Type of change
How has this change been tested, please provide a testcase or example of how you tested the change?
Here is how I verified the problem:
I downloaded the soccer players.v2 dataset from roboflow as mentioned above.
As the first category is only a supercategory there are only 3 actual categories.
Roboflow also states that there are 3 categories https://universe.roboflow.com/roboflow-100/soccer-players-5fuqs
When I train the model on the dataset as is:
It reinitializes the head to size 4, which is correct since 4 = 3 + 1 = num_classes + 1, but only because of the dummy class in the dataset.
Now, to test what happens with a 1-indexed dataset, I removed the dummy class. Now the head is reinitialized with only 3 logits and I get an "index out of bounds" error, as expected, since the "referee" category with id 3 corresponds to the 3rd index, which doesn't exist.
After the changes this is handled correctly: the head gets initialized with 4 logits and the reinitialization warning prints the correct number of classes (3).