CV team project (A+)
- 2023.10.10 ~ 2023.11.20
- Improve "fine-grained classification" accuracy by modifying the model
- "Fine-grained dataset": a dataset whose categories or classes are finely divided or detailed (object recognition, image classification, segmentation, attribute recognition)
- Best Accuracy: 92.9054%
- Test Loss: 0.3288, Accuracy: 90.2685%
- Elapsed Time: 22m 30s
1. Augmentation - Geometric Transformation + Visual Corruptions
- Geometric Transformation: translation, scaling, rotation, and reflection (symmetry) of the image
- -> Apply horizontal and vertical flips (left-right inversion, upper-lower inversion) and rotation
- Visual Corruptions:
- Real-world images are exposed to various kinds of environmental noise, and noisy images degrade model performance.
- We want to increase the robustness of our model by training with visual corruptions, i.e. disturbances such as noise and blur.
- -> Gaussian noise, contrast, brightness
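The transformations listed above can be sketched in plain Python. This is only an illustration, not the project's actual pipeline: it assumes a grayscale image stored as a nested list of floats in [0, 1], and the function names, the probability p, and the noise/brightness/contrast ranges are all hypothetical choices.

```python
import random

def hflip(img):
    """Horizontal flip (left-right inversion): reverse every row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip (upper-lower inversion): reverse the row order."""
    return [row[:] for row in img[::-1]]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def _clip(v):
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, v))

def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]

def brightness(img, delta):
    """Shift every pixel by delta."""
    return [[_clip(p + delta) for p in row] for row in img]

def contrast(img, factor):
    """Scale each pixel's distance from the image mean by factor."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[_clip(mean + factor * (p - mean)) for p in row] for row in img]

def augment(img, rng, p=0.5):
    """Apply each transform independently with probability p (assumed policy)."""
    if rng.random() < p:
        img = hflip(img)
    if rng.random() < p:
        img = vflip(img)
    if rng.random() < p:
        img = rot90(img)
    if rng.random() < p:
        img = gaussian_noise(img, sigma=0.05, rng=rng)
    if rng.random() < p:
        img = brightness(img, delta=rng.uniform(-0.2, 0.2))
    if rng.random() < p:
        img = contrast(img, factor=rng.uniform(0.7, 1.3))
    return img
```

Passing a seeded generator, e.g. `augment(img, random.Random(0))`, keeps the augmentation reproducible between runs.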
2. Optimizer - AdamP:
- An optimizer that inhibits excessive weight norm growth by using projection to remove the component of the momentum-driven update that is parallel to the weight direction.
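The projection idea can be sketched as follows. This shows only the step that removes the component parallel to the weight vector; it is not the full AdamP algorithm (which, as far as we understand, also keeps Adam's moment estimates and applies the projection only when the weights look scale-invariant). The helper names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project_out_radial(weight, update, eps=1e-8):
    """Remove from `update` the component parallel to `weight`.

    The result is (approximately) orthogonal to the weight vector,
    so a step along it does not push the weight norm outward directly.
    """
    scale = dot(weight, update) / (dot(weight, weight) + eps)
    return [u - scale * w for w, u in zip(weight, update)]

# Example: the update [1, 2] has a part along the weight [3, 4].
weight = [3.0, 4.0]
update = [1.0, 2.0]
tangent = project_out_radial(weight, update)
```

After projection, `dot(weight, tangent)` is close to zero, which is the property the description above relies on.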
3. Hyperparameters (BATCH_SIZE, EPOCH, lr):
- In our experiments, a smaller batch size gave better performance
- Training for more epochs gave better performance
- A lower learning rate gave better performance
4. Scheduler: get_cosine_schedule_with_warmup
- The learning rate increases linearly during a warm-up period of the specified length (three times the length of train_loader)
- After warm-up, the schedule lowers the optimizer's learning rate along a cosine curve.
- -> This helps training get out of saddle points quickly.
- Plateaus that occur in the middle of training can also be passed quickly
- Helps maximize the model's generalization performance
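The shape of this schedule can be sketched in plain Python. The formula below follows our understanding of get_cosine_schedule_with_warmup (from the Hugging Face transformers library) with its default half cosine cycle; steps_per_epoch, epochs, and base_lr are hypothetical values, and reading "three times the train_loader" as 3 * len(train_loader) warm-up steps is an assumption.

```python
import math

def cosine_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear warm-up from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warm-up phase that has elapsed, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # Half cosine wave: 1 at the end of warm-up, 0 at the last step.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Hypothetical training setup (not the project's real numbers).
steps_per_epoch = 100            # stands in for len(train_loader)
epochs = 30
base_lr = 1e-3
warmup = 3 * steps_per_epoch     # assumed reading of "three times the train_loader"
total = epochs * steps_per_epoch

schedule = [base_lr * cosine_warmup_factor(s, warmup, total) for s in range(total + 1)]
```

The learning rate starts at 0, reaches base_lr at the end of warm-up, and then decays smoothly to 0 at the final step.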
5. Label Smoothing
- A method to reduce overconfidence in deep learning predictions by softening the hard label (a one-hot encoded vector with 1 at the correct class index and 0 for the rest).
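A minimal sketch of label smoothing, assuming the common formulation in which the one-hot target is mixed with a uniform distribution over the K classes (weight 1 - eps on the one-hot vector, eps / K added to every class). The value eps = 0.1 and the function names are illustrative assumptions, not the settings used in the project.

```python
import math

def smooth_one_hot(target, num_classes, eps=0.1):
    """Soft label: eps / K for every class, plus (1 - eps) on the correct one."""
    off = eps / num_classes
    return [off + (1.0 - eps) if k == target else off for k in range(num_classes)]

def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross entropy between the smoothed label and the predicted distribution."""
    q = smooth_one_hot(target, len(logits), eps)
    logp = log_softmax(logits)
    return -sum(qk * lk for qk, lk in zip(q, logp))

# A very confident (and correct) prediction is still penalized a little,
# which discourages the logits from growing without bound.
confident_logits = [10.0, 0.0, 0.0]
hard_loss = smoothed_cross_entropy(confident_logits, target=0, eps=0.0)
soft_loss = smoothed_cross_entropy(confident_logits, target=0, eps=0.1)
```

In PyTorch, `torch.nn.CrossEntropyLoss` accepts a `label_smoothing` argument that applies the same kind of mixing, so a hand-written version like this is usually only needed for illustration.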
EfficientNet-B0 model
- There are three ways to improve performance, which "Compound Scaling" combines:
- Increase the depth of the network (increase the number of layers)
- Increase channel width (increase the number of filters)
- Increase the resolution of the input image
- EfficientNet is a model that finds the best combination of the three methods, scaling all three together, and performs well with fewer FLOPs (amount of computation) than conventional models.
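The arithmetic behind compound scaling can be sketched as follows. The constants alpha = 1.2, beta = 1.1, gamma = 1.15 are the values we recall from the EfficientNet paper for the B0 baseline; they are an assumption here, not something measured in this project, and the released larger variants may use rounded coefficients.

```python
# Assumed per-dimension scaling bases (depth, width, resolution).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for coefficient phi."""
    depth = ALPHA ** phi        # more layers
    width = BETA ** phi         # more channels per layer
    resolution = GAMMA ** phi   # larger input images
    # FLOPs grow linearly with depth and quadratically with width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Under these assumed constants, each increment of phi roughly doubles the FLOPs, and phi = 0 corresponds to the unscaled B0 baseline.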
[FINAL]
- Test Loss: 0.3447, Accuracy: 94.6309%
- Best Accuracy: 94.5946%
- Elapsed Time: 2h, 12m, 26s




