Hello, it's an honor to read your paper. I'm a student working on multimodal dialogue emotion recognition. I understand that in the ERC (Emotion Recognition in Conversations) field, much prior work already provides preprocessed features for the text, audio, and visual modalities; for my thesis, however, I would like to study feature extraction myself. For example, most studies fine-tune BERT or RoBERTa and then, with the fine-tuned model, perform a single forward pass to obtain textual feature vectors for the entire dataset (training, validation, and test splits); see the sketch below. Your EmoBERTa model has also been widely used as a text feature extractor, for example in "A MoE Multimodal Graph Attention Network Framework for Multimodal Emotion Recognition" and "MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations."
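Concretely, this is roughly what I mean by a single forward pass. It is only a minimal sketch, assuming the `tae898/emoberta-base` checkpoint on the HuggingFace Hub (substitute your own fine-tuned weights); the `extract_features` helper and the batch size are my own placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Placeholder checkpoint; replace with your own fine-tuned EmoBERTa weights.
MODEL_NAME = "tae898/emoberta-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)  # encoder only, classification head dropped
model.eval()  # inference mode: no dropout, no gradient updates

@torch.no_grad()
def extract_features(utterances, batch_size=32):
    """One forward pass through the frozen encoder; returns one
    vector per utterance (the embedding of the leading <s> token)."""
    feats = []
    for i in range(0, len(utterances), batch_size):
        batch = tokenizer(
            utterances[i:i + batch_size],
            padding=True, truncation=True, return_tensors="pt",
        )
        out = model(**batch)
        feats.append(out.last_hidden_state[:, 0, :])  # <s> (CLS-style) token
    return torch.cat(feats, dim=0)

# Run once over each split (train / validation / test) and save the vectors.
vectors = extract_features(["I can't believe it!", "That's fine."])
print(vectors.shape)  # (2, hidden_size), e.g. (2, 768) for the base model
```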
My question is this: if the final saved EmoBERTa model achieves over 80% accuracy on the dialogue training set but only just over 60% on the test set, does that count as overfitting? And if so, would using such a model to extract feature vectors for the entire dataset harm the subsequent training of a dialogue emotion recognition model?
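For context, this is roughly how I measure the gap I describe. It is a sketch, assuming a fine-tuned sequence-classification checkpoint and existing `train_loader` / `test_loader` DataLoaders (both hypothetical names of mine) whose batches contain a `labels` tensor:

```python
import torch

@torch.no_grad()
def accuracy(model, dataloader):
    """Fraction of utterances whose predicted label matches the gold label."""
    model.eval()
    correct = total = 0
    for batch in dataloader:
        inputs = {k: v for k, v in batch.items() if k != "labels"}
        preds = model(**inputs).logits.argmax(dim=-1)
        correct += (preds == batch["labels"]).sum().item()
        total += batch["labels"].numel()
    return correct / total

# A large gap (e.g. ~0.80 on train vs ~0.60 on test) is what I am
# interpreting as the classic overfitting signature.
gap = accuracy(model, train_loader) - accuracy(model, test_loader)
print(f"train/test accuracy gap: {gap:.3f}")
```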