This microservice is part of the Feed.UVL Project and is responsible for the automatic classification of online feedback sentences based on their relevance for requirements engineering.
Each sentence is classified as either "Informative" or "Non-Informative", using a fine-tuned BERT model. The model builds on the pre-trained BERT-base-uncased and has been fine-tuned specifically for this task using Hugging Face Transformers.
Five different runs are executed, where models are fine-tuned on different datasets and dataset combinations. The models and their evaluation results are stored in MLflow.
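As a rough sketch of what such a fine-tuning setup looks like with Hugging Face Transformers (the hyperparameters and dataset handling below are illustrative assumptions, not the values of the actual runs; those live in the training notebook):

```python
# Illustrative sketch only; the real training code is in the notebook
# "ComponentRelevanceClassifierServiceTraining".
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Binary classification head ("Informative" vs. "Non-Informative") on top of the
# pre-trained BERT-base-uncased encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "Non-Informative", 1: "Informative"},
    label2id={"Non-Informative": 0, "Informative": 1},
)

# Placeholder hyperparameters; train_dataset / eval_dataset would hold tokenized
# app review sentences with integer labels (the datasets are not part of this repo).
training_args = TrainingArguments(output_dir="./out", num_train_epochs=3, per_device_train_batch_size=16)
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```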
The following datasets are used for fine-tuning and evaluation:
- P2-Golden dataset: Contains 1,242 app review sentences from [^1].
- Manually labeled Komoot dataset: Consists of 2,199 app review sentences for the app "Komoot".
Five different training methods are used:
- Training and testing using only the P2-Golden dataset.
- Training and testing using only the Komoot dataset.
- Training and testing by combining both the P2-Golden and Komoot datasets.
- Training with the P2-Golden dataset and testing with the Komoot dataset.
- Training with the Komoot dataset and testing with the P2-Golden dataset.
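Expressed as a hedged sketch (the run names and the `(sentence, label)` representation are placeholders; for the first three runs, the actual train/test split comes from the cross-validation described below):

```python
# Placeholder datasets: the labeled sentences themselves are not included in this repo.
p2_golden: list[tuple[str, int]] = []  # (sentence, label) pairs from the P2-Golden dataset
komoot: list[tuple[str, int]] = []     # (sentence, label) pairs from the Komoot dataset

# The five dataset combinations used for fine-tuning and evaluation.
runs = {
    "p2_golden_only": {"train": p2_golden, "test": p2_golden},
    "komoot_only":    {"train": komoot, "test": komoot},
    "combined":       {"train": p2_golden + komoot, "test": p2_golden + komoot},
    "p2_to_komoot":   {"train": p2_golden, "test": komoot},
    "komoot_to_p2":   {"train": komoot, "test": p2_golden},
}
```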
The trained models were evaluated using 5-fold cross-validation: precision, recall, and F1-score were calculated for each dataset combination, and a confusion matrix was computed for each model.
The evaluation results can be found in the evaluation package.
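As an illustration of this evaluation scheme (not the actual evaluation code), the sketch below computes the same metrics with scikit-learn; `train_and_predict` is a hypothetical callback standing in for fine-tuning and inference, and labels are assumed to be 0 (Non-Informative) / 1 (Informative):

```python
# Hedged sketch of 5-fold cross-validation with precision, recall, F1-score and
# a confusion matrix per fold; train_and_predict is a hypothetical stand-in.
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold

def cross_validate(sentences, labels, train_and_predict, n_splits=5):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    fold_results = []
    for train_idx, test_idx in skf.split(sentences, labels):
        y_true = [labels[i] for i in test_idx]
        y_pred = train_and_predict(
            [sentences[i] for i in train_idx],  # training sentences
            [labels[i] for i in train_idx],     # training labels
            [sentences[i] for i in test_idx],   # sentences to classify
        )
        precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
        fold_results.append({
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "confusion_matrix": confusion_matrix(y_true, y_pred),
        })
    return fold_results
```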
The microservice provides the following functionalities:
- Service status: Get the current status of the microservice.
- Annotation creation: Automatically generate annotations and create a new dataset containing only informative app review sentences.
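For illustration, once the service is running (it listens on port 9698, see the Docker command below), it can be queried over HTTP; the route name used here is an assumption for the sketch, not a documented path:

```python
# Illustrative only: "/status" is an assumed route name, not a documented endpoint.
import requests

BASE_URL = "http://localhost:9698"  # port taken from the Docker run command in this README

response = requests.get(f"{BASE_URL}/status")  # query the service status
print(response.status_code, response.text)
```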
Technologies used:
- Hugging Face Transformers
- Docker
- Flask
- MLflow
- Python 3.11+
This microservice is part of a larger, interconnected system and was developed for a real-world application in a multi-service architecture.
Due to data protection and infrastructure constraints:
- Datasets are not included
- Trained models in MLflow are not publicly accessible
- Other required microservices, such as services for data crawling, annotation initialization, and dataset persistence, are not included in this repository
You can still review:
- The architecture and design of this microservice
- The training pipeline and annotation/dataset creation pipeline
- The evaluation results
- The Docker setup and local configuration logic
To run the microservice, the following are required:
- A running MLflow server
- An existing model in MLflow
- Experiment parameters adjusted to your model choice in the notebook "ComponentRelevanceClassifierServiceSetup"
- Python 3.11+
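A quick, hedged way to check the first two requirements from Python (assuming the trained models are registered in the MLflow model registry and the `MLFLOW_TRACKING_*` environment variables from the local setup below are set):

```python
# Hedged sanity check: is the MLflow tracking server reachable, and which models
# are registered? Uses MLFLOW_TRACKING_URI / credentials from the environment.
from mlflow.tracking import MlflowClient

client = MlflowClient()
for registered_model in client.search_registered_models():
    print(registered_model.name)
```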
Build and run the microservice with Docker:

```bash
docker build -t <CONTAINER_NAME> -f "./Dockerfile" \
  --build-arg mlflow_tracking_username=XXXXXX \
  --build-arg mlflow_tracking_password=XXXXXX \
  --build-arg mlflow_tracking_uri=XXXXXX .

docker run -p 9698:9698 --name <CONTAINER_NAME> <CONTAINER_NAME>
```

→ Replace XXXXXX with your MLflow credentials. The image built above is tagged `<CONTAINER_NAME>`, which is why it also appears as the last argument of `docker run`.
To set up and run the service locally, first clone the repository:

```bash
git clone https://github.com/dkeyGit/relevance-classifier.git
cd relevance-classifier
```
1. Install Miniconda (if not already installed): see the Miniconda Installation Guide.
2. Create and activate the environment:
   ```bash
   conda create -n rc python=3.11
   conda activate rc
   ```
3. Configure local MLflow environment variables:
   ```bash
   cd $CONDA_PREFIX
   mkdir -p ./etc/conda/activate.d
   mkdir -p ./etc/conda/deactivate.d
   echo \#\!/bin/bash >> ./etc/conda/activate.d/env_vars.sh
   echo "export MLFLOW_TRACKING_USERNAME=" >> ./etc/conda/activate.d/env_vars.sh
   echo "export MLFLOW_TRACKING_PASSWORD=" >> ./etc/conda/activate.d/env_vars.sh
   echo "export MLFLOW_TRACKING_URI='http://127.0.0.1:5000'" >> ./etc/conda/activate.d/env_vars.sh
   echo \#\!/bin/bash >> ./etc/conda/deactivate.d/env_vars.sh
   echo "unset MLFLOW_TRACKING_USERNAME" >> ./etc/conda/deactivate.d/env_vars.sh
   echo "unset MLFLOW_TRACKING_PASSWORD" >> ./etc/conda/deactivate.d/env_vars.sh
   echo "unset MLFLOW_TRACKING_URI" >> ./etc/conda/deactivate.d/env_vars.sh
   conda deactivate
   conda activate rc
   ```
4. Start the MLflow server:
   ```bash
   cd <THE_BASE_DIRECTORY_FOR_MLFLOW>
   mlflow server
   ```
5. Install the service:
   ```bash
   cd <THE_BASE_DIRECTORY_FOR_THE_relevance-classifier_SERVICE>
   pip install -e .
   ```
If there is no existing relevance-classifier model on MLflow, train the models:
Start the notebook "ComponentRelevanceClassifierServiceTraining"
To load an existing model from MLflow that you want to use for the relevance classification:
Start the notebook "ComponentRelevanceClassifierServiceSetup"
→ In this notebook, you have to adjust the experiment parameters depending on your model choice.
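As a hedged illustration of what this step amounts to (the notebook is the authoritative source), loading a model from MLflow could look like the sketch below; the model name, version, and pyfunc flavor are assumptions that depend on how your runs were logged.

```python
# Hedged sketch: load a relevance-classifier model from MLflow and classify a sentence.
# <MODEL_NAME> and <VERSION> are placeholders; whether the model was logged with the
# pyfunc flavor depends on your own MLflow experiments.
import mlflow

model = mlflow.pyfunc.load_model("models:/<MODEL_NAME>/<VERSION>")
print(model.predict(["Great app, but the route planner crashes on start."]))
```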
Finally, start the service:

```bash
./start.sh
```
[^1]: van Vliet, M., Groen, E., Dalpiaz, F., Brinkkemper, S.: Crowd-annotation results: Identifying and classifying user requirements in online feedback (2020). Zenodo. https://doi.org/10.5281/zenodo.3754721