relevance-classifier

Description

This microservice is part of the Feed.UVL Project and is responsible for the automatic classification of online feedback sentences based on their relevance for requirements engineering.

Each sentence is classified as either "Informative" or "Non-Informative" using a fine-tuned BERT model. The model builds on the pre-trained bert-base-uncased checkpoint and was fine-tuned specifically for this task with Hugging Face Transformers.

Five fine-tuning runs are executed, each on a different dataset or dataset combination. The resulting models and their evaluation results are stored in MLflow.
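
A minimal sketch of how one such fine-tuning run could look with Hugging Face Transformers and MLflow, assuming pandas DataFrames train_df and eval_df with "text" and "label" columns; the hyperparameters and experiment name are illustrative assumptions, not the project's actual configuration:

# Sketch only: fine-tune bert-base-uncased for binary relevance classification
# and log the run to MLflow; data, hyperparameters, and names are assumptions.
import mlflow
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = Non-Informative, 1 = Informative

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
eval_ds = Dataset.from_pandas(eval_df).map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)

mlflow.set_experiment("relevance-classifier")  # assumed experiment name
with mlflow.start_run():
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=eval_ds)
    trainer.train()
    mlflow.log_metrics(trainer.evaluate())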

Datasets

The following datasets are used for fine-tuning and evaluation:

  • P2-Golden dataset: Contains 1,242 app review sentences from van Vliet et al. [1].
  • Manually labeled Komoot dataset: Consists of 2,199 app review sentences for the app "Komoot".

Training Methods

Five different training methods are used (see the sketch after this list):

  1. Training and testing using only the P2-Golden dataset.
  2. Training and testing using only the Komoot dataset.
  3. Training and testing by combining both the P2-Golden and Komoot datasets.
  4. Training with the P2-Golden dataset and testing with the Komoot dataset.
  5. Training with the Komoot dataset and testing with the P2-Golden dataset.
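
For illustration only, the five train/test configurations could be expressed as data; the dataset identifiers below are assumptions, not names used in this repository:

# Sketch only: the five train/test configurations expressed as data.
RUNS = {
    "p2_only":      {"train": ["p2_golden"],           "test": ["p2_golden"]},
    "komoot_only":  {"train": ["komoot"],              "test": ["komoot"]},
    "combined":     {"train": ["p2_golden", "komoot"], "test": ["p2_golden", "komoot"]},
    "p2_to_komoot": {"train": ["p2_golden"],           "test": ["komoot"]},
    "komoot_to_p2": {"train": ["komoot"],              "test": ["p2_golden"]},
}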

Evaluation

The trained models were evaluated using 5-fold cross-validation. Metrics such as precision, recall, and F1-score were calculated for each dataset combination. Additionally, a confusion matrix for each model was calculated. The evaluation results can be found in the evaluation package.
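
As an illustration, the reported metrics could be computed per fold with scikit-learn; the label vectors below are placeholders, not results from this repository:

# Sketch only: per-fold metrics as they could be computed with scikit-learn.
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]  # placeholder gold labels (1 = Informative)
y_pred = [1, 0, 0, 1, 0]  # placeholder model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(confusion_matrix(y_true, y_pred))  # rows = true labels, columns = predictions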

API Methods

The microservice provides the following functionalities (an example request follows the list):

  • Service status: Get the current status of the microservice.
  • Annotation creation: Automatically generate annotations and create a new dataset containing only informative app review sentences.
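
A hypothetical usage sketch: the endpoint paths and payload field below are assumptions for illustration, so check the Flask routes in this repository for the actual API. Port 9698 is taken from the Docker command further down in this README.

# Hypothetical example only: endpoint paths and payload fields are assumptions.
import requests

BASE_URL = "http://localhost:9698"

# Service status (assumed path)
print(requests.get(f"{BASE_URL}/status").json())

# Annotation creation (assumed path and payload)
payload = {"dataset_name": "komoot_reviews"}  # hypothetical field
print(requests.post(f"{BASE_URL}/annotation", json=payload).json())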

Technologies Used

  • Hugging Face Transformers
  • Docker
  • Flask
  • MLflow
  • Python 3.11+

Reproducibility Notice

This microservice is part of a larger, interconnected system and was developed for a real-world application in a multi-service architecture.

Due to data protection and infrastructure constraints:

  • Datasets are not included
  • Trained models in MLflow are not publicly accessible
  • Other required microservices, such as services for data crawling, annotation initialization, and dataset persistence, are not included in this repository

You can still review:

  • The architecture and design of this microservice
  • The training pipeline and annotation/dataset creation pipeline
  • The evaluation results
  • The Docker setup and local configuration logic

Requirements

Runtime Dependencies

  • Running MLflow server
  • Existing model in MLflow
  • Experiment parameters in the notebook "ComponentRelevanceClassifierServiceSetup" adjusted to match your chosen model
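
One way to verify the first two dependencies, as a sketch; the tracking URI is the local default used later in this README:

# Sketch only: check that the MLflow server is reachable and list what it holds.
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://127.0.0.1:5000")  # local default, see below
client = MlflowClient()
print([e.name for e in client.search_experiments()])        # available experiments
print([m.name for m in client.search_registered_models()])  # registered models, if any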

Software

  • Python 3.11+

Getting Started – as a Containerized Microservice

docker build -t <IMAGE_NAME> -f "./Dockerfile" \
  --build-arg mlflow_tracking_username=XXXXXX \
  --build-arg mlflow_tracking_password=XXXXXX \
  --build-arg mlflow_tracking_uri=XXXXXX .

docker run -p 9698:9698 --name <CONTAINER_NAME> <IMAGE_NAME>

→ Replace XXXXXX with your MLflow credentials and tracking URI

Getting Started for Local Testing

1. Clone the repository

git clone https://github.com/dkeyGit/relevance-classifier.git
cd relevance-classifier

2. (Optional) Create a Virtual Environment

Option A: Using venv

...

Option B: Using conda

1. Install Miniconda (if not already installed):
Miniconda Installation Guide

2. Create and activate the environment:

conda create -n rc python=3.11
conda activate rc

3. Configure local MLflow environment variables:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
echo '#!/bin/bash' >> ./etc/conda/activate.d/env_vars.sh
echo "export MLFLOW_TRACKING_USERNAME=" >> ./etc/conda/activate.d/env_vars.sh
echo "export MLFLOW_TRACKING_PASSWORD=" >> ./etc/conda/activate.d/env_vars.sh
echo "export MLFLOW_TRACKING_URI='http://127.0.0.1:5000'" >> ./etc/conda/activate.d/env_vars.sh

echo '#!/bin/bash' >> ./etc/conda/deactivate.d/env_vars.sh
echo "unset MLFLOW_TRACKING_USERNAME" >> ./etc/conda/deactivate.d/env_vars.sh
echo "unset MLFLOW_TRACKING_PASSWORD" >> ./etc/conda/deactivate.d/env_vars.sh
echo "unset MLFLOW_TRACKING_URI" >> ./etc/conda/deactivate.d/env_vars.sh

conda deactivate
conda activate rc

3. Start the MLflow Server (locally)

cd <THE_BASE_DIRECTORY_FOR_MLFLOW>
mlflow server

4. Set Up the Development Environment

cd <THE_BASE_DIRECTORY_FOR_THE_relevance-classifier_SERVICE>
pip install -e .

5. Train or Load the relevance-classifier Model

If there is no existing relevance-classifier model in MLflow, train the models:
Run the notebook "ComponentRelevanceClassifierServiceTraining".

To load an existing model from MLflow that you want to use for the relevance classification:
Run the notebook "ComponentRelevanceClassifierServiceSetup". In this notebook, adjust the experiment parameters to match your chosen model (see the sketch below).
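
As a sketch, loading a logged model from a specific MLflow run could look as follows; the run ID and artifact path are placeholders for the values of your chosen run, and the expected input format of the predict call depends on how the model was logged:

# Sketch only: load a fine-tuned model from a specific MLflow run.
import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")  # local MLflow server
model = mlflow.pyfunc.load_model("runs:/<RUN_ID>/<ARTIFACT_PATH>")
# Illustrative call; the input format depends on how the model was logged.
print(model.predict(["The app crashes when I start a tour."]))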

6. Start the relevance-classifier service

./start.sh

References

  1. van Vliet, M., Groen, E., Dalpiaz, F., Brinkkemper, S.: Crowd-annotation results: Identifying and classifying user requirements in online feedback (2020). Zenodo. https://doi.org/10.5281/zenodo.3754721
