An Algorithmic Approach to Solve the Grouping Problem

Overview

This repository presents algorithmic solutions to the grouping problem, a combinatorial optimization challenge. The objective is to partition a set of objects into optimal groups while adhering to predefined constraints, such as fixed group sizes and must-link/cannot-link conditions. These constraints enforce mandatory co-membership or exclusion between specific individuals. This work was developed as part of a bachelor's thesis.

Algorithms

In this repository, three algorithms are implemented that demonstrates different approaches to tackling the grouping problem:

Ant Colony Optimization: A probabilistic optimization algorithm.
Custom PCK-Mean: A customized PCKMeans clustering algorithm
Occurrence Ranking: Algorithm defined from scratch

Features

Reproducible Evaluations: Scripts and data are provided to replicate the evaluations conducted in the thesis
Interactive Prototype: A minimal prototype using Streamlit allows users to form groups with their own datasets.

Getting Started

Clone Repository:

git clone https://gitlab.inf.unibe.ch/SEG/students/theses/viola-camille-andrin-mueller/agat.git

Install Dependencies:

pip install -r requirements.txt

Build Cython Extensions: Run the following command to build the Cython extension for occ_optimization:

cd algorithms/randomness 

python setup.py build_ext --inplace

Run the following command to build the Cython extension for optimized_distances:

cd algorithms_comparison

python setup.py build_ext --inplace

Run Streamlit: Run the following command to run the Streamlit App:

python main.py home

Run Evaluation:

Run the following command to run the evaluation:

python main.py evaluation

Evaluation Settings

The evaluations can be configured in the evaluation.py file, by addapting following inputs:

goup_sizes = {2,3,4} #line 10
selected_df = 0 #line 13
single_run = True #line 16

group_sizes: List of group sizes considered in the evaluation.
single_run: if True each algorithm is only run once, else 10 iterations for each configurations is run
selected_df: selects only df_ev_list[selected_df] from df_ev_list if single_run is set to True

CSV File Format

The CSV file for Streamlit must have a first column named "Name". After that, features can be individually selected from the table below:

FeatureID	Feature Type	Feature Class	# Answers	Example Input
hom	discrete numerical, interval	homogeneous	single	1
het	discrete numerical, interval	heterogeneous	single	1
hot_hom	discrete categorical, nominal	homogeneous	single	[1,0,0]
hot_het	discrete categorical, nominal	heterogeneous	single	[1,0,0]
multi_hot_hom	discrete categorical, nominal	homogeneous	multiple	[1,1,0]
multi_hot_hom	discrete categorical, nominal	heterogeneous	multiple	[1,1,0]

Important:

Column names must match the FeatureID
If a feature appears multiple times, it must be named in the format FeatureID_num (e.g., hom_1, hom_2)
The input for each feature must match the format of the example for that specific feature
All list inputs must be in integer format (e.g. 1 instead of 1.0)

The directory example_csv_files contains two example files that can be uploaded via streamlit.

Authors

Viola Meier
Andrin Müller

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
Data		Data
algorithms		algorithms
algorithms_comparison		algorithms_comparison
distance_matrix		distance_matrix
example_csv_files		example_csv_files
gui		gui
AGAT_BScThesis.pdf		AGAT_BScThesis.pdf
README.md		README.md
config.json		config.json
evaluation.py		evaluation.py
home.py		home.py
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

An Algorithmic Approach to Solve the Grouping Problem

Overview

Algorithms

Features

Getting Started

Evaluation Settings

CSV File Format

Authors

About

Uh oh!

Releases

Packages

Uh oh!

Languages

andrinM/agat

Folders and files

Latest commit

History

Repository files navigation

An Algorithmic Approach to Solve the Grouping Problem

Overview

Algorithms

Features

Getting Started

Evaluation Settings

CSV File Format

Authors

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages