
mlrsrch/hierarchy_ensembles


[Figure: HiGEC flowchart]

HiGEC
Hierarchy Generation and Extended Classification Framework


HiGEC is a Python framework for enhancing multi-class classification through automated hierarchy generation (HG) and flexible hierarchy exploitation (HE) strategies. It supports hybrid approaches that integrate hierarchical and flat classifier outputs.


🔧 Installation
git clone https://github.com/alagoz/higec.git
cd higec
pip install -r requirements.txt

Dependencies:
numpy scipy matplotlib scikit-learn scikit-learn-extra proglearn xgboost lightgbm
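
A quick way to check that the dependencies listed above are importable is sketched below. It assumes the usual import names (for example `sklearn` for scikit-learn and `sklearn_extra` for scikit-learn-extra). The helper `missing_dependencies` is hypothetical and is not part of HiGEC.

```python
import importlib.util

# Assumed mapping from the pip package names above to their import names.
IMPORT_NAMES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "scikit-learn-extra": "sklearn_extra",
    "proglearn": "proglearn",
    "xgboost": "xgboost",
    "lightgbm": "lightgbm",
}


def missing_dependencies(names=IMPORT_NAMES):
    """Return the pip names whose top-level module cannot be found."""
    return [pkg for pkg, mod in names.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```

Any package reported as missing can be installed again with `pip install -r requirements.txt`.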


⚡ Key Features

Automatic hierarchy generation from flat class labels

🧩 Hybrid HE+F strategies that combine hierarchy exploitation (HE) with a flat classifier (F)

🖇️ Support for any scikit-learn compatible classifier

📊 Benchmark-ready with OpenML integration

🌳 Visualization tools for hierarchy inspection
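
The README does not spell out how hierarchical and flat outputs are combined, so the sketch below is only an illustration under stated assumptions. It builds a two-level model in the style of LCPN (in the hierarchical-classification literature, "local classifier per parent node"), computes leaf probabilities as P(group) × P(class | group), and blends them with a flat classifier's probabilities using a fixed weight `alpha`. The class `TwoLevelLCPNHybrid`, the `alpha` blending rule, and the ETC/RF choice are assumptions, not HiGEC's implementation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier


class TwoLevelLCPNHybrid:
    """Hypothetical two-level LCPN model blended with a flat classifier.

    groups: partition of the class labels, e.g. [[0], [1, 2, 3, 4]].
    alpha:  weight on the hierarchical probabilities (assumed blending rule).
    """

    def __init__(self, groups, alpha=0.5, random_state=0):
        self.groups = [list(g) for g in groups]
        self.alpha = alpha
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        group_of = {c: i for i, g in enumerate(self.groups) for c in g}
        # Root parent node: a local classifier that predicts the group.
        self.root_ = ExtraTreesClassifier(random_state=self.random_state)
        self.root_.fit(X, np.array([group_of[c] for c in y]))
        # One local classifier per child node that still contains several classes.
        self.children_ = {}
        for i, g in enumerate(self.groups):
            if len(g) > 1:
                mask = np.isin(y, g)
                clf = ExtraTreesClassifier(random_state=self.random_state)
                self.children_[i] = clf.fit(X[mask], y[mask])
        # Flat classifier over all classes (the "+F" part of the scheme).
        self.flat_ = RandomForestClassifier(random_state=self.random_state).fit(X, y)
        return self

    def predict_proba(self, X):
        X = np.asarray(X)
        col = {c: j for j, c in enumerate(self.classes_)}
        p_group = self.root_.predict_proba(X)
        p_hier = np.zeros((len(X), len(self.classes_)))
        for pos, gi in enumerate(self.root_.classes_):
            g = self.groups[int(gi)]
            if len(g) == 1:
                p_hier[:, col[g[0]]] = p_group[:, pos]
            else:
                child = self.children_[int(gi)]
                # P(class) = P(group) * P(class | group)
                p_child = child.predict_proba(X)
                for k, c in enumerate(child.classes_):
                    p_hier[:, col[c]] = p_group[:, pos] * p_child[:, k]
        # flat_.classes_ equals self.classes_, so the columns line up.
        p_flat = self.flat_.predict_proba(X)
        return self.alpha * p_hier + (1.0 - self.alpha) * p_flat

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Because both the hierarchical and flat rows sum to one, any `alpha` in [0, 1] yields a valid probability distribution; `alpha=1.0` reduces to pure HE and `alpha=0.0` to pure flat classification.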


🚀 Quick Start

Run the example:

python run_higec_example.py

Pipeline:

  1. Downloads an OpenML dataset

  2. Trains a flat classifier baseline

  3. Generates a class hierarchy

  4. Evaluates the hierarchical approach and compares it with the flat baseline
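
Step 2 (the flat baseline) can be sketched with scikit-learn alone. The sketch substitutes a local synthetic dataset for the OpenML download and uses macro-averaged F1, which is an assumption because the README reports "f1" without naming the averaging. `flat_baseline` is a hypothetical helper, not a function from `run_higec_example.py`.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def flat_baseline(X, y, random_state=0):
    """Train a flat random forest; return (macro F1 on a held-out split, seconds)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    start = time.perf_counter()
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    elapsed = time.perf_counter() - start
    return f1_score(y_te, y_pred, average="macro"), elapsed


if __name__ == "__main__":
    # Synthetic 5-class stand-in for an OpenML dataset.
    X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                               n_classes=5, random_state=0)
    f1, secs = flat_baseline(X, y)
    print(f"- Flat Classification (RF) (f1): {f1:.4f} in {secs:.4f} seconds")
```

The printed line mimics the format of the example output further down; the numbers will differ because the data are synthetic.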


🛠 Core Components
File       Purpose
HG.py      Hierarchy generation
HE.py      Hierarchy exploitation
hdc.py     Divisive clustering
utils.py   Data handling & visualization
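
To make the HG step concrete, here is one way to derive a class hierarchy from flat labels. It estimates a class confusion matrix from cross-validated predictions, turns it into a dissimilarity between classes, and clusters the classes with complete-linkage agglomerative clustering. Reading the `CCM[HAC|COMPLETE]` token as "class confusion matrix + hierarchical agglomerative clustering with complete linkage" is an assumption, and `class_hierarchy` and `leaves` are hypothetical helpers, not the `HG.py` API.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict


def class_hierarchy(X, y, cv=3, random_state=0):
    """Build a binary class tree (nested tuples) from class confusions."""
    classes = np.unique(y)
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    # Row-normalised confusion matrix: entry (i, j) estimates P(pred = j | true = i).
    cm = confusion_matrix(y, y_pred, labels=classes).astype(float)
    cm /= cm.sum(axis=1, keepdims=True)
    # Symmetrise, then treat frequently confused classes as close together.
    dist = 1.0 - (cm + cm.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="complete")

    def build(node):
        if node.is_leaf():
            return classes[node.id].item()
        return (build(node.get_left()), build(node.get_right()))

    return build(to_tree(Z))


def leaves(tree):
    """Flatten a nested-tuple tree into its list of class labels."""
    if isinstance(tree, tuple):
        return [c for child in tree for c in leaves(child)]
    return [tree]
```

Classes that the flat model often mixes up end up in the same subtree, which is the intuition behind letting a dedicated local classifier separate them.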

🧪 Customization

Adjust parameters in 'run_higec_example.py':

DID = 46264                       # OpenML dataset ID
HiGEC = 'CCM[HAC|COMPLETE]-LCPN[ETC]+F[XGB]'  # HG + HE scheme
CLF_NAME_FC = 'RF'                # Flat classifier
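
The `HiGEC` string packs the HG method, the HE strategy, and the optional flat classifier into one token. The sketch below splits it apart. Its grammar is inferred from this single example, and `parse_scheme` is hypothetical, not the parser used by `run_higec_example.py`.

```python
import re

# Hypothetical grammar inferred from 'CCM[HAC|COMPLETE]-LCPN[ETC]+F[XGB]':
#   <HG>[<args separated by |>]-<HE>[<clf>] optionally followed by +F[<flat clf>]
_SCHEME = re.compile(
    r"^(?P<hg>\w+)\[(?P<hg_args>[^\]]*)\]"
    r"-(?P<he>\w+)\[(?P<he_clf>\w+)\]"
    r"(?:\+F\[(?P<flat_clf>\w+)\])?$"
)


def parse_scheme(scheme):
    """Split a HiGEC scheme string into its HG, HE and flat-classifier parts."""
    m = _SCHEME.match(scheme)
    if m is None:
        raise ValueError(f"Unrecognised HiGEC scheme: {scheme!r}")
    return {
        "hg": m["hg"],
        "hg_args": m["hg_args"].split("|") if m["hg_args"] else [],
        "he": m["he"],
        "he_clf": m["he_clf"],
        "flat_clf": m["flat_clf"],  # None when there is no +F[...] part
    }
```

For the default setting, `parse_scheme(HiGEC)` separates `CCM` with arguments `HAC` and `COMPLETE`, the HE strategy `LCPN` with `ETC`, and the flat classifier `XGB`.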

Available classifiers: RF, XGB, ETC, LGB.
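
The short names correspond to standard tree ensembles: RF is a random forest, ETC extra trees, XGB XGBoost, and LGB LightGBM. A minimal factory could look like this. It assumes default hyperparameters (HiGEC's actual settings may differ), and `make_classifier` is a hypothetical name.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier


def make_classifier(name, random_state=0):
    """Map a short classifier name to an estimator with default settings (assumed)."""
    name = name.upper()
    if name == "RF":
        return RandomForestClassifier(random_state=random_state)
    if name == "ETC":
        return ExtraTreesClassifier(random_state=random_state)
    if name == "XGB":
        from xgboost import XGBClassifier  # imported lazily: optional dependency
        return XGBClassifier(random_state=random_state)
    if name == "LGB":
        from lightgbm import LGBMClassifier  # imported lazily: optional dependency
        return LGBMClassifier(random_state=random_state)
    raise ValueError(f"Unknown classifier name: {name!r}")
```

Importing XGBoost and LightGBM inside the function keeps RF and ETC usable even when those optional packages are not installed.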


📈 Example Output
Extended Linkage Table:

node_id:0, node_type:parent, subsets:[[0], [1,2,3,4]], branch_ids:[0,7], parent_id:None
node_id:1, node_type:parent, subsets:[[3,4],[1,2]], branch_ids:[5,6], parent_id:0
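
Each row of the extended linkage table lists a node, the class subsets of its children, branch identifiers, and its parent. To post-process the printed table, a small parser such as the hypothetical `parse_linkage_row` below turns one row into a dictionary. It assumes the exact `key:value` layout shown above.

```python
import ast
import re

# Assumes the row layout printed above; subsets may contain nested brackets.
_ROW = re.compile(
    r"node_id:(?P<node_id>\d+),\s*node_type:(?P<node_type>\w+),\s*"
    r"subsets:(?P<subsets>\[.*\]),\s*branch_ids:(?P<branch_ids>\[[^\]]*\]),\s*"
    r"parent_id:(?P<parent_id>\w+)"
)


def parse_linkage_row(line):
    """Parse one extended-linkage-table row into a dictionary."""
    m = _ROW.search(line)
    if m is None:
        raise ValueError(f"Unrecognised row: {line!r}")
    return {
        "node_id": int(m["node_id"]),
        "node_type": m["node_type"],
        "subsets": ast.literal_eval(m["subsets"]),
        "branch_ids": ast.literal_eval(m["branch_ids"]),
        "parent_id": ast.literal_eval(m["parent_id"]),  # None for the root, else an int
    }
```

Parsed this way, the two rows show that node 1 (with `parent_id` 0) splits the classes {1, 2, 3, 4} from node 0's second subset into {3, 4} and {1, 2}.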

Performance Comparison:

- Flat Classification (RF) (f1): 0.3517 in 0.4309 seconds
- HiGEC: CCM[HAC|COMPLETE]-LCPN[ETC]+F[XGB] (f1): 0.3700 in 1.1853 seconds
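
In this example run, HiGEC improves F1 by about 0.018 absolute (roughly 5% relative) at about 2.75 times the runtime of the flat baseline. The arithmetic uses only the two printed lines:

```python
# Figures copied from the example output above.
flat_f1, flat_secs = 0.3517, 0.4309
higec_f1, higec_secs = 0.3700, 1.1853

abs_gain = higec_f1 - flat_f1      # absolute F1 improvement
rel_gain = abs_gain / flat_f1      # improvement relative to the flat baseline
slowdown = higec_secs / flat_secs  # runtime ratio, HiGEC vs flat

print(f"F1 gain: {abs_gain:+.4f} ({rel_gain:+.1%}), runtime x{slowdown:.2f}")
# → F1 gain: +0.0183 (+5.2%), runtime x2.75
```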

Generated Hierarchy:
[Figure: generated class hierarchy]


📊 Benchmark Results

HiGEC was evaluated on 100 multi-class tabular datasets, showing consistent F1-score gains over flat classification (FC), particularly with hybrid HE+F configurations.


Mean F1 Comparison (HiGEC vs FC)


Mean F1 Scores & Standard Deviations



Download raw results (F1 scores per dataset):

  • f1_scores_fc_vs_higec.csv – Contains per-dataset F1 scores of FC and nine selected HiGEC algorithms.
  • Columns: index, short, RF, XGB, ETC, LGB, LCN[XGB]+, LCPN[ETC]+F[XGB], LCPN[RF]+F[XGB], LCPN[XGB]+F[RF], LCL[XGB]+F[RF], LCPN[RF]+F[RF], LCL[RF]+F[XGB], LCPN[LGB]+F[XGB], LCPN[XGB]+F[XGB]
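
As a sketch of the kind of further analysis this file supports, the function below computes the mean F1 of every score column and counts the datasets on which one HiGEC configuration beats one FC baseline. It assumes that the header names listed above appear verbatim in the file's first row and that every score cell is numeric; `summarise_f1` is hypothetical.

```python
import csv
from statistics import mean


def summarise_f1(source, higec_col="LCPN[ETC]+F[XGB]", fc_col="RF"):
    """Column means plus win/tie/loss counts of one HiGEC column against one FC column.

    source: a file path, or any iterable of CSV lines accepted by csv.DictReader.
    """
    if isinstance(source, str):
        with open(source, newline="") as fh:
            rows = list(csv.DictReader(fh))
    else:
        rows = list(csv.DictReader(source))
    # Every column except the two identifier columns is treated as an F1 score.
    score_cols = [c for c in rows[0] if c not in ("index", "short")]
    means = {c: mean(float(r[c]) for r in rows) for c in score_cols}
    wins = sum(float(r[higec_col]) > float(r[fc_col]) for r in rows)
    ties = sum(float(r[higec_col]) == float(r[fc_col]) for r in rows)
    return {"means": means, "wins": wins, "ties": ties,
            "losses": len(rows) - wins - ties}
```

For example, `summarise_f1("f1_scores_fc_vs_higec.csv", fc_col="XGB")` compares `LCPN[ETC]+F[XGB]` with the XGB baseline; a paired test such as the Wilcoxon signed-rank test would be a natural next step.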

Download mean performance metrics for all FC algorithms:

  • fc_mean_performance.csv – Contains mean scores across datasets for each FC algorithm.
  • Columns: index, short, mean_f1_xgb, mean_f1_catb, ... , mean_acc_xgb, mean_acc_catb, ... , mean_auc_xgb, mean_auc_catb, ... , total_dur_xgb, total_dur_catb, ...
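
Because each column name encodes both a statistic and an algorithm (`mean_f1_xgb`, `total_dur_catb`, and so on), it can help to regroup a row by algorithm. This minimal sketch assumes every metric column follows the `<statistic>_<algorithm>` pattern with one of the four statistic prefixes listed above; `by_algorithm` is hypothetical.

```python
import re

# Assumed column pattern: <statistic>_<algorithm>, e.g. mean_f1_xgb or total_dur_catb.
_COLUMN = re.compile(r"^(?P<stat>mean_f1|mean_acc|mean_auc|total_dur)_(?P<alg>\w+)$")


def by_algorithm(row):
    """Regroup one CSV row's metric columns as {algorithm: {statistic: value}}."""
    table = {}
    for column, value in row.items():
        m = _COLUMN.match(column)
        if m:  # skips non-metric columns such as 'index' and 'short'
            table.setdefault(m["alg"], {})[m["stat"]] = float(value)
    return table
```

Each row yielded by `csv.DictReader` over `fc_mean_performance.csv` can be passed straight to `by_algorithm`.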

These CSV files make it possible to reproduce the reported comparisons and to run further statistical analyses of HiGEC’s performance against FC.


📖 References

For more details on methodology, datasets, and evaluations, see the HiGEC GitHub repository.