ConSCompF: Consistency-focused Similarity Comparison Framework

Python implementation of ConSCompF, an LLM similarity comparison framework that accounts for instruction consistency, as proposed in the original paper.

Features

Generates LLM similarity matrices and compresses them using PCA.
Can be used in few-shot scenarios.
Supports multiple input formats including lists, HF datasets, and pandas DataFrames.
Supports different return types including lists, PyTorch tensors, and pandas DataFrames.
Supports embedding caching.
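
To make the first feature more concrete, here is a rough sketch of what a model similarity matrix looks like. It is not ConSCompF's implementation: the embeddings are made-up toy vectors, the function names `cosine` and `similarity_matrix` are hypothetical, and the sketch leaves out the consistency handling and PCA compression that the framework adds on top.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(embeddings: dict[str, list[float]]):
    """Return model names and a square matrix of pairwise cosine similarities."""
    names = list(embeddings)
    matrix = [[cosine(embeddings[r], embeddings[c]) for c in names] for r in names]
    return names, matrix

# Toy stand-ins for per-model text embeddings (assumed values, not real data).
toy_embeddings = {
    "model1": [1.0, 0.0],
    "model2": [0.0, 1.0],
    "model3": [1.0, 1.0],
}

names, matrix = similarity_matrix(toy_embeddings)
for name, row in zip(names, matrix):
    print(name, [round(value, 3) for value in row])
```

The matrix is symmetric, with each model's similarity to itself on the diagonal. In the real package the embeddings would come from an embedding model rather than being written by hand.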

Installation

The package is available on PyPI:

pip install conscompf

Usage

from conscompf import ConSCompF

conscompf = ConSCompF(quiet=True)

data: list[dict[str, list[str]]] = [
    {
        "model1": [
            "Text 1...",
            "Text 2...",
        ], 
        "model2": [
            "Text 1...",
            "Text 2...",
        ], 
    }, {
        "model1": [...],
        "model2": [...]
    }, ...
] # Or use HF dataset with a similar structure

out = conscompf(data, return_type="df") # Available return types: pt, df, list

print(out["sim_matrix"])
print(out["pca"])
print(out["consistency"])

The same minimal example, but with real data, can be found in examples/simple.py.

More examples are available in the examples directory.
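
If your generated texts are stored per model rather than per list item, a small helper can reshape them into the list-of-dicts layout shown above. This is a minimal sketch that does not use the package itself: `build_input` is a hypothetical helper, and it assumes that each item in the list groups the texts that every model produced for the same prompt.

```python
def build_input(responses: dict[str, list[list[str]]]) -> list[dict[str, list[str]]]:
    """Reshape {model: [texts_for_item_0, texts_for_item_1, ...]} into
    [{model: texts_for_item_0, ...}, {model: texts_for_item_1, ...}, ...]."""
    counts = {len(groups) for groups in responses.values()}
    if len(counts) != 1:
        # Also triggers for an empty mapping, which has no usable layout.
        raise ValueError("every model needs the same, non-ambiguous number of text groups")
    n_items = counts.pop()
    return [
        {model: groups[i] for model, groups in responses.items()}
        for i in range(n_items)
    ]

# Placeholder texts, not real model outputs.
responses = {
    "model1": [["A1", "A2"], ["B1", "B2"]],
    "model2": [["a1", "a2"], ["b1", "b2"]],
}

data = build_input(responses)
print(data[0])
```

The resulting `data` has the same shape as the `data` variable in the usage example, so it could be passed to the `conscompf(...)` call in its place.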

For a full list of available functions and arguments, see the built-in documentation:

pydoc conscompf.ConSCompF

Build

You can build and install this package manually:

git clone https://github.com/alex-karev/conscompf
cd conscompf
python -m build .
pip install .

Citation

This project is currently developed by Alexey Karev and Dong Xu of the School of Computer Engineering and Science, Shanghai University.

If you find our work valuable, please cite:

@article{Karev_Xu_2025,
    title={ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models},
    volume={82},
    ISSN={1076-9757},
    DOI={10.1613/jair.1.17028},
    journal={Journal of Artificial Intelligence Research},
    author={Karev, Alexey and Xu, Dong},
    year={2025},
    month=mar,
    pages={1325--1347}
}

The dataset used in the experiments described in the paper is available here.

Contribution

Feel free to fork this repo and make pull requests.

License

Free to use under Apache 2.0. See LICENSE for more information.
