This project classifies life goals into structured categories using large language models (LLMs) via API calls. It reads life goal data from Excel files, classifies them using a prompt aligned with a predefined codebook, and optionally evaluates the classification accuracy against manual labels.
This branch introduces a more efficient version of the classification logic with key updates:
- Use `main_batched.py` instead of `main.py`
- Combine all non-empty goals per person (row) into a single LLM call, reducing token usage (see the sketch after this list)
- Replace `categories.json` with `system_prompt.txt`, which ensures full alignment with the current codebook
- Keep the same output structure as before: one classification per goal column
- Update `evaluate_accuracy.py` to report classification accuracy by category and by person, and to list all cases where the LLM and manual classifications differ
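The per-row batching idea is implemented in `lifeproject/classifier_batched.py` and `lifeproject/prompt_builder.py`. The sketch below is a simplified, hypothetical illustration of that idea, not the actual implementation: the function name, prompt wording, and response parsing are assumptions.

```python
# Hypothetical sketch of the per-row batching idea; the real code lives in
# lifeproject/classifier_batched.py and may differ in details.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_row(row: dict, goal_columns: list[str], system_prompt: str,
                 model: str = "gpt-4o") -> dict:
    """Classify all non-empty goals of one person (row) with a single LLM call."""
    # Keep only the goal columns that actually contain text.
    goals = {col: str(row[col]).strip() for col in goal_columns
             if row.get(col) and str(row[col]).strip()}
    if not goals:
        return {}

    # One numbered list of goals -> one request instead of one request per goal.
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(goals.values(), start=1))
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Classify each goal into a codebook category:\n{numbered}"},
        ],
    )

    # Assume the model replies with one category per line, in the same order.
    labels = [line.strip() for line in response.choices[0].message.content.strip().splitlines()]
    return dict(zip(goals.keys(), labels))
```

The one-classification-per-goal-column structure is preserved: the returned dict maps each non-empty goal column back to its category, so the output sheet keeps the same shape as before. The actual script may parse a structured (e.g. JSON) response instead of plain lines; the point is that one row produces one request.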
To run the new version:

```bash
uv run python main_batched.py
```

Project structure:
```text
FTOLP_LLM/
├── README.md
├── data/                             # Input data folder
│   └── Final_Data_Pilot_test.xlsx    # Excel file with life goals to classify
├── evaluate/                         # Manual labels and evaluation folder
│   ├── Final_Data_Pilot_test_ea.xlsx # Manually coded life goal categories
│   └── output/                       # Evaluation results folder
├── evaluate_accuracy.py              # Script to compare LLM output with manual labels
├── lifeproject/                      # Core Python package (classification logic & config)
│   ├── __init__.py                   # Package initialization
│   ├── __pycache__/                  # (Generated) Cache directory
│   ├── classifier_batched.py         # Main classification logic using the LLM for main_batched.py
│   ├── prompt_builder.py             # System prompt builder for main_batched.py
│   ├── config.py                     # LLM config management (loads .env)
│   └── llm.py                        # LLM config dataclass and OpenAI interface
├── main_batched.py                   # Main script to classify life goals in an Excel file using batched LLM requests
├── output/                           # Output data folder
├── pyproject.toml                    # Project metadata and configuration
├── requirements.in                   # Editable dependency list
├── system_prompt.txt                 # System prompt template for guiding the LLM
├── uv.lock                           # (Generated) Locked dependency versions
├── .env                              # Environment variables (API key, model name)
└── .gitignore                        # Files and directories excluded from version control
```
Create a virtual environment and install dependencies:

```bash
uv venv
uv pip compile requirements.in --output uv.lock
uv pip sync uv.lock
```

Alternatively:

```bash
uv sync
```

Make sure you have `uv` installed beforehand; install it as follows:
Windows:
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Linux/macOS:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Create a `.env` file in the project root directory based on this template:
```
OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4o
LLM_PROVIDER=openai
```
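For reference, below is a minimal sketch of how these variables could be loaded into a config object. The actual `lifeproject/config.py` and `lifeproject/llm.py` may differ; `LLMConfig` and `load_config` are illustrative names, not the package's real API.

```python
# Hypothetical sketch of .env-based config loading; names are illustrative,
# not the actual lifeproject.config / lifeproject.llm implementation.
import os
from dataclasses import dataclass

from dotenv import load_dotenv


@dataclass
class LLMConfig:
    api_key: str
    model: str = "gpt-4o"
    provider: str = "openai"


def load_config() -> LLMConfig:
    """Read LLM settings from the .env file in the project root."""
    load_dotenv()  # populates os.environ from .env
    return LLMConfig(
        api_key=os.environ["OPENAI_API_KEY"],
        model=os.getenv("LLM_MODEL", "gpt-4o"),
        provider=os.getenv("LLM_PROVIDER", "openai"),
    )
```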
Run the classification:

```bash
uv run python main_batched.py
```

This will:

- Load the input Excel file (`data/Final_Data_Pilot_test.xlsx`)
- Classify goals using the LLM guided by `system_prompt.txt`
- Save results to `output/output_classified_<timestamp>.xlsx`
- Log outputs and errors to `output/`
Run the evaluation:

```bash
uv run python evaluate_accuracy.py
```

This script compares the LLM output with the manually labeled data (`evaluate/Final_Data_Pilot_test_ea.xlsx`), generates a bar plot of accuracies, and outputs mismatch details.
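As a rough illustration of the comparison, the sketch below computes accuracy by category (goal column) and by person (row) and collects mismatches. It assumes the LLM output and the manual file share the same goal columns; the column prefix and file paths are hypothetical, not necessarily what `evaluate_accuracy.py` uses.

```python
# Hypothetical sketch of per-category / per-person accuracy; names are illustrative.
import pandas as pd

llm = pd.read_excel("output/output_classified_example.xlsx")      # example LLM output path
manual = pd.read_excel("evaluate/Final_Data_Pilot_test_ea.xlsx")  # manual labels

goal_cols = [c for c in llm.columns if c.startswith("goal_")]  # assumed column naming
matches = pd.DataFrame({c: llm[c].eq(manual[c]) for c in goal_cols})

accuracy_by_category = matches.mean(axis=0)  # share of agreement per goal column
accuracy_by_person = matches.mean(axis=1)    # share of agreement per row (person)

# List every cell where the LLM and manual label disagree.
mismatches = [
    {"row": i, "column": c, "llm": llm.at[i, c], "manual": manual.at[i, c]}
    for c in goal_cols
    for i in matches.index[~matches[c]]
]
```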
- The old script `main.py` is deprecated and will be removed in future versions.
- `classifier.py` has been replaced by `classifier_batched.py` and `prompt_builder.py`.
- The input file has been updated to `Final_Data_Pilot_test.xlsx`, and the evaluation input file has been updated to `Final_Data_Pilot_test_ea.xlsx`.
- A new repository named LifeProject has been initialized on GitHub to manage core classification modules.
Shiyu Dong
Utrecht University | SaSR & SoDa