This project implements a comprehensive evaluation system for testing various code-specialized Large Language Models (LLMs) on coding tasks. It supports multiple providers including OpenAI GPT-4, Anthropic Claude 3, DeepSeek Coder, Together.ai's Code Llama, and Google's Gemini.
Features:

- Multi-provider support for code generation evaluation
- Automated test case execution
- Code quality analysis using AST parsing
- Performance metrics collection
- Rate limiting and API error handling
- Scoring system based on the following weights (see the sketch after this list):
  - Test case success (50%)
  - Code quality (30%)
  - Performance (20%)
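As a rough illustration of how the AST-based quality check and the 50/30/20 weighting could fit together, here is a minimal sketch. The function names (`ast_quality_score`, `weighted_score`) and the docstring-based quality heuristic are illustrative assumptions, not the repository's actual implementation.

```python
import ast

def ast_quality_score(source: str) -> float:
    """Toy quality heuristic: parseable code with documented functions scores higher.
    (Hypothetical metric, not the project's real quality analysis.)"""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not funcs:
        return 0.5
    documented = sum(1 for f in funcs if ast.get_docstring(f))
    return 0.5 + 0.5 * documented / len(funcs)

def weighted_score(test_pass_rate: float, quality: float, performance: float) -> float:
    """Combine the three metrics using the 50/30/20 weights listed above."""
    return 0.5 * test_pass_rate + 0.3 * quality + 0.2 * performance

SAMPLE = '''
def add(a, b):
    """Add two numbers."""
    return a + b
'''

# Example: all tests pass, fully documented code, average performance
print(weighted_score(test_pass_rate=1.0,
                     quality=ast_quality_score(SAMPLE),
                     performance=0.6))
```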
Supported providers:

- OpenAI (GPT-4)
- Anthropic (Claude 3)
- DeepSeek Coder
- Together.ai (Code Llama)
- Google (Gemini)
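One plausible way to keep these backends interchangeable is a small common interface. The names below (`Provider`, `generate_code`, `EchoProvider`) are illustrative assumptions and not the repository's actual abstraction; the stand-in provider exists only to show the shape of the interface.

```python
from dataclasses import dataclass
from typing import Protocol

class Provider(Protocol):
    """Minimal interface each backend is assumed to implement."""
    name: str

    def generate_code(self, prompt: str) -> str:
        ...

@dataclass
class EchoProvider:
    """Stand-in provider used here only to demonstrate the interface."""
    name: str = "echo"

    def generate_code(self, prompt: str) -> str:
        return f"# TODO: solution for: {prompt}"

def evaluate(provider: Provider, prompt: str) -> str:
    """Send one coding prompt to a provider and return the generated code."""
    return provider.generate_code(prompt)

print(evaluate(EchoProvider(), "Write a function that reverses a string"))
```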
Installation:

- Clone the repository:

        git clone https://github.com/abigaillhiggins/CA_test_repo.git
        cd CA_test_repo

- Create a virtual environment and install dependencies:

        python -m venv venv
        source venv/bin/activate  # On Windows: venv\Scripts\activate
        pip install -r requirements.txt

- Create a `.env` file with your API keys (a loading sketch follows the listing):

        OPENAI_API_KEY=your_key_here
        ANTHROPIC_API_KEY=your_key_here
        DEEPSEEK_API_KEY=your_key_here
        TOGETHER_API_KEY=your_key_here
        GOOGLE_API_KEY=your_key_here
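For reference, the keys can be picked up in Python roughly as follows. This is only a sketch of the kind of loading the evaluation script would do, and it assumes the `python-dotenv` package; it is not the project's actual code.

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # read the .env file created above into the environment

API_KEYS = {
    "openai": os.getenv("OPENAI_API_KEY"),
    "anthropic": os.getenv("ANTHROPIC_API_KEY"),
    "deepseek": os.getenv("DEEPSEEK_API_KEY"),
    "together": os.getenv("TOGETHER_API_KEY"),
    "google": os.getenv("GOOGLE_API_KEY"),
}

missing = [name for name, key in API_KEYS.items() if not key]
if missing:
    print(f"Warning: missing API keys for: {', '.join(missing)}")
```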
Usage:

- Run the evaluation with the default provider:

        python code_evaluation.py

- Evaluate all configured providers:

        export TEST_ALL_PROVIDERS=true
        python code_evaluation.py

Configuration:

- Default provider and model can be configured in the `.env` file
- Rate limiting can be adjusted via environment variables (see the sketch after this list):
  - `RATE_LIMIT_CALLS`: number of calls allowed per period
  - `RATE_LIMIT_PERIOD`: time period in seconds
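Reading these variables and enforcing a simple sliding-window limit could look roughly like the sketch below. The default values (`10` calls per `60` seconds) and the helper name `rate_limited_call` are illustrative assumptions, not the behaviour of `code_evaluation.py` itself.

```python
import os
import time
from collections import deque

# Hypothetical defaults; the real script may use different fallbacks.
CALLS = int(os.getenv("RATE_LIMIT_CALLS", "10"))
PERIOD = float(os.getenv("RATE_LIMIT_PERIOD", "60"))

_timestamps: deque = deque()

def rate_limited_call(fn, *args, **kwargs):
    """Block until another API call is allowed, then invoke fn."""
    now = time.monotonic()
    # Drop timestamps that have fallen outside the sliding window
    while _timestamps and now - _timestamps[0] >= PERIOD:
        _timestamps.popleft()
    if len(_timestamps) >= CALLS:
        # Wait until the oldest call in the window expires
        time.sleep(PERIOD - (now - _timestamps[0]))
        _timestamps.popleft()
    _timestamps.append(time.monotonic())
    return fn(*args, **kwargs)
```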
Licensed under the MIT License.