██████╗ ██╗ ██╗██╗ ██╗ ██╗ ██╗
██╔════╝ ██║ ██║██║ ██║ ╚██╗ ██╔╝
██║ ███╗██║ ██║██║ ██║ ╚████╔╝
██║ ██║██║ ██║██║ ██║ ╚██╔╝
╚██████╔╝╚██████╔╝███████╗███████╗ ██║
╚═════╝ ╚═════╝ ╚══════╝╚══════╝ ╚═╝
Gully is a sophisticated application that allows you to query a rich database of Indian Premier League (IPL) cricket statistics using plain English. It leverages a powerful AI agent to understand natural language, translate it into precise SQL queries, and fetch answers directly from ball-by-ball data.
Currently the backend logic is implemented with proper authentication and authorization. The frontend is yet to be built; the CLI version, however, is complete.
This application uses a Discord-like tagging (mentions) mechanism to maintain name consistency. The tagging syntax is given below.
NOTE: The dataset used is publicly available. Please consider upvoting it on Kaggle. Use it and build an even more amazing project than mine 😂.

Kaggle publicly available dataset link
I have built a dummy frontend using the backend endpoints (without auth integration).

This is how the Discord-like tagging mechanism looks:

And the question is asked here:
As a hardcore CSK fan, I want to know how many wides Pathirana bowled (as far as I remember, most matches were catch practice between Dhoni and Pathirana 🤣).

And there you go. He bowled 13.09% wides. No wonder Pathirana is the GOAT of wides 😂😂
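For clarity, a figure like 13.09% presumably means wides as a share of the bowler's total deliveries. Here is a toy sketch of that computation; the data shape (dicts with an `extras_type` key) is an assumption for illustration, not the project's actual schema:

```python
def wide_percentage(deliveries: list[dict]) -> float:
    """Percentage of a bowler's deliveries that were called wide."""
    if not deliveries:
        return 0.0
    wides = sum(1 for d in deliveries if d.get("extras_type") == "wides")
    return round(100 * wides / len(deliveries), 2)

# Toy data: 2 wides out of 8 deliveries -> 25.0
balls = [{"extras_type": "wides"}, {}, {}, {"extras_type": "wides"},
         {}, {}, {}, {}]
```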
Use the @ symbol to trigger the entity menu.
Whenever you want to use a player, team, venue, city, or umpire name, start typing with the @ symbol followed by the category and a hyphen.
- `@players-Name` — Target specific batters/bowlers
- `@teams-Name` — Target specific IPL franchises
- `@venues-Name` — Target stadiums
- `@umpires-Name` — Target umpires
- `@cities-Name` — Target city names
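The mention syntax above can be recognized with a simple regular expression. This is only an illustrative sketch, not the project's actual lexer; the category names and the `@category-Name` shape come from the list above, while the name-boundary heuristic (stop at `.`, `?`, `@`, or end of input) is an assumption — in the real UI the boundary is known because the name is picked from the suggestion menu:

```python
import re

# Categories taken from the tag list above.
CATEGORIES = ("players", "teams", "venues", "umpires", "cities")

# Heuristic: a mention runs from "@category-" until the next ".", "?",
# "@", or end of input. Names may contain spaces ("MS Dhoni").
MENTION_RE = re.compile(r"@(" + "|".join(CATEGORIES) + r")-([^@?.]+)")

def extract_mentions(text: str) -> list[tuple[str, str]]:
    """Return (category, name) pairs for each @-mention found in text."""
    return [(m.group(1), m.group(2).strip()) for m in MENTION_RE.finditer(text)]

# e.g. extract_mentions("Which matches were played at @venues-M Chinnaswamy Stadium?")
# -> [("venues", "M Chinnaswamy Stadium")]
```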
| Key Binding | Action |
|---|---|
| `TAB` / Arrows | Navigate suggestions |
| `ENTER` | Select suggestion / Submit query |
| `Ctrl + D` | Exit application |
| `Ctrl + C` | Cancel current input |
- How many sixes did `@players-MS Dhoni` score overall?
- Which matches were played at `@venues-M Chinnaswamy Stadium`?
- List all umpires who have officiated matches in `@cities-Bangalore`.
- To exit the application, type `exit` or `quit`, or press `Ctrl + D`.
- To view this manual again, type `help` or `man`.
- Natural Language Queries: Ask complex questions about IPL history in plain English.
- AI-Powered SQL Agent: An intelligent agent dynamically generates SQL queries from your questions.
- RESTful API: A robust backend built with FastAPI provides scalable endpoints for chat, user management, and authentication.
- Database Management: Uses SQLAlchemy and Alembic for seamless database schema management and migrations.
- Data Processing Pipeline: Includes scripts to process raw Cricsheet JSON data into a structured database format.
- Command-Line Interface: Provides a CLI for direct interaction and testing.
- Backend: Python, FastAPI, Uvicorn
- Database: PostgreSQL, SQLAlchemy, Alembic
- AI / NLP: LLM-powered agents (framework-agnostic), custom query tagging and lexing
- Dependency Management: `uv`
- CLI: Prompt Toolkit (Python)
Follow these steps to set up and run the project locally.
- Python 3.10+
- PostgreSQL database server
- `git` for cloning the repository
- Clone the repository:

  ```bash
  git clone https://github.com/Astrasv/the-gully.git
  cd gully
  ```
- Create a virtual environment:

  ```bash
  python -m venv .venv
  # Activate the environment
  # Windows
  .venv\Scripts\activate
  # macOS / Linux
  source .venv/bin/activate
  ```
- Install dependencies using `uv`:

  ```bash
  pip install uv
  uv pip install -r requirements.txt
  ```
- Configure environment variables: copy the example environment file and update it with your database credentials and a secret key.

  ```bash
  cp .env.example .env
  ```

  Now edit the `.env` file with your details.
- Set up the database schema: navigate to the backend directory and run the Alembic migrations. This creates all the necessary tables.

  ```bash
  cd backend
  alembic upgrade head
  cd ..
  ```
- Prepare the IPL data:
  - Download the official IPL ball-by-ball data in JSON format from the Cricsheet website.
  - Place the downloaded JSON files into the `ipl_json/` directory at the root of the project.
- Populate the database: run the preprocessing and population script. It reads the JSON files, processes the data, and inserts it into your PostgreSQL database.

  ```bash
  python db/populate.py
  ```
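At a high level, the population step flattens Cricsheet's nested per-innings JSON into ball-by-ball rows. Here is a minimal sketch of that flattening, assuming Cricsheet's `innings -> overs -> deliveries` layout; the output field names are simplified for illustration and are not the project's actual schema:

```python
def flatten_innings(match: dict) -> list[dict]:
    """Turn one Cricsheet-style match dict into flat ball-by-ball rows."""
    rows = []
    for inn_no, innings in enumerate(match["innings"], start=1):
        for over in innings["overs"]:
            for ball_no, delivery in enumerate(over["deliveries"], start=1):
                rows.append({
                    "innings": inn_no,
                    "over": over["over"],
                    "ball": ball_no,
                    "batter": delivery["batter"],
                    "bowler": delivery["bowler"],
                    "runs_total": delivery["runs"]["total"],
                })
    return rows
```

The real script additionally resolves player/team/venue names and writes the rows to PostgreSQL; this sketch only shows the nesting being unrolled.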
- Set up the IPL ball-by-ball table: connect to your PostgreSQL server with `psql`, switch to the database, run the setup script, then load the CSV you downloaded from the Kaggle dataset.

  ```sql
  \c ipl_database
  \i db/setup_db.sql
  COPY ipl_ball_by_ball
  FROM '/path/to/your_file.csv'
  DELIMITER ','
  CSV HEADER;
  ```
Once the setup is complete, you can run the agents server:

```bash
fastapi run --reload --port 8001 agents/app/main.py
```

The API will be live at http://127.0.0.1:8001, and the interactive OpenAPI documentation will be available at http://127.0.0.1:8001/docs.
After this, run the backend server:

```bash
fastapi run --reload --port 8000 backend/app/main.py
```

The API will be live at http://127.0.0.1:8000, and the interactive OpenAPI documentation will be available at http://127.0.0.1:8000/docs.
NOTE: If you only want to use the CLI application, start just the agents server.
The primary way to use Gully is through its REST API. You can use tools like `curl`, Postman, or any HTTP client to interact with the endpoints. The `/api/chat/` endpoint is the main entry point for asking questions.
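As a sketch of how a client might talk to that endpoint, the snippet below builds (but does not send) a POST request using only the standard library. The base URL assumes the backend server from the setup above, and the `"message"` field name is an assumption — check the live OpenAPI docs at `/docs` for the real request schema:

```python
import json
import urllib.request

def build_chat_request(question: str,
                       base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST request for the /api/chat/ endpoint.

    NOTE: the JSON field name "message" is an assumption; consult the
    OpenAPI docs at {base_url}/docs for the actual schema.
    """
    payload = json.dumps({"message": question}).encode()
    return urllib.request.Request(
        f"{base_url}/api/chat/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send it with: urllib.request.urlopen(build_chat_request("..."))
req = build_chat_request("How many wides did Pathirana bowl?")
```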
A simple CLI is available for quick tests from your terminal.
```bash
python main_cli.py
```

This project is licensed under the terms of the MIT License. See the LICENSE file for details.
