Agentar-Scale-SQL is a novel framework that leverages scalable computation to significantly improve Text-to-SQL performance.

antgroup/Agentar-Scale-SQL

Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling

πŸ“ Introduction

Agentar-Scale-SQL is a novel framework that leverages scalable computation to significantly improve Text-to-SQL performance on challenging benchmarks. By implementing an Orchestrated Test-Time Scaling strategy, our framework synergistically combines three distinct perspectives to bridge the gap between state-of-the-art models and human expert performance.

framework

Figure 1: The proposed Agentar-Scale-SQL framework.

⚡️ Performance

| Methods | EX (Dev) | EX (Test) | R-VES (%) |
| --- | --- | --- | --- |
| Agentar-Scale-SQL (Ours) | 74.90 | 81.67 | 77.00 |
| AskData + GPT-4o | 76.14 | 80.88 | 76.24 |
| LongData-SQL | 74.32 | 77.53 | 71.89 |
| CHASE-SQL + Gemini | 74.90 | 76.02 | 69.94 |
| JoyDataAgent-SQL | 74.25 | 75.85 | 70.16 |
| TCDataAgent-SQL | 74.12 | 75.74 | - |
| Contextual-SQL | 73.50 | 75.63 | 70.02 |
| XiYan-SQL | 73.34 | 75.63 | 71.41 |

🎉 News

  • 🚀 2025.11.27: We are excited to release Agentar-Scale-SQL-Generation-32B on Hugging Face and ModelScope! Simultaneously, we have open-sourced the code for the Light Schema Engine and the Offline Data Preprocessing Pipeline!
  • 🎁 2025.09.30: Our paper is available on arXiv.
  • 🏆 2025.09.25: We are proud to announce that we have achieved #1 Rank on the official BIRD leaderboard with 81.67% execution accuracy!

πŸ—ΊοΈ Release Roadmap

We are committed to continuously improving Agentar-Scale-SQL. Here is our plan for upcoming features and releases.

  • Paper
    • Publish the paper on arXiv.
  • Model Releases
    • Release Agentar-Scale-SQL-Generation-32B on Hugging Face and ModelScope.
    • Release Agentar-Scale-SQL-Selection-32B on Hugging Face and ModelScope.
  • Code Releases
    • Release the code for the light schema engine.
    • Release the code for the offline data preprocessing pipeline.
    • Release the code for task understanding and generating SQL candidates with closed-source models.
    • Release the code for generating SQL candidates with the fine-tuned model.
    • Release the code for the SQL selection module.

📂 Directory Structure

Agentar-Scale-SQL/
├── ScaleSQL/                     # Core source code directory
│   └── workflows/                # Main workflow scripts
│       └── config/               # Configuration files
├── ddl_schema.sh
├── requirements.txt              # Dependency list
├── .env                          # Environment variable
├── .env.example                  # Environment variable template
├── .gitignore
├── README.md                     # Current document
└── nltk_data.zip                 # For ddl schema generation

📚 Usage

1. Installation and Environment Settings

1.1 Create Virtual Environment and Install Python Dependencies

conda create -n ScaleSQL python=3.10
conda activate ScaleSQL

1.2 Install PyTorch and Core Dependencies

# Install PyTorch (CUDA 12.1)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

1.3 Install Project Dependencies

pip install -r requirements.txt

1.4 Install vLLM (for Inference Acceleration)

pip install https://github.com/vllm-project/vllm/releases/download/v0.8.5.post1/vllm-0.8.5.post1+cu121-cp38-abi3-manylinux1_x86_64.whl

1.5 Download Embedding Model

modelscope download --model sentence-transformers/all-MiniLM-L6-v2 --local_dir ./ScaleSQL/model/all-MiniLM-L6-v2
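
Before moving on, it can help to confirm that the pinned packages actually landed in the active environment. The following is a minimal sketch, not part of the repository: the `EXPECTED` mapping simply mirrors the versions in the install commands above, and the prefix comparison is an assumption to tolerate local version suffixes such as `+cu121`.

```python
from importlib import metadata

# Versions copied from the install commands above (hypothetical helper, not repo code).
EXPECTED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "vllm": "0.8.5.post1",
}


def check_versions(expected):
    """Map each package to (installed_version_or_None, matches_expected_prefix)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Prefix match so that a local suffix like "2.5.1+cu121" still counts.
        report[name] = (found, found is not None and found.startswith(wanted))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "CHECK"
        print(f"{name}: installed={found} expected={EXPECTED[name]} [{status}]")
```

Run it inside the `ScaleSQL` conda environment; any line marked `CHECK` points at a package that is missing or at a different version than the one pinned above.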

2. Data Preparation

2.1 Configure Paths

Modify the configuration file ./ScaleSQL/workflows/config/pipeline_config.yaml. Note that the evaluation requires the column meaning file, which you can find in TA-SQL.

dataset_folder: /temp/bird_test  # Change to the actual folder
column_meaning_path: /your_path/column_meaning.json # Change to the actual path
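
A wrong path here tends to surface only later, in the middle of preprocessing, so a quick check up front may save time. The sketch below is a hypothetical helper, not repository code: it takes the two keys shown above as a plain dict and assumes `dataset_folder` must be an existing directory and `column_meaning_path` an existing file.

```python
from pathlib import Path


def find_path_problems(config):
    """Return a list of problems for the two configured paths (empty if none)."""
    checks = [
        ("dataset_folder", Path.is_dir, "an existing directory"),
        ("column_meaning_path", Path.is_file, "an existing file"),
    ]
    problems = []
    for key, predicate, expectation in checks:
        value = config.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif not predicate(Path(value)):
            problems.append(f"{key} should be {expectation}: {value}")
    return problems


# Example usage with the placeholder values from the config snippet above:
# for problem in find_path_problems({
#     "dataset_folder": "/temp/bird_test",
#     "column_meaning_path": "/your_path/column_meaning.json",
# }):
#     print(problem)
```

If PyYAML is installed, the dict can be obtained with `yaml.safe_load` on the configuration file instead of being typed by hand.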

3. Preprocessing Pipeline


3.1 Generate Light Schema

python -m ScaleSQL.workflows.schema_generation --evaluation_type test

Output example: ./ScaleSQL/dataset/bird_test_light_schema.json
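
To confirm the step produced a usable file, one option is to load it and look at its top-level shape. This is a rough sketch under assumptions: the layout of the light schema JSON is not documented here, so the hypothetical helper `summarize_json` only reports generic facts, and the path in the usage comment assumes the output location is relative to the repository root.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level shape of a JSON file without assuming its schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "object", "entries": len(data), "sample_keys": sorted(data)[:5]}
    if isinstance(data, list):
        return {"type": "array", "entries": len(data), "sample_keys": []}
    return {"type": type(data).__name__, "entries": 1, "sample_keys": []}


# Example usage (path assumed relative to the repository root):
# print(summarize_json("ScaleSQL/dataset/bird_test_light_schema.json"))
```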


3.2 Process Training Set Examples and Write to Vector Database

ANONYMIZED_TELEMETRY=False python -m ScaleSQL.workflows.train_skeleton_process

Output path: /tmp/ScaleSQL/chroma/bird_train_skeleton


3.3 Process Database Cell Values and Write to Vector Database

ANONYMIZED_TELEMETRY=False python -m ScaleSQL.workflows.database_cell_process --evaluation_type test

Output path: /tmp/ScaleSQL/chroma/bird_test
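
Both vector database steps write under /tmp/ScaleSQL/chroma, so a simple way to see whether they ran is to count the files in each output directory. The sketch below is a hypothetical helper that deliberately avoids opening the databases themselves; it only assumes the two output paths listed above.

```python
from pathlib import Path

# Output paths as documented in steps 3.2 and 3.3.
OUTPUT_DIRS = [
    "/tmp/ScaleSQL/chroma/bird_train_skeleton",
    "/tmp/ScaleSQL/chroma/bird_test",
]


def count_files(directory):
    """Count regular files under a directory; return 0 if it does not exist."""
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for directory in OUTPUT_DIRS:
        print(f"{directory}: {count_files(directory)} file(s)")
```

A count of zero suggests the corresponding step has not run yet or wrote somewhere else.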


3.4 Build BM25 Index (Content-Based) and Generate DDL Schema (Requires Java Environment)

bash ddl_schema.sh

Output example: ./ScaleSQL/dataset/bird_test_ddl_schema.json
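
The four preprocessing steps can also be chained from one script. The following is a minimal sketch, not a script shipped with the repository: `build_steps` and `run_steps` are hypothetical names, the commands are copied from steps 3.1 to 3.4, and it assumes it is launched from the repository root inside the `ScaleSQL` environment. Only `--evaluation_type test` is documented above, so other values are untested assumptions. By default it prints the commands without running them.

```python
import os
import shutil
import subprocess
import sys


def build_steps(evaluation_type="test"):
    """Return the documented preprocessing commands as (extra_env, argv) pairs."""
    no_telemetry = {"ANONYMIZED_TELEMETRY": "False"}
    py = sys.executable
    return [
        ({}, [py, "-m", "ScaleSQL.workflows.schema_generation",
              "--evaluation_type", evaluation_type]),
        (no_telemetry, [py, "-m", "ScaleSQL.workflows.train_skeleton_process"]),
        (no_telemetry, [py, "-m", "ScaleSQL.workflows.database_cell_process",
                        "--evaluation_type", evaluation_type]),
        ({}, ["bash", "ddl_schema.sh"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for extra_env, argv in steps:
        print("+", " ".join(argv))
        if dry_run:
            continue
        # check=True stops the pipeline at the first failing step.
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)


if __name__ == "__main__":
    if shutil.which("java") is None:
        print("warning: java not found on PATH; step 3.4 requires a Java environment")
    # Pass --run to execute; without it the commands are only printed.
    run_steps(build_steps("test"), dry_run="--run" not in sys.argv)
```

Keeping dry-run as the default makes it easy to review the exact commands first, and stopping on the first failure avoids building later indexes on top of incomplete earlier output.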


📦 Try Our Product

Unlock the power of your business data with natural language. We are excited to introduce Data Agent, our cutting-edge ChatBI product designed to transform complex data into clear, conversational insights.

Simply ask questions in plain English, and let Data Agent handle the complex queries for you. No code, no steep learning curve, just instant answers.

  • Official Website: For a detailed overview, features, and use cases, please visit our product page: https://antdigital.com/products/DataAgent

  • Early Access & Feedback: If you are interested in trying our product and providing valuable feedback, please feel free to contact us.

dingding

Figure 2: The contact information.

📎 Citation

@misc{wang2025agentarscalesqladvancingtexttosqlorchestrated,
      title={Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling}, 
      author={Pengfei Wang and Baolin Sun and Xuemei Dong and Yaxun Dai and Hongwei Yuan and Mengdie Chu and Yingqi Gao and Xiang Qi and Peng Zhang and Ying Yan},
      year={2025},
      eprint={2509.24403},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.24403}, 
}
