CreditEval is a financial credit reasoning evaluation system. It provides:
- A batch evaluation pipeline (`main.py`) for offline scoring of CoT outputs.
- A Flask backend API (`app.py`) exposing locate / evaluate / prompt-config endpoints.
- A web UI (`Risk-COT-Tagger/`) for meta-evaluation, prompt editing, and few-shot example management.
Install the dependencies:

pip install -r requirements.txt

`main.py` runs the full multi-dimensional evaluation pipeline (accuracy, logical consistency, and compliance) over a CoT JSONL file, together with a user metadata file:
python main.py \
--cust_desc_file data/dag_4.3.2/cust_desc_file_sample.json \
--cot_file data/dag_4.3.2/cot_file_sample.jsonl \
--output results \
  --max_workers 30
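The pipeline's internals are not described here, so the following is only a minimal sketch of the shape such a batch run might take. It assumes each line of the CoT file is one JSON object, that the customer description file maps user IDs to metadata, and that `--max_workers` bounds a thread pool. The field names `user_id` and `cot` and the function `score_record` are hypothetical placeholders for the real per-dimension scoring.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterable, List


def load_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse one JSON object per non-empty line (the JSONL convention)."""
    return [json.loads(line) for line in lines if line.strip()]


def score_record(record: Dict[str, Any], cust_desc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical per-record scorer standing in for the real accuracy /
    logical-consistency / compliance checks."""
    user_id = record.get("user_id")  # hypothetical join key
    return {
        "user_id": user_id,
        "has_metadata": user_id in cust_desc,  # was a matching user description found?
        "cot_length": len(record.get("cot", "")),  # hypothetical CoT text field
    }


def run_batch(
    cot_lines: Iterable[str],
    cust_desc: Dict[str, Any],
    max_workers: int = 30,
) -> List[Dict[str, Any]]:
    """Score every CoT record concurrently with at most `max_workers` threads."""
    records = load_jsonl(cot_lines)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the input file.
        return list(pool.map(lambda r: score_record(r, cust_desc), records))
```

Threads suit this workload if scoring is dominated by network calls to a model API; if scoring were CPU-bound, a process pool would likely be the better fit.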
Start the Flask backend (locate / evaluate / meta-evaluate):
python app.py
This exposes endpoints such as:
- `GET /api/prompts/<dimension>/<stage>`: read a prompt config from `common/config/*_prompt_config.yaml`.
- `PUT /api/prompts/<dimension>/<stage>`: update the prompt content.
- `POST /api/locate`: single-user feature extraction (locate stage).
- `POST /api/locate/batch`: batch locate over files.
- `POST /api/evaluate/batch`: batch evaluation over files.
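A small standard-library client for these endpoints might look like the sketch below. It assumes the backend listens on Flask's default development port 5000 and accepts JSON bodies; the `{"content": ...}` body for the PUT call, the locate payload, and the `accuracy` / `locate` path segments used in the examples are hypothetical, so check `app.py` for the actual schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:5000"  # assumption: Flask's default dev-server port


def build_request(method: str, path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request, JSON-encoding the payload if given."""
    data = None
    headers: Dict[str, str] = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(req: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_prompt(dimension: str, stage: str) -> urllib.request.Request:
    return build_request("GET", f"/api/prompts/{dimension}/{stage}")


def put_prompt(dimension: str, stage: str, content: str) -> urllib.request.Request:
    # Hypothetical body shape; the real endpoint may expect different keys.
    return build_request("PUT", f"/api/prompts/{dimension}/{stage}", {"content": content})


def locate(user_record: Dict[str, Any]) -> urllib.request.Request:
    # Hypothetical payload: one user's data for the locate stage.
    return build_request("POST", "/api/locate", user_record)
```

For example, `call(get_prompt("accuracy", "locate"))` would fetch a prompt config once `python app.py` is running. Keeping request construction separate from sending makes the helpers easy to inspect without a live server.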
Start the front-end HTTP server for `Risk-COT-Tagger`:

cd Risk-COT-Tagger
python start_server.py

Then open the printed URL (typically http://localhost:8000/start.html) to access:
- Meta-evaluation workbench for reviewing and curating errors.
- Prompt configuration pages for Locate and Evaluate stages.
- Few-shot example repository and instruction builder.
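If you only need the UI's static pages served, for example to check that `start.html` loads, a stand-in built on Python's `http.server` is sketched below. It assumes the UI consists of plain static files under `Risk-COT-Tagger/`; the real `start_server.py` may do more than this, so treat the sketch as a debugging aid rather than a replacement. The names `make_server` and `serve` are hypothetical.

```python
import functools
import http.server


def make_server(directory: str, port: int = 8000, host: str = "127.0.0.1") -> http.server.ThreadingHTTPServer:
    """Create (but do not start) a threaded static-file server rooted at `directory`."""
    # SimpleHTTPRequestHandler accepts `directory` since Python 3.7; partial binds it.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve(directory: str = "Risk-COT-Tagger", port: int = 8000) -> None:
    """Print the entry-page URL, then serve until interrupted with Ctrl+C."""
    server = make_server(directory, port)
    host, bound_port = server.server_address[:2]
    print(f"Open http://{host}:{bound_port}/start.html")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Run `serve()` from the repository root and open the printed URL; passing `port=0` to `make_server` lets the OS pick a free port if 8000 is taken.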