Code Evaluator

项目简介

多语言代码执行与测试服务。支持对模型生成代码进行直接运行或依据测试用例评估。

支持数据集

HumanEval（多语言：python / javascript / typescript）
LiveCodeBench（仅 python）说明：HumanEval 直接运行提交代码；LiveCodeBench 需函数名与输入输出用例比对。

环境准备

Python 环境

仅需 HumanEval：

python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

需要同时支持 LiveCodeBench：

python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements/livecodebench.txt

Node.js (JS / TS)

需要 Node.js (推荐版本 >= 20)：

npm install -g ts-node

Docker

docker build -f docker/Dockerfile.python -t code-evaluator-py .
docker build -f docker/Dockerfile.javascript -t code-evaluator-js .
docker build -f docker/Dockerfile.typescript -t code-evaluator-ts .

启动服务

fastapi run app/server.py --port 11451

API

健康检查

GET /health → {"status": true, "msg": "healthy"}

评估接口

POST /evaluations 字段：

uuid
source: "human-eval" | "livecodebench"
lang: python | javascript | typescript
code: 代码字符串
test: LiveCodeBench 专用测试描述（含 fn_name / inputs / outputs）
timeout: float (可选，单位秒，默认值见下文)
memory_limit: int (可选，单位 MB，默认 1024)

示例 HumanEval (Python) - 自定义超时与内存：

{ 
  "uuid":"h1",
  "source":"human-eval",
  "lang":"python",
  "code":"print(1+2)",
  "timeout": 5.0,
  "memory_limit": 512
}

示例 LiveCodeBench：

{
  "uuid":"lc1",
  "source":"livecodebench",
  "lang":"python",
  "code":"def add(a,b): return a+b",
  "test":{"fn_name":"add","inputs":["[1,2]","[3,4]"],"outputs":["3","7"]}
}

返回：

{ "status": true, "msg": "" }

失败时 msg 包含错误原因。

调用示例

curl -X POST http://localhost:11451/evaluations \
  -H 'Content-Type: application/json' \
  -d '{"uuid":"demo","source":"human-eval","lang":"python","code":"print(42)","memory_limit":1024}'

资源限制与默认值

超时 (Timeout)

若请求中未指定 timeout，将使用以下默认值：

python/js: 3s
typescript: 5s
livecodebench: 6s + 2s * 用例数

内存 (Memory Limit)

若请求中未指定 memory_limit，默认值为 1024 MB。

Python: 使用 resource.setrlimit 限制进程地址空间。 Node.js (JS/TS): 使用 --max-old-space-size 限制 V8 堆内存。

备注

未加强隔离，勿运行不可信高风险代码。

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
app		app
docker		docker
requirements		requirements
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Code Evaluator

项目简介

支持数据集

环境准备

Python 环境

Node.js (JS / TS)

Docker

启动服务

API

健康检查

评估接口

调用示例

资源限制与默认值

超时 (Timeout)

内存 (Memory Limit)

目录

备注

About

Uh oh!

Releases

Packages

Languages

scitix/code-evaluator

Folders and files

Latest commit

History

Repository files navigation

Code Evaluator

项目简介

支持数据集

环境准备

Python 环境

Node.js (JS / TS)

Docker

启动服务

API

健康检查

评估接口

调用示例

资源限制与默认值

超时 (Timeout)

内存 (Memory Limit)

目录

备注

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages