Add OceanBase integration example and documentation #871
- Add pymysql to the dependency list in pyproject.toml
- Support subsequent database-related operations
- Keep dependencies consistent and complete
- Add OceanBaseMetricsLogger class for metrics persistence
  - Database connection with environment variable support
  - Table creation with proper indexes
  - Metric insertion with error handling
  - Query examples for verification
- Add comprehensive quickstart guide
  - OceanBase introduction and Docker deployment
  - Connection configuration and troubleshooting
  - Two integration approaches (direct + custom)
  - Common SQL queries and performance optimization
- Add pymysql dependency to pyproject.toml
- Update README with tutorial link

Closes: OceanBase integration feature request
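The "environment variable support" mentioned in this commit could look roughly like the sketch below. It is not the code from the PR: the `OCEANBASE_*` variable names, the defaults, and the helper name `load_oceanbase_config` are all assumptions made for illustration.

```python
import os


def load_oceanbase_config(env=None):
    """Build connection keyword arguments from environment variables.

    All variable names and defaults here are hypothetical; the actual
    example script may use different ones.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("OCEANBASE_HOST", "127.0.0.1"),
        # Port is read as text and converted, so a bad value fails early.
        "port": int(env.get("OCEANBASE_PORT", "2881")),
        "user": env.get("OCEANBASE_USER", "root"),
        "password": env.get("OCEANBASE_PASSWORD", ""),
        "database": env.get("OCEANBASE_DATABASE", "test"),
    }


# Passing an explicit mapping makes the helper easy to exercise without
# touching the real process environment.
config = load_oceanbase_config({"OCEANBASE_PORT": "3306"})
```

The resulting dictionary could then be unpacked into the logger, as in `OceanBaseMetricsLogger(**config)`.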
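Summary of Changes
Hello @flying-dragon-ai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This PR introduces a new integration with the OceanBase database so that users can persist and manage training metrics efficiently. It provides comprehensive documentation and practical Python examples that guide users through OceanBase setup, connection configuration, metric logging, and basic data queries. The enhancement aims to provide a robust solution for storing large-scale training data and experiment records.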
Code Review
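This PR adds an example and documentation for persisting training metrics to OceanBase, and it is very well done. The documentation is thorough, covering installation, configuration, advanced usage, and troubleshooting. The example code also clearly shows how to interact with an OceanBase database. I have a few suggestions, mainly about using a context manager (the with statement) in the example code oceanbase_example.py to simplify resource management and make the code more robust and readable. Overall, this is a high-quality contribution that will be very helpful to users who need to persist training metrics.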
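For readers without an OceanBase instance at hand, the create-insert-query flow that the example script walks through can be approximated with the sketch below. It deliberately swaps in the standard-library `sqlite3` module as a stand-in database, so the DDL, the index name, and the `?` placeholders are assumptions that differ from what a pymysql-based script would use (pymysql uses `%s` placeholders). Only the table name and the column names come from the PR.

```python
import sqlite3

# Stand-in for OceanBase: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name

conn.execute(
    """
    CREATE TABLE training_metrics (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        experiment_name TEXT NOT NULL,
        step INTEGER NOT NULL,
        loss REAL,
        reward REAL,
        timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)
# Hypothetical index to speed up per-experiment lookups.
conn.execute(
    "CREATE INDEX idx_experiment_step ON training_metrics (experiment_name, step)"
)


def insert_metric(conn, experiment_name, step, loss=None, reward=None):
    """Insert one metrics row using a parameterized statement."""
    conn.execute(
        "INSERT INTO training_metrics (experiment_name, step, loss, reward) "
        "VALUES (?, ?, ?, ?)",
        (experiment_name, step, loss, reward),
    )
    conn.commit()


def latest_metrics(conn, limit=5):
    """Return the rows with the highest step values, newest first."""
    cursor = conn.execute(
        "SELECT experiment_name, step, loss, reward "
        "FROM training_metrics ORDER BY step DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Same sample values as the example script in this PR.
for step in range(1, 6):
    insert_metric(conn, "gsm8k_grpo_demo", step * 100, 1.5 - step * 0.2, 0.5 + step * 0.1)

rows = latest_metrics(conn, limit=3)
```

The sketch orders by `step` rather than `timestamp` because rows inserted within the same second would otherwise tie.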
    metrics_logger = OceanBaseMetricsLogger(**config)

    try:
        # 1. Connect to the database
        metrics_logger.connect()

        # 2. Create the table
        metrics_logger.create_table()

        # 3. Insert sample data
        logger.info("Inserting sample training metrics...")
        for step in range(1, 6):
            metrics_logger.insert_metric(
                experiment_name="gsm8k_grpo_demo",
                step=step * 100,
                loss=1.5 - step * 0.2,
                reward=0.5 + step * 0.1,
            )

        logger.info("✓ Sample data inserted successfully")

        # 4. Query to verify
        logger.info("Querying the 5 most recent records...")
        with metrics_logger.connection.cursor() as cursor:
            cursor.execute(
                """
                SELECT experiment_name, step, loss, reward, timestamp
                FROM training_metrics
                ORDER BY timestamp DESC
                LIMIT 5
                """
            )
            results = cursor.fetchall()
            for row in results:
                logger.info(
                    f" {row['experiment_name']} | "
                    f"step={row['step']} | "
                    f"loss={row['loss']:.3f} | "
                    f"reward={row['reward']:.3f} | "
                    f"time={row['timestamp']}"
                )

        logger.info("=== Example run complete ===")

    except Exception as e:
        logger.error(f"Execution failed: {e}")
        raise
    finally:
        # 5. Close the connection
        metrics_logger.close()
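To make resource management safer and the code more concise, I suggest changing OceanBaseMetricsLogger to support the context manager protocol (the with statement). This ensures that the database connection is closed properly after use, whether or not an exception occurs.
First, add __enter__ and __exit__ methods to the OceanBaseMetricsLogger class: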
    def __enter__(self):
        self.connect()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

Then the try...finally block in the main function can be refactored into a more concise with statement.
    try:
        with OceanBaseMetricsLogger(**config) as metrics_logger:
            # 1. Connect to the database (handled automatically by the with statement)
            # 2. Create the table
            metrics_logger.create_table()
            # 3. Insert sample data
            logger.info("Inserting sample training metrics...")
            for step in range(1, 6):
                metrics_logger.insert_metric(
                    experiment_name="gsm8k_grpo_demo",
                    step=step * 100,
                    loss=1.5 - step * 0.2,
                    reward=0.5 + step * 0.1,
                )
            logger.info("✓ Sample data inserted successfully")
            # 4. Query to verify
            logger.info("Querying the 5 most recent records...")
            with metrics_logger.connection.cursor() as cursor:
                cursor.execute(
                    """
                    SELECT experiment_name, step, loss, reward, timestamp
                    FROM training_metrics
                    ORDER BY timestamp DESC
                    LIMIT 5
                    """
                )
                results = cursor.fetchall()
                for row in results:
                    logger.info(
                        f" {row['experiment_name']} | "
                        f"step={row['step']} | "
                        f"loss={row['loss']:.3f} | "
                        f"reward={row['reward']:.3f} | "
                        f"time={row['timestamp']}"
                    )
            logger.info("=== Example run complete ===")
    except Exception as e:
        logger.error(f"Execution failed: {e}")
        raise
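The guarantee behind this suggestion can be checked in isolation. The sketch below is a minimal stand-in, not the PR's class: `FakeConnection` and `MetricsLoggerSketch` are hypothetical names, and no database is involved. It shows that `__exit__` runs, and the connection is closed, even when the body of the with block raises.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class MetricsLoggerSketch:
    """Hypothetical logger that follows the context manager protocol."""

    def __init__(self):
        self.connection = None

    def connect(self):
        self.connection = FakeConnection()

    def close(self):
        # Guard against close() being called before connect() succeeded.
        if self.connection is not None:
            self.connection.close()

    def __enter__(self):
        self.connect()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
        return False  # do not suppress exceptions raised in the with block


def run_and_fail():
    """Raise inside the with block and report whether close() still ran."""
    sketch = MetricsLoggerSketch()
    try:
        with sketch:
            raise RuntimeError("simulated failure during insert")
    except RuntimeError:
        pass
    return sketch.connection.closed


closed_after_error = run_and_fail()
```

One caveat worth keeping in mind: if `connect()` itself raises inside `__enter__`, Python does not call `__exit__`, so any partially acquired resources would have to be released inside `connect()`.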
Looks like the check is failing; please resolve the failing check.
Replace `Optional[X]` with `X | None` syntax (Python 3.10+) in oceanbase_example.py to comply with ruff UP045 rule.

Changes:
- Remove unused `typing.Optional` import
- Update connection type annotation
- Update insert_metric parameter annotations
- Introduce typing.Optional in place of union type annotations
- Change pymysql.Connection | None to Optional[pymysql.Connection]
- Change float | None parameters to Optional[float]
- Improve type consistency and readability
I don't think we need so many CLAUDE.md files or Chinese annotations in the code in this PR.
rchardx left a comment
Please do not include your Claude code settings in this OceanBase example PR.
Please use English as the primary language for this repository, in both title and contents.
    "pebble",
    "timeout-decorator",
    "prettytable",
    "pymysql",
pymysql is added as a core dependency but is only used by examples/utils/oceanbase_example.py. This forces all users to install pymysql even if they never use OceanBase. The examples/ directory is explicitly excluded from package distribution.
I believe OceanBase users will install this package in their own environment. Please do not add this package in AReaL.
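One way an example script can rely on a package that is not a core dependency is to import it lazily and fail with an actionable message. The sketch below is only an illustration of that idea: `require_module` is a hypothetical helper, not something that exists in AReaL or in the PR.

```python
import importlib


def require_module(name, install_hint):
    """Import a module on demand, or raise ImportError with install guidance.

    Hypothetical helper: it keeps the optional package out of the core
    dependency list while still giving users a clear next step.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"The '{name}' package is required for this example. {install_hint}"
        ) from exc


# A standard-library module is used here so the sketch runs anywhere.
json_module = require_module("json", "It ships with Python.")
```

In the OceanBase example this might be called as `require_module("pymysql", "Install it with: pip install pymysql")` at the top of the script.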