
Deploy/dev #96

Open

FortiShield wants to merge 4 commits into khulnasoft:main from FortiShield:deploy/dev

Conversation

@FortiShield
Collaborator

@FortiShield FortiShield commented Jan 23, 2026

User description

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Snapshots:

Include snapshots for easier review.

Checklist:

  • My code follows the style guidelines of this project
  • I have already rebased the commits and made the commit messages conform to the project standard.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • Any dependent changes have been merged and published in downstream modules

PR Type

Enhancement, Bug fix, Formatting


Description

  • Agent-based benchmarking support: Added comprehensive agent API evaluation framework with BenchmarkAgentTask, response parsing strategies, and HTTP method configuration

  • Neo4j graph database integration: Implemented Neo4j connector, store adapter, and community store adapter for graph operations and schema introspection

  • Benchmark data loading refactoring: Migrated from CSV to SQLite merge functionality with GitHub Falcon repository support and JSON/JSONL parsing

  • File parsing service enhancements: Added Excel and GitHub Falcon file parsers with standardized result models and Excel output support

  • Naming consistency updates: Renamed DbGpts* classes to GptDbs* and updated repository references from eosphoros to khulnasoft organization

  • Full text search improvements: Enhanced Elasticsearch with metadata filtering and added full text search interface to base storage class

  • Environment variable resolution: Added support for resolving database passwords from environment variables in connector manager

  • API enhancements: Added app_code parameter to chat completion and HTTP request bodies for better app identification

  • Code quality improvements: Comprehensive import statement reorganization across 50+ files to follow PEP 8 style guidelines, improved error logging with traceback information, and null/boolean value normalization in data comparison

  • Documentation updates: Updated documentation URL references and package names in examples


Diagram Walkthrough

flowchart LR
  A["Benchmark Service"] -->|"Agent Config"| B["BenchmarkAgentTask"]
  A -->|"File Type"| C["File Parsers"]
  C -->|"Excel"| D["ExcelFileParseService"]
  C -->|"GitHub"| E["FalconFileParseService"]
  F["Neo4j Connector"] -->|"Graph Ops"| G["Neo4jStore"]
  G -->|"Community"| H["Neo4jStoreAdapter"]
  I["Elasticsearch"] -->|"Full Text"| J["Metadata Filtering"]
  K["Connector Manager"] -->|"Env Vars"| L["Password Resolution"]
  M["API Endpoints"] -->|"app_code"| N["Chat Completion"]

File Walkthrough

Relevant files
Enhancement
22 files
benchmark_data_manager.py
Refactor benchmark data loading from CSV to SQLite merge 

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py

  • Refactored CSV import to SQLite merge functionality, replacing CSV
    parsing with SQLite database merging
  • Added new Pydantic models (FileLoadResult, SqlFileItem,
    GoldenSqlListResult) for unified file loading results
  • Implemented load_file_from_github() method supporting JSON/JSONL file
    parsing with multiple encoding support
  • Added _parse_json_file() and _parse_jsonl_file() helper methods for
    flexible JSON data parsing
  • Enhanced GitHub repository download with branch URL support and
    improved error handling
  • Replaced async_after_start() with load_data() method and removed table
    mapping functionality
+520/-328
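
As a rough illustration of the JSON/JSONL row loading described above, here is a minimal sketch of a line-by-line JSONL parser with failure tracking; the helper name and result fields are assumptions mirroring this summary, not the PR's actual code.

import json
from typing import Any, Dict, List


def parse_jsonl_file(path: str, encoding: str = "utf-8") -> Dict[str, Any]:
    """Parse a .jsonl file line by line, collecting failures instead of aborting."""
    rows: List[Dict[str, Any]] = []
    failures: List[Dict[str, Any]] = []
    with open(path, "r", encoding=encoding) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                rows.append({"line_no": line_no, "data": json.loads(line)})
            except json.JSONDecodeError as e:
                failures.append({"line_no": line_no, "error": str(e)})
    return {
        "type": "jsonl",
        "rows": rows,
        "count": len(rows),
        "failed_count": len(failures),
        "failures": failures,
    }
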
neo4j_store_adapter.py
Add Neo4j Community Store Adapter implementation                 

packages/gptdb-ext/src/gptdb_ext/storage/knowledge_graph/community/neo4j_store_adapter.py

  • New file implementing Neo4j Community Store Adapter for graph database
    operations
  • Provides methods for community discovery, graph traversal, and CRUD
    operations on entities/relationships
  • Includes Cypher query execution, schema introspection, and graph
    exploration with depth/direction control
  • Implements conversion utilities between Neo4j nodes/relationships and
    internal Vertex/Edge representations
+779/-0 
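
The node/relationship conversion mentioned above could look roughly like the sketch below. The neo4j driver calls (node.items(), node.labels, node.element_id) follow the 5.x driver API; the internal Vertex type is replaced here by a hypothetical stand-in.

from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleVertex:
    """Hypothetical stand-in for gptdb's internal Vertex type."""
    vid: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


def neo4j_node_to_vertex(node) -> SimpleVertex:
    """Convert a neo4j.graph.Node into an internal vertex representation."""
    props = dict(node.items())                 # copy all stored properties
    vid = str(props.pop("id", None) or node.element_id)
    label = next(iter(node.labels), "entity")  # first label, if any
    return SimpleVertex(vid=vid, label=label, props=props)
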
falcon_file_parse.py
Add Falcon benchmark file parser service                                 

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/falcon_file_parse.py

  • New file implementing Falcon benchmark file parser for dev/test
    environments
  • Provides dataclass models (BenchmarkDataItem, ColumnItem,
    TableDDLItem, TableDataItem) for benchmark data structures
  • Implements concurrent async file loading with fallback to sync
    execution in existing event loops
  • Includes prompt template building and standard benchmark set parsing
    with SQL schema formatting
+648/-0 
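
The "concurrent async file loading with fallback to sync execution in existing event loops" pattern can be tricky; below is a minimal, self-contained sketch of one safe variant (the function name and arguments are hypothetical, not the service's actual API).

import asyncio
import concurrent.futures


def run_async_loader(coro_factory):
    """Run an async loader from sync code, whether or not an event loop is running.

    `coro_factory` is a zero-argument callable returning a coroutine, e.g.
    lambda: service.async_load_data(data_file, ddl_file).
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: run the coroutine directly.
        return asyncio.run(coro_factory())
    # A loop is already running (e.g. inside an async web handler): run the
    # coroutine on a fresh loop in a worker thread so we don't re-enter it.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro_factory()).result()
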
controller.py
Refactor agent chat routing and memory class names             

packages/gptdb-serve/src/gptdb_serve/agent/agents/controller.py

  • Renamed imports from db_gpts_memory to gpt_dbs_memory with updated
    class names
  • Commented out AWEL_LAYOUT flow chat code path and refactored
    agent_chat_v2() method
  • Added is_flow_chat() method to check if app uses flow-based chat
  • Extracted flow chat logic into separate app_agent_flow_chat() method
  • Updated app_agent_chat() to conditionally route between flow and
    agent-based chat
+271/-179
file_parse_service.py
Refactor file parse service with Excel output                       

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/file_parse_service.py

  • Made parse_input_sets() abstract method require evaluation_env
    parameter
  • Removed @abstractmethod decorator from write_data_compare_result() and
    write_multi_round_benchmark_result()
  • Implemented write_data_compare_result() to write benchmark results to
    Excel with auto-sizing columns
  • Updated parse_standard_benchmark_sets() signature to include
    evaluation_env parameter
  • Removed dependency on BenchmarkResultDao and ExcelUtils
+100/-304
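
For the Excel output with auto-sized columns, a sketch using openpyxl might look like the following; the real write_data_compare_result() signature and column layout are not shown in this summary, so the names here are assumptions.

from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def write_rows_to_excel(header, rows, path):
    """Write result rows to an .xlsx file and auto-size each column."""
    wb = Workbook()
    ws = wb.active
    ws.append(list(header))
    for row in rows:
        ws.append(list(row))
    # Auto-size: set each column width from its longest cell value.
    for idx in range(1, len(header) + 1):
        col = get_column_letter(idx)
        longest = max(len(str(cell.value or "")) for cell in ws[col])
        ws.column_dimensions[col].width = min(longest + 2, 80)
    wb.save(path)
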
conn_neo4j.py
Add Neo4j datasource connector implementation                       

packages/gptdb-ext/src/gptdb_ext/datasource/conn_neo4j.py

  • New file implementing Neo4j connector with Bolt protocol support
  • Provides Neo4jParameters dataclass for connection configuration (host,
    port, user, password, database)
  • Implements Neo4jConnector class with methods for graph operations,
    schema introspection, and Cypher query execution
  • Includes database management, index retrieval, and streaming query
    support
+344/-0 
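
A minimal sketch of a Bolt-based connector like the one described above, using the official neo4j Python driver; the class and field names differ from the PR's Neo4jParameters/Neo4jConnector and are only illustrative.

from dataclasses import dataclass

from neo4j import GraphDatabase  # official Bolt driver


@dataclass
class Neo4jParams:
    host: str = "localhost"
    port: int = 7687
    user: str = "neo4j"
    password: str = ""
    database: str = "neo4j"


class MiniNeo4jConnector:
    """Thin wrapper around the Bolt driver for Cypher execution."""

    def __init__(self, params: Neo4jParams):
        uri = f"bolt://{params.host}:{params.port}"
        self._driver = GraphDatabase.driver(uri, auth=(params.user, params.password))
        self._database = params.database

    def run(self, query: str, **kwargs):
        """Execute a Cypher query and return the materialized records."""
        with self._driver.session(database=self._database) as session:
            return list(session.run(query, **kwargs))

    def close(self):
        self._driver.close()
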
connector_manager.py
Add Neo4j connector and environment variable resolution   

packages/gptdb-serve/src/gptdb_serve/datasource/manages/connector_manager.py

  • Added import for Neo4jConnector in on_init() method
  • Added environment variable resolution for database passwords using
    _resolve_env_vars()
  • Improved error handling for JSON parsing with fallback to empty dict
  • Enhanced test_connection() to resolve environment variables in
    password field
+14/-3   
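
Resolving database passwords from environment variables could be as simple as the sketch below; the ${VAR} placeholder syntax is an assumption, since the summary does not say which format _resolve_env_vars() accepts.

import os
import re

_ENV_PATTERN = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_vars(value: str) -> str:
    """Replace ${VAR} placeholders with values from the process environment.

    Unset variables are left untouched so a missing secret fails loudly later
    instead of silently becoming an empty password.
    """
    return _ENV_PATTERN.sub(
        lambda m: os.environ.get(m.group("name"), m.group(0)), value
    )


# Example: with DB_PASSWORD set, resolve_env_vars("${DB_PASSWORD}") returns its value.
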
benchmark_service.py
Add agent-based benchmarking support with GitHub Falcon integration

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/benchmark_service.py

  • Added support for agent-based benchmarking alongside LLM benchmarking
    with new imports for BenchmarkAgentTask and StorageUtil
  • Introduced new parameters for agent API configuration (api_url,
    http_method, headers, parse_strategy, response_mapping,
    evaluation_env, benchmark_type)
  • Changed STANDARD_BENCHMARK_FILE_PATH from local Excel file to GitHub
    Falcon repository URL
  • Added methods to parse HTTP methods, response strategies, and
    evaluation environments with fallback defaults
  • Implemented conditional logic to execute either agent or LLM tasks
    based on benchmark_type configuration
  • Added prompt template formatting with database dialect support
+198/-36
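
The "parse ... with fallback defaults" behavior mentioned above generally reduces to mapping a raw config string onto an enum member; a generic sketch follows (enum members and defaults are assumptions).

from enum import Enum
from typing import Optional


class HttpMethod(Enum):
    GET = "GET"
    POST = "POST"


def parse_enum_with_fallback(raw: Optional[str], enum_cls, default):
    """Map a raw config string onto an enum member, falling back to a default."""
    if not raw:
        return default
    try:
        return enum_cls(raw.strip().upper())
    except ValueError:
        return default


# parse_enum_with_fallback("post", HttpMethod, HttpMethod.POST)  -> HttpMethod.POST
# parse_enum_with_fallback("PATCH", HttpMethod, HttpMethod.POST) -> HttpMethod.POST (fallback)
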
benchmark_agent_task.py
New benchmark agent task module for remote API evaluation

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/task/benchmark_agent_task.py

  • New file implementing BenchmarkAgentTask class for evaluating remote
    agent APIs
  • Includes ResponseParser class with JSON path and direct parsing
    strategies
  • Supports configurable HTTP methods, headers, authentication, and
    response extraction
  • Implements retry logic with configurable delays and timeout handling
  • Provides async request execution with SSL verification options
+355/-0 
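
A JSON-path style response parser like the one described can be sketched as a dotted-path walk over the decoded payload; the path syntax ("a.b.0.c") is an assumption about how response_mapping is expressed.

import json
from typing import Any


def extract_by_path(payload: Any, path: str) -> Any:
    """Walk a dotted path (e.g. "data.choices.0.text") through a JSON payload."""
    if isinstance(payload, (str, bytes)):
        payload = json.loads(payload)
    current = payload
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]   # numeric segment indexes a list
        elif isinstance(current, dict):
            current = current.get(part)
        else:
            return None                    # path ran past a scalar value
        if current is None:
            return None
    return current


# extract_by_path('{"data": {"sql": "SELECT 1"}}', "data.sql") -> "SELECT 1"
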
elasticsearch.py
Enhance Elasticsearch full text search with metadata filtering

packages/gptdb-ext/src/gptdb_ext/storage/full_text/elasticsearch.py

  • Changed metadata field type from keyword to object with dynamic: True
    for better queryability
  • Added is_support_full_text_search() and full_text_search() methods to
    support full text search
  • Implemented _build_query() and _build_metadata_filter() methods for
    metadata filtering with Elasticsearch bool queries
  • Added _normalize_metadata() utility to ensure consistent metadata
    handling across dict, string, and JSON formats
  • Updated load_document() and search methods to use normalized metadata
    instead of JSON serialization
+111/-11
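
The metadata filtering likely composes an Elasticsearch bool query from term/terms clauses, roughly as below; the "content" and "metadata.*" field names are assumptions based on the mapping change described above.

from typing import Dict, List, Optional


def build_metadata_filter(metadata: Dict) -> List[Dict]:
    """Build term/terms filter clauses for a metadata dict."""
    filters: List[Dict] = []
    for key, value in metadata.items():
        field = f"metadata.{key}"
        if isinstance(value, list):
            filters.append({"terms": {field: value}})
        else:
            filters.append({"term": {field: value}})
    return filters


def build_query(text: str, metadata: Optional[Dict] = None) -> Dict:
    """Combine a full-text match with optional metadata filters."""
    query: Dict = {"bool": {"must": [{"match": {"content": text}}]}}
    if metadata:
        query["bool"]["filter"] = build_metadata_filter(metadata)
    return {"query": query}
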
models.py
Add agent configuration models and evaluation environment enums

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/models.py

  • Added new enums: EvaluationEnv (DEV, TEST), BenchmarkInvokeType (LLM,
    AGENT), HttpMethod, ResponseParseStrategy
  • Added GITHUB file parse type to FileParseTypeEnum
  • Introduced AgentApiConfig dataclass with API endpoint, HTTP method,
    headers, response parsing configuration, and retry settings
  • Added AgentCompletionRequest dataclass for agent API requests
  • Extended BenchmarkExecuteConfig with evaluation_env, invoke_type, and
    agent_config fields
+127/-0 
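
A plausible shape for the agent API configuration described above; the field names and defaults are approximations from this summary, and the real AgentApiConfig may differ.

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ResponseParseStrategy(Enum):
    JSON_PATH = "json_path"
    DIRECT = "direct"


@dataclass
class AgentApiConfig:
    api_url: str
    http_method: str = "POST"
    headers: Dict[str, str] = field(default_factory=dict)
    parse_strategy: ResponseParseStrategy = ResponseParseStrategy.JSON_PATH
    response_mapping: Optional[str] = None  # e.g. a dotted JSON path into the response
    timeout: float = 60.0
    max_retries: int = 3
    retry_delay: float = 1.0
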
neo4j_store.py
Implement Neo4j graph store with resource registration     

packages/gptdb-ext/src/gptdb_ext/storage/graph_store/neo4j_store.py

  • Added @register_resource decorator with configuration parameters for
    Neo4j host, port, username, password, and database
  • Implemented Neo4jStoreConfig dataclass with connection details and
    graph element type configurations
  • Added Neo4jStore class implementation with connection initialization
    using Neo4jConnector
  • Implemented get_config() and is_exist() methods for configuration
    retrieval and database existence checking
+172/-2 
excel_file_parse.py
New Excel file parsing service for benchmark data extraction

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/excel_file_parse.py

  • New file implementing ExcelFileParseService for parsing Excel
    benchmark files
  • Includes parse_input_sets() method to extract benchmark data from
    Excel worksheets
  • Implements parse_standard_benchmark_sets() for parsing standard answer
    sets with comparison strategies
  • Added _parse_multi_standard_result() utility for handling multiple
    standard results from JSON strings
+180/-0 
user_input_execute_service.py
Refactor file service to support multiple file type parsers

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/user_input_execute_service.py

  • Refactored constructor to accept file_type parameter instead of
    file_service instance
  • Added file_service() factory method to instantiate appropriate parser
    based on file type (Excel or GitHub Falcon)
  • Updated read_input_file() to accept evaluation_env parameter and pass
    it to parser
  • Updated post_dispatch() to pass evaluation_env to standard benchmark
    parsing
  • Moved BenchmarkResultDao initialization to class level
+26/-15 
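
The file_service() factory presumably maps the configured file type to a concrete parser, along these lines; the placeholder classes and string keys below are hypothetical.

class ExcelFileParseService:   # placeholder for the real parser class
    pass


class FalconFileParseService:  # placeholder for the real parser class
    pass


def make_file_parse_service(file_type: str):
    """Dispatch to a parser implementation based on the benchmark file type."""
    parsers = {
        "EXCEL": ExcelFileParseService,
        "GITHUB": FalconFileParseService,
    }
    try:
        return parsers[file_type.upper()]()
    except KeyError:
        raise ValueError(f"Unsupported benchmark file type: {file_type}") from None
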
base.py
Add full text search interface to base class                         

packages/gptdb-core/src/gptdb/storage/full_text/base.py

  • Added is_support_full_text_search() method returning True for full
    text stores
  • Added full_text_search() method as default implementation delegating
    to similar_search_with_scores()
  • Updated aload_document() signature to include optional file_id
    parameter for document-level tracking
+34/-1   
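
The default full text search described above likely just delegates to the existing score-based search, roughly as sketched here; only the method names come from this PR, the signatures are assumptions.

from typing import Any, List, Optional


class FullTextStoreBase:
    """Sketch of the base-class defaults for full text search."""

    def similar_search_with_scores(
        self, text: str, topk: int, score_threshold: float, filters: Optional[Any] = None
    ) -> List[Any]:
        raise NotImplementedError

    def is_support_full_text_search(self) -> bool:
        # Full text stores support it by default; other stores can override.
        return True

    def full_text_search(
        self, text: str, topk: int = 10, filters: Optional[Any] = None
    ) -> List[Any]:
        # Default implementation simply delegates to the score-based search.
        return self.similar_search_with_scores(text, topk, 0.0, filters)
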
schemas.py
Extend benchmark request schema with agent configuration fields

packages/gptdb-serve/src/gptdb_serve/evaluate/api/schemas.py

  • Changed datasets field type from List[dict] to Union[str, List[dict]]
    to support string values
  • Added new fields to BenchmarkServeRequest: evaluation_env,
    benchmark_type, api_url, http_method, headers, parse_strategy,
    response_mapping
+17/-2   
endpoints.py
Update benchmark endpoint to support agent configuration 

packages/gptdb-serve/src/gptdb_serve/evaluate/api/endpoints.py

  • Updated _run_benchmark_task_sync() to pass all parameters as keyword
    arguments
  • Added new parameters to run_dataset_benchmark() call: benchmark_type,
    api_url, http_method, headers, parse_strategy, response_mapping,
    evaluation_env
+15/-9   
factory.py
Add Neo4j store adapter factory support                                   

packages/gptdb-ext/src/gptdb_ext/storage/knowledge_graph/community/factory.py

  • Added import for Neo4jStore and Neo4jStoreAdapter
  • Implemented factory method to create Neo4jStoreAdapter for Neo4jStore
    instances
+7/-0     
api_v1.py
Add app_code parameter to chat completion requests             

packages/gptdb-app/src/gptdb_app/openapi/api_v1/api_v1.py

  • Added app_code parameter to dialogue context in chat completion
    endpoint
  • Passed app_code to both dialogue.select_param and streaming response
    calls
+2/-1     
http_trigger.py
Add app_code field to HTTP request body                                   

packages/gptdb-core/src/gptdb/core/awel/trigger/http_trigger.py

  • Reorganized imports to place third-party imports before local imports
  • Added app_code field to CommonLLMHttpRequestBody for app
    identification
+2/-2     
StorageUtil.py
Add GitHub Falcon file type detection                                       

packages/gptdb-core/src/gptdb/util/benchmarks/StorageUtil.py

  • Added GITHUB_FALCON_PREFIX constant for GitHub Falcon repository URL
  • Added logic to detect and return FileParseTypeEnum.GITHUB for Falcon
    repository URLs
+4/-0     
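
Detecting the Falcon repository is presumably a prefix check on the configured path, for example as below; the prefix value is hypothetical, the real constant lives in StorageUtil.py.

from enum import Enum

# Hypothetical value for illustration only.
GITHUB_FALCON_PREFIX = "https://github.com/falcon-benchmark"


class FileParseTypeEnum(Enum):
    EXCEL = "excel"
    GITHUB = "github"


def detect_file_parse_type(path_or_url: str) -> FileParseTypeEnum:
    """Route Falcon repository URLs to the GitHub parser; default to Excel."""
    if path_or_url.startswith(GITHUB_FALCON_PREFIX):
        return FileParseTypeEnum.GITHUB
    return FileParseTypeEnum.EXCEL
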
opensearch.py
Add file_id parameter to async document loading                   

packages/gptdb-ext/src/gptdb_ext/storage/full_text/opensearch.py

  • Added optional file_id parameter to aload_document method signature
  • Updated method to accept file identifier for async document loading
+3/-1     
Formatting
52 files
gptdbs.py
Rename DbGpts classes to GptDbs for consistency                   

packages/gptdb-serve/src/gptdb_serve/agent/agents/gptdbs.py

  • Renamed DbGptsMessage to GptDbsMessage for naming consistency
  • Renamed DbGptsTaskStep to GptDbsTaskStep
  • Renamed DbGptsCompletion to GptDbsCompletion
  • Updated all references and factory methods to use new naming
    convention
+13/-13 
pack.py
Rename tool identifier constant for consistency                   

packages/gptdb-core/src/gptdb/agent/resource/tool/pack.py

  • Renamed DB_GPT_TOOL_IDENTIFIER to GPT_DB_TOOL_IDENTIFIER for
    consistency
  • Updated all references to use new identifier name in
    _is_function_tool() function
+4/-5     
cli_scripts.py
Reorganize gptdbs CLI imports for clarity                               

packages/gptdb-core/src/gptdb/cli/cli_scripts.py

  • Reorganized imports for gptdbs CLI commands with better grouping
  • Separated import statements for add_repo, install, list_all_apps, and
    other commands
+7/-3     
serve_initialization.py
Update naming and simplify imports in serve initialization

packages/gptdb-app/src/gptdb_app/initialization/serve_initialization.py

  • Updated comment from DbGpts Register to GptDbs Register for
    consistency
  • Simplified import statement for PromptServe
+3/-5     
simple_nl_schema_sql_chart_example.py
Fix import order and update package name in example           

examples/awel/simple_nl_schema_sql_chart_example.py

  • Reorganized imports to place third-party imports after local imports
  • Updated pip install command from gpt-db[openai] to db-gpt[openai]
+2/-3     
app_agent_manage.py
Update memory class names for consistency                               

packages/gptdb-serve/src/gptdb_serve/agent/agents/app_agent_manage.py

  • Renamed import from db_gpts_memory to gpt_dbs_memory
  • Updated class references from MetaDbGptsPlansMemory to
    MetaGptDbsPlansMemory
  • Updated class references from MetaDbGptsMessageMemory to
    MetaGptDbsMessageMemory
+3/-3     
react_agent_example.py
Fix import order in React agent example                                   

examples/agents/react_agent_example.py

  • Reorganized imports to place typing_extensions after local imports
+1/-2     
simple_intent_translator.py
Normalize entity naming in intent translator example         

packages/gptdb-core/src/gptdb/rag/transformer/simple_intent_translator.py

  • Changed entity name from GPTDB to gptdb in example prompt for
    consistency
+1/-1     
gpts_app.py
Fix import order in gpts_app module                                           

packages/gptdb-serve/src/gptdb_serve/agent/db/gpts_app.py

  • Reorganized imports to place sqlalchemy imports after local imports
+10/-11 
vllm_llm.py
Fix import order and update environment variable naming   

packages/gptdb-core/src/gptdb/model/llm/llm_out/vllm_llm.py

  • Reorganized imports to place gptdb imports before third-party imports
  • Changed environment variable from DB_GPT_MODEL_BENCHMARK to
    GPT_DB_MODEL_BENCHMARK
+2/-3     
sandbox_code_agent_example.py
Fix import order in sandbox code agent example                     

examples/agents/sandbox_code_agent_example.py

  • Reorganized imports to place gptdb imports before third-party imports
+1/-2     
api_v2.py
Fix import order in API v2 module                                               

packages/gptdb-app/src/gptdb_app/openapi/api_v2.py

  • Reorganized imports to place starlette imports after local imports
+1/-2     
oceanbase_store.py
Fix import order in OceanBase store module                             

packages/gptdb-ext/src/gptdb_ext/storage/vector_store/oceanbase_store.py

  • Reorganized imports to place sqlalchemy imports after local imports
+2/-3     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/flow/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
i18n_utils.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/util/i18n_utils.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.configs.model_config import before try-except block
+2/-1     
recommend_question.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/app/recommend_question/recommend_question.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
storage_adapter.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/chat_history/storage_adapter.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy.orm.Session import after internal package imports
+1/-2     
media_utils.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/model/utils/media_utils.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved third-party imports after internal package imports
+2/-3     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/file/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
conv_links.py
Import statement reorganization for code style                     

packages/gptdb-app/src/gptdb_app/openapi/api_v1/links/conv_links.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
__init__.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/model/__init__.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved AutoLLMClient import before try-except block
+2/-1     
tugraph_store_adapter.py
Import statement reorganization for code style                     

packages/gptdb-ext/src/gptdb_ext/storage/knowledge_graph/community/tugraph_store_adapter.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved packaging.version.Version import after internal package imports
+1/-2     
my_plugin_db.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/hub/db/my_plugin_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
custom_tool_agent_example.py
Import statement reorganization for code style                     

examples/agents/custom_tool_agent_example.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved typing_extensions imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/file/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/gptdbs/my/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/model/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/utils/_template_files/default_serve_template/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/feedback/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
base.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/knowledge_graph/base.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved pydantic.Field import after internal package imports
+1/-2     
agent_memory.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/agent/core/memory/agent_memory.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved typing_extensions.TypedDict import after internal package
    imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/prompt/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
base.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/util/tracer/base.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved starlette.requests.Request import after internal package imports
+1/-2     
disk_storage.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/cache/storage/disk/disk_storage.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved rocksdict imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/utils/_template_files/default_serve_template/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
settings.py
Import statement reorganization for code style                     

packages/gptdb-app/src/gptdb_app/openapi/api_v1/links/settings.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
gpts_plans_db.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/db/gpts_plans_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
benchmark_db.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/evaluate/db/benchmark_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/gptdbs/hub/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
ext_http_trigger.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/core/awel/trigger/ext_http_trigger.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved starlette.requests.Request import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/gptdbs/my/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/chat/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/chat/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/datasource/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/evaluate/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/feedback/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/libro/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/libro/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/rag/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
hf_chat_llm.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/model/llm/llm_out/hf_chat_llm.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved transformers imports after internal package imports
+1/-2     
chat_history_db.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/chat_history/chat_history_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
server.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/util/code/server.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved lyric imports after internal package imports
+1/-2     
Configuration changes
2 files
repo.py
Update default gptdbs repository to khulnasoft organization

packages/gptdb-core/src/gptdb/util/gptdbs/repo.py

  • Changed default repository from eosphoros/gptdbs to khulnasoft/gptdbs
  • Updated repository URL from eosphoros to khulnasoft-bot in GitHub
    links
  • Updated error messages and documentation examples to reflect new
    naming
+3/-3     
model.py
Update plugin hub URL and fix import order                             

packages/gptdb-serve/src/gptdb_serve/agent/model.py

  • Reorganized imports to place pydantic imports after local imports
  • Updated plugin hub URL from khulnasoft/GPT-DB-Plugins.git to
    khulnasoft-bot/GPT-DB-Plugins.git
+2/-3     
Miscellaneous
2 files
demo_selenium.py
Add Selenium WebDriver demo script                                             

demo_selenium.py

  • New file demonstrating Selenium WebDriver usage with Firefox browser
  • Includes headless mode configuration, webpage navigation, and
    screenshot capture
  • Contains Bengali language comments for documentation
+25/-0   
model_adapter.py
Rename environment variable for consistency                           

packages/gptdb-core/src/gptdb/model/adapter/model_adapter.py

  • Changed environment variable name from DB_GPT_MODEL_BENCHMARK to
    GPT_DB_MODEL_BENCHMARK
+1/-1     
Error handling
1 file
service.py
Improve error logging with traceback information                 

packages/gptdb-serve/src/gptdb_serve/rag/service/service.py

  • Added traceback import and error logging in exception handler
  • Enhanced error logging to include full traceback information for
    debugging
+4/-1     
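
The enhanced error logging amounts to including traceback.format_exc() alongside the message; a small sketch follows (the wrapper function is illustrative only).

import logging
import traceback

logger = logging.getLogger(__name__)


def run_with_traceback_logging(fn, *args, **kwargs):
    """Run an operation and log the full stack trace if it fails."""
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        # Capture where it failed, not just the exception message.
        logger.error(f"operation failed: {e}\n{traceback.format_exc()}")
        raise
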
Bug fix
1 file
data_compare_service.py
Add null and boolean value normalization in data comparison

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/data_compare_service.py

  • Added null string handling to convert null to empty string
  • Added boolean value normalization to convert True/False to 1/0 for
    database consistency
+18/-1   
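
The normalization can be captured in a single helper applied to every cell before comparison; the helper name is an assumption, the rules match the bullets above.

def normalize_cell(value):
    """Normalize a result cell before row comparison.

    "null"/None become empty strings and booleans become 1/0 so that results
    from different database dialects compare as equal.
    """
    if value is None:
        return ""
    if isinstance(value, bool):
        return 1 if value else 0
    if isinstance(value, str) and value.strip().lower() == "null":
        return ""
    return value


# normalize_cell(True) -> 1, normalize_cell("null") -> "", normalize_cell(None) -> ""
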
Documentation
1 file
simple_rag_summary_example.py
Update documentation URL reference                                             

examples/awel/simple_rag_summary_example.py

  • Updated documentation URL from docs.gptdb.cn to gpt.khulnasoft.com
+1/-1     
Additional files
101 files
bug-report.yml +4/-4     
python-publish.yml +3/-3     
.pre-commit-config.yaml +41/-41 
Dockerfile +34/-0   
LICENSE +1/-1     
Makefile +106/-113
README.md +239/-303
build_image.sh +8/-8     
ha-model-cluster.toml +64/-0
ha-webserver.toml +28/-0
test_case_sqlite_data.py +1/-1
my.cnf +44/-0
case_1_student_manager_mysql.sql +63/-0
case_1_student_manager_sqlite.sql +59/-0
case_1_student_manager_vertica.sql +78/-0
case_2_ecom_mysql.sql +63/-0
case_2_ecom_sqlite.sql +59/-0
case_2_ecom_vertica.sql +78/-0
case_3_order_wide_table_sqlite_wide.sql +317/-0
test_case.md +87/-0
test_case_info_mysql.sql +19/-0
test_case_info_sqlite.sql +17/-0
test_case_info_vertica.sql +29/-0
user_config_mysql.sql +3/-0
2024-07-24-gpt-db-llama-3.1-support.md +1/-1     
2024-09-04-gptdb-v0.6.0-Defining new standards for AI-native data applications.md +2/-2     
2025-03-24-gptdb-v0.7.0-release.md +10/-10 
2025-04-29-gpt-db-qwen3-support.md +1/-1     
benchmark.md +1/-1     
cli.md +1/-1     
text_to_sql.md +2/-2     
Released_V0.5.0.md +5/-5     
Released_V0.6.0.md +2/-2     
data_analysis_app_develop.md +15/-15 
kbqa.md +1/-1     
introduction.md +3/-3     
docker.md +1/-1     
docker_compose.md +3/-3     
aimlapi_llm_install.md +2/-2     
cluster.md +1/-1     
benchmark.md +2/-2     
docusaurus.config.js +6/-6     
auto_search_agent.py +0/-1     
data_manus.py +0/-1     
db_create.py +0/-1     
graph_rag_example.py +0/-1     
chat_data_with_awel.py +0/-1     
graphrag-mini.md +3/-3     
graphrag-test.md +46/-46 
README.md +2/-1     
gptdb_agent.po +65/-0   
gptdb_app__cli.po +22/-0   
gptdb_app_config.po +266/-0 
gptdb_app_gptdb_server.po +23/-0   
gptdb_app_knowledge.po +30/-0   
gptdb_app_operators.po +556/-0 
gptdb_app_scene.po +71/-0   
gptdb_client__cli.po +73/-0   
gptdb_core.po +1043/-0
gptdb_datasource.po +65/-0   
gptdb_ext_datasource.po +278/-0 
gptdb_ext_rag.po +478/-0 
gptdb_ext_storage.po +573/-0 
gptdb_model.po +1008/-0
gptdb_rag.po +454/-0 
gptdb_serve_agent.po +52/-0   
gptdb_serve_conversation.po +56/-0   
gptdb_serve_core.po +68/-0   
gptdb_serve_datasource.po +26/-0   
gptdb_serve_evaluate.po +34/-0   
gptdb_serve_feedback.po +26/-0   
gptdb_serve_file.po +64/-0   
gptdb_serve_flow.po +114/-0 
gptdb_serve_gptdbs.po +34/-0   
gptdb_serve_libro.po +26/-0   
gptdb_serve_model.po +34/-0   
gptdb_serve_prompt.po +34/-0   
gptdb_serve_rag.po +155/-0 
gptdb_storage.po +92/-0   
gptdb_util.po +155/-0 
gptdb_app_config.mo [link]   
gptdb_datasource.mo [link]   
gptdb_ext_datasource.mo [link]   
gptdb_model.mo [link]   
gptdb_serve_flow.mo [link]   
gptdb_serve_gptdbs.mo [link]   
gptdb_util.mo [link]   
gptdb_app_config.mo [link]   
gptdb_datasource.mo [link]   
gptdb_ext_datasource.mo [link]   
gptdb_model.mo [link]   
gptdb_serve_flow.mo [link]   
gptdb_serve_gptdbs.mo [link]   
gptdb_util.mo [link]   
gptdb_app_config.mo [link]   
gptdb_datasource.mo [link]   
gptdb_ext_datasource.mo [link]   
gptdb_model.mo [link]   
gptdb_serve_flow.mo [link]   
gptdb_serve_gptdbs.mo [link]   
Additional files not shown

Summary by CodeRabbit

Release Notes

  • Chores

    • Renamed environment variables for consistency
    • Updated copyright year to 2024
    • Enhanced .gitignore to exclude build artifacts and temporary files
    • Overhauled build system with modernized Make targets
  • New Features

    • Added Docker support for headless graphical environment with VNC/noVNC access
    • Introduced example SQL schemas and test data across multiple database systems
    • Added new demo and test automation scripts
  • Documentation

    • Completely restructured README with improved organization and navigation
    • Updated documentation links and references throughout
    • Added release notes and changelog updates

✏️ Tip: You can customize this high-level summary in your review settings.

FortiShield and others added 2 commits January 23, 2026 02:56
Signed-off-by: fortishield <161459699+FortiShield@users.noreply.github.com>
Contributor

@sourcery-ai sourcery-ai bot left a comment


Sorry, we are unable to review this pull request

The GitHub API does not allow us to fetch diffs exceeding 300 files, and this pull request has 1333 files.

@coderabbitai

coderabbitai bot commented Jan 23, 2026

Walkthrough

This PR comprehensively updates the project infrastructure, including environment variable renaming across workflows and build scripts, a complete Makefile overhaul with new build targets and package management, addition of a Dockerfile for VNC/graphical environment, license update, extensive documentation rewrites and domain/link updates, and numerous SQL test scripts for multi-database scenarios.

Changes

Cohort / File(s) Summary
Workflow & Environment Variable Renaming
.github/workflows/python-publish.yml, docker/base/build_image.sh
Environment variable renames from DB_GPT_VERSION to GPT_DB_VERSION and DB_GPT_INSTALL_MODE to GPT_DB_INSTALL_MODE. Affects version export and build mode selection in CI/CD pipeline.
Build Infrastructure Overhaul
Makefile
Major rewrite introducing explicit Python version (3.11), VENV_BIN OS-aware paths, package list (PACKAGES), and extensive new targets: setup, testenv, test, test-%, fmt, fmt-check, build, build-%, docker-%, publish, publish-%, clean, clean-dist, help. Shifts from inline venv management to uv-based dependency and package management.
Container & Environment Configuration
Dockerfile, docker/compose_examples/conf/ha-model-cluster.toml, docker/compose_examples/conf/ha-webserver.toml, docker/examples/my.cnf
New Dockerfile (python:3.11-slim with VNC/noVNC headless graphical setup). Two new TOML configuration files with environment-driven settings for multi-service model cluster and web server. New MySQL configuration template with UTF-8 defaults and tuning comments.
Test Data & SQL Scripts
docker/examples/sqls/case_*_*.sql, docker/examples/sqls/test_case*.sql, docker/examples/sqls/user_config_*.sql
Multiple new SQL scripts (11 files) for MySQL, SQLite, and Vertica establishing schema and seed data: student management (case_1), e-commerce (case_2), wide order table (case_3), and test case metadata tables. All database-agnostic DDL/DML additions.
Project Metadata & Configuration
LICENSE, .gitignore, .pre-commit-config.yaml, demo_selenium.py
License copyright update (2023 → 2024, magic.chen → KhulnaSoft Ltd). Extended .gitignore with frontend/build artifacts and lock files. Re-instated pre-commit config header comment. New demo_selenium.py script for Firefox automation with Selenium (headless=False, navigates to python.org, captures screenshot).
Documentation: README & Blog
README.md, docs/blog/2024-09-04-*.md, docs/blog/2025-03-24-*.md
README restructured from marketing-focused to developer-oriented with modular sections (What is GPT-DB, Core Capabilities, Deployment Options, etc.). Blog post link/domain updates: acknowledgment handle replacements (@khulnasoft → @csunny) and yuque/docs URL domain changes (eosphoros → khulnasoft, docs.gptdb.cn → gpt.khulnasoft.com).
Documentation: Technical Guides & Changelog
docs/docs/application/*/, docs/docs/cookbook/*/, docs/docs/changelog/*/, docs/docs/gptdbs/, docs/docs/installation/, docs/docs/modules/, docs/docusaurus.config.js
Widespread documentation updates: repository URL references (eosphoros/khulnasoft to khulnasoft-bot repos, khulnasoft/gptdbs to khulnasoft-bot/gptdbs), domain migrations (docs.gptdb.cn → gpt.khulnasoft.com, eosphoros → khulnasoft in Yuque links), UTM parameter updates (db_gpt → gpt_db), and Docusaurus config resource URL updates. No functional logic changes; link and contributor acknowledgments only.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • init commit #90: Modifies environment variable naming in workflow and build scripts (DB_GPT_VERSION → GPT_DB_VERSION), directly related to versioning infrastructure changes.

Suggested labels

review-effort-4, infrastructure, documentation, configuration

Poem

🐰 A grand refactoring hops our way,
With Makefile targets holding sway,
From eosphoros to Khulnasoft's light,
And VNC containers spinning bright!
SQL schemas dance in harmony,
Building the future, wild and free! 🚀

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

  • Description check (⚠️ Warning): The PR description is largely incomplete; required template sections (How Has This Been Tested, Snapshots, Checklist items) remain unchecked or unfilled, and no test details or validation evidence is provided. Resolution: complete the testing section with specific test steps and results, add snapshots if applicable, and check off completed checklist items to demonstrate thorough validation.
  • Title check (❓ Inconclusive): The title 'Deploy/dev' is vague and generic, providing no meaningful information about the changeset's primary purpose or scope. Resolution: use a descriptive title that summarizes the main change, such as 'Add agent benchmarking, Neo4j integration, and refactor file parsing', or a more specific focus area.

✅ Passed checks (1 passed)

  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Contributor

qodo-code-review bot commented Jan 23, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🔴
Path traversal read

Description: load_file_from_github() joins user-controlled file_name to repo_dir without preventing ..
path segments (e.g., ../../etc/passwd), enabling path traversal that can read files
outside the extracted repository directory if this API is exposed to untrusted input.
benchmark_data_manager.py [391-476]

Referred Code
async def load_file_from_github(
    self, file_name: Optional[str] = None
) -> Optional[FileLoadResult]:
    """Download and read a specified file from a GitHub repository.

    Supported file types: .json / .jsonl
    `file_name` can be a relative path within the repository or a plain filename

    Unified return structure (FileLoadResult):
      - type: "json" | "jsonl"
      - file_path, file_name, encoding
      - rows: List[{line_no:int, data:Any}] where data is parsed JSON object
      - count: total number of rows
      - failed_count: number of failed lines (non-zero for jsonl or malformed json)
      - failures: details for failed lines

    For JSON files:
      - If the file contains a JSON array, each element becomes a Row
      - If the file contains a single JSON object, it becomes one Row
      - The structure is flexible and doesn't depend on specific keys
    """



 ... (clipped 65 lines)
SSRF via download

Description: _download_repo_contents() allows downloading arbitrary .zip URLs when repo_url ends with
.zip, which can be used for SSRF/internal network access and to fetch attacker-controlled
archives for further exploitation.
benchmark_data_manager.py [760-814]

Referred Code
async def _download_repo_contents(self, repo_url: str) -> str:
    """Download repository with caching, supporting branch URLs"""
    cache_path = self._get_cache_path(repo_url)

    # Use cache if valid



 ... (clipped 34 lines)
Untrusted SQL execution

Description: _merge_sqlite_databases() executes create_sql pulled from untrusted downloaded SQLite
files (sqlite_master.sql) via cursor.execute(create_sql), which may allow execution of
unexpected/malicious SQL constructs embedded in the source databases.
benchmark_data_manager.py [868-955]

Referred Code
async def _merge_sqlite_databases(self, sqlite_files: List[str]) -> Dict:
    """Merge multiple SQLite files into the main database"""
    await self.init_connector()
    assert self._connector is not None

    def _worker():
        results = {
            "total_files": len(sqlite_files),
            "successful": 0,
            "failed": 0,
            "tables_merged": [],
        }

        with self._connector.session_scope() as session:
            # Get the underlying sqlite3 connection object
            connection_proxy = session.connection()
            # Handle different SQLAlchemy versions when getting the underlying connection
            try:
                # SQLAlchemy 1.4+ / 2.0
                raw_conn = connection_proxy.connection.dbapi_connection
            except AttributeError:



 ... (clipped 67 lines)
Cypher injection

Description: Cypher queries interpolate labels/relationship types using f-strings (e.g., MERGE
(src)-[r:{edge_type}]->(dst)), which can enable Cypher injection if edge_type/types are
ever influenced by untrusted input rather than fixed enums.
neo4j_store_adapter.py [183-202]

Referred Code
def upsert_edge(
    self, edges: Iterator[Edge], edge_type: str, src_type: str, dst_type: str
) -> None:
    """Upsert edges."""
    for edge in edges:
        props = {
            "id": edge.name or f"{edge.sid}_{edge.tid}",
            "name": edge.name or edge_type,
            "description": edge.get_prop("description") or "",
            "_chunk_id": edge.get_prop("_chunk_id") or "",
        }

        query = f"""
        MATCH (src:{src_type} {{id: $sid}})
        MATCH (dst:{dst_type} {{id: $tid}})
        MERGE (src)-[r:{edge_type}]->(dst)
        SET r += $props
        """
        self.graph_store.conn.run(query, sid=edge.sid, tid=edge.tid, props=props)
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit context: Critical destructive operations (e.g., truncation/deletion) are logged without user
identity/context and without structured outcome details, preventing reliable
reconstruction of actions.

Referred Code
def truncate(self):
    """Truncate Graph."""
    logger.warning("Truncating all data from Neo4j database")

    # Delete all relationships first
    self.graph_store.conn.run("MATCH ()-[r]->() DELETE r")

    # Then delete all nodes
    self.graph_store.conn.run("MATCH (n) DELETE n")

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path traversal risk: file_name is joined to repo_dir without blocking .. path segments, enabling path traversal
to read files outside the extracted repository directory.

Referred Code
# Handle both relative path and plain filename cases
normalized = str(file_name).strip().lstrip("/").replace("\\", os.sep)
candidate_paths: List[str] = []

# Prefer direct path resolution using the relative path
direct_path = os.path.join(repo_dir, normalized)
if os.path.isfile(direct_path):
    ext = os.path.splitext(direct_path.lower())[1]
    if not requested_ext:
        if ext in allowed_exts:
            candidate_paths.append(direct_path)
    elif ext == requested_ext:
        candidate_paths.append(direct_path)

# If not found, recursively search by filename match
if not candidate_paths:
    target_name = os.path.basename(normalized)
    for root, _, files in os.walk(repo_dir):
        for f in files:
            if f == target_name:
                full = os.path.join(root, f)



 ... (clipped 9 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Generic exception wrapping: Broad exception handling rethrows as RuntimeError while also sometimes continuing after
failures (e.g., silent rollback/detach failures), which may mask partial-merge states and
complicate recovery without clearer error classification and handling strategy.

Referred Code
async def load_file_from_github(
    self, file_name: Optional[str] = None
) -> Optional[FileLoadResult]:
    """Download and read a specified file from a GitHub repository.

    Supported file types: .json / .jsonl
    `file_name` can be a relative path within the repository or a plain filename

    Unified return structure (FileLoadResult):
      - type: "json" | "jsonl"
      - file_path, file_name, encoding
      - rows: List[{line_no:int, data:Any}] where data is parsed JSON object
      - count: total number of rows
      - failed_count: number of failed lines (non-zero for jsonl or malformed json)
      - failures: details for failed lines

    For JSON files:
      - If the file contains a JSON array, each element becomes a Row
      - If the file contains a single JSON object, it becomes one Row
      - The structure is flexible and doesn't depend on specific keys
    """



 ... (clipped 541 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Upstream error body: The raised error includes raw GitHub API response text (text_resp) which could expose
internal details to callers if surfaced beyond internal logs.

Referred Code
headers = {"Accept": "application/vnd.github.v3+json"}
async with self._http_session.get(zip_url, headers=headers) as response:
    if response.status != 200:
        text_resp = await response.text()
        raise RuntimeError(
            f"GitHub API Error {response.status}: {text_resp}"
        )

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Logs full query: Error logs include the full Cypher query string which may embed sensitive
identifiers/content depending on how upstream code constructs queries.

Referred Code
def query(self, query: str, **kwargs) -> MemoryGraph:
    """Execute a Cypher query and return results as a MemoryGraph."""
    graph = MemoryGraph()

    try:
        results = self.graph_store.conn.run(query, **kwargs)

        for record in results:
            # Process each value in the record
            for key, value in record.items():
                if value is None:
                    continue
                if hasattr(value, "labels"):  # It's a node
                    vertex = self._neo4j_node_to_vertex(value)
                    graph.upsert_vertex(vertex)
                elif hasattr(value, "type"):  # It's a relationship
                    edge = self._neo4j_relationship_to_edge(value)
                    graph.append_edge(edge)

    except Exception as e:
        logger.error(f"Query execution failed: {e}\nQuery: {query}")



 ... (clipped 1 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Contributor

qodo-code-review bot commented Jan 23, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Fix missing conversation history saving

Fix a bug in app_agent_chat where conversation history for flow-based chats is
not saved by uncommenting the finally block and initializing agent_task to
prevent an UnboundLocalError.

packages/gptdb-serve/src/gptdb_serve/agent/agents/controller.py [460-568]

 async def app_agent_chat(
     self,
     conv_uid: str,
     gpts_name: str,
     user_query: str,
     user_code: str = None,
     sys_code: str = None,
     enable_verbose: bool = True,
     stream: Optional[bool] = True,
     **ext_info,
 ):
     # logger.info(f"app_agent_chat:{gpts_name},{user_query},{conv_uid}")
+    conv_serve = ConversationServe.get_instance(CFG.SYSTEM_APP)
+    current_message: StorageConversation = _build_conversation(
+        conv_id=conv_uid,
+        select_param=gpts_name,
+        summary=user_query,
+        model_name="",
+        app_code=gpts_name,
+        conv_serve=conv_serve,
+        user_name=user_code,
+    )
+    current_message.save_to_storage()
+    current_message.start_new_round()
+    current_message.add_user_message(user_query)
+    agent_conv_id = None
+    agent_task = None
+    default_final_message = None
 
     if self.is_flow_chat(gpts_name=gpts_name):
         try:
             async for (
                 task,
                 chunk,
                 agent_conv_id,
             ) in multi_agents.app_agent_flow_chat(
                 conv_uid,
                 gpts_name,
                 user_query,
                 user_code,
                 sys_code,
                 enable_verbose=enable_verbose,
                 stream=stream,
                 **ext_info,
             ):
                 agent_task = task
                 default_final_message = chunk
                 yield chunk
 
         except asyncio.CancelledError:
             # Client disconnects
             print("Client disconnected")
             if agent_task:
                 logger.info(f"Chat to App {gpts_name}:{agent_conv_id} Cancel!")
                 agent_task.cancel()
         except Exception as e:
             logger.exception(f"Chat to App {gpts_name} Failed!" + str(e))
             raise
-        # finally:
-        #     logger.info(f"save agent chat info!{conv_uid}")
-        #     if agent_task:
-        #         final_message = await self.stable_message(agent_conv_id)
-        #         if final_message:
-        #             current_message.add_view_message(final_message)
-        #     else:
-        #         default_final_message = default_final_message.replace("data:", "")
-        #         current_message.add_view_message(default_final_message)
+        finally:
+            logger.info(f"save agent chat info!{conv_uid}")
+            if agent_task:
+                final_message = await self.stable_message(agent_conv_id)
+                if final_message:
+                    current_message.add_view_message(final_message)
+            elif default_final_message:
+                default_final_message = default_final_message.replace("data:", "")
+                current_message.add_view_message(default_final_message)
 
-        # current_message.end_current_round()
-        # current_message.save_to_storage()
+            current_message.end_current_round()
+            current_message.save_to_storage()
     else:
         # Temporary compatible scenario messages
-        conv_serve = ConversationServe.get_instance(CFG.SYSTEM_APP)
-        current_message: StorageConversation = _build_conversation(
-            conv_id=conv_uid,
-            select_param=gpts_name,
-            summary=user_query,
-            model_name="",
-            app_code=gpts_name,
-            conv_serve=conv_serve,
-            user_name=user_code,
-        )
-        current_message.save_to_storage()
-        current_message.start_new_round()
-        current_message.add_user_message(user_query)
-        agent_conv_id = None
-        agent_task = None
-        default_final_message = None
         try:
             async for task, chunk, agent_conv_id in multi_agents.agent_chat_v2(
                 conv_uid,
                 current_message.chat_order,
                 gpts_name,
                 user_query,
                 user_code,
                 sys_code,
                 enable_verbose=enable_verbose,
                 stream=stream,
                 **ext_info,
             ):
                 agent_task = task
                 default_final_message = chunk
                 yield chunk
 
         except asyncio.CancelledError:
             # Client disconnects
             print("Client disconnected")
             if agent_task:
                 logger.info(f"Chat to App {gpts_name}:{agent_conv_id} Cancel!")
                 agent_task.cancel()
         except Exception as e:
             logger.exception(f"Chat to App {gpts_name} Failed!" + str(e))
             raise
         finally:
             logger.info(f"save agent chat info!{conv_uid}")
             if agent_task:
                 final_message = await self.stable_message(agent_conv_id)
                 if final_message:
                     current_message.add_view_message(final_message)
-            else:
+            elif default_final_message:
                 default_final_message = default_final_message.replace("data:", "")
                 current_message.add_view_message(default_final_message)
 
             current_message.end_current_round()
             current_message.save_to_storage()

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical bug where conversation history is not saved for flow-based chats due to a commented-out finally block, which would lead to data loss.

High
Fix incorrect async execution from sync

Fix a RuntimeError by changing how the async _async_load_data function is called
from the synchronous _load_data_sync context when an event loop is already
running.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/falcon_file_parse.py [542-555]

 try:
-    asyncio.get_running_loop()
+    loop = asyncio.get_running_loop()
+    # The loop is running, so we can't use loop.run_until_complete.
+    # We must run the async code in a separate thread to not block the
+    # existing event loop, and then wait for its result.
     import concurrent.futures
-
-    with concurrent.futures.ThreadPoolExecutor() as executor:
-        future = executor.submit(
-            asyncio.run, self._async_load_data(data_file, table_ddl_file)
-        )
-        return future.result()
+    with concurrent.futures.ThreadPoolExecutor() as pool:
+        return loop.run_in_executor(
+            pool,
+            self._async_load_data,
+            data_file,
+            table_ddl_file
+        ).result()
 except RuntimeError:
+    # No event loop is running, we can create one.
     loop = get_or_create_event_loop()
     return loop.run_until_complete(
         self._async_load_data(data_file, table_ddl_file)
     )

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a RuntimeError that occurs when calling asyncio.run() from a thread that already has a running event loop, which is a critical bug in the async-from-sync execution logic.

Medium
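
For context, a minimal self-contained sketch of one safe pattern for driving a coroutine from synchronous code. Note that `loop.run_in_executor` expects a plain callable, so handing it a coroutine function will not actually await it; when a loop is already running in the current thread, the usual workaround is to hand `asyncio.run` the coroutine in a worker thread. The names `_async_load_data`, `data_file`, and `table_ddl_file` mirror the snippet above and are placeholders here.

```python
import asyncio
import concurrent.futures


async def _async_load_data(data_file: str, table_ddl_file: str) -> dict:
    """Stand-in for the real coroutine; the signature is assumed."""
    await asyncio.sleep(0)
    return {"data": data_file, "ddl": table_ddl_file}


def load_data_sync(data_file: str, table_ddl_file: str) -> dict:
    """Run the coroutine whether or not an event loop already exists here."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run() is safe to call directly.
        return asyncio.run(_async_load_data(data_file, table_ddl_file))
    # A loop is already running in this thread, so asyncio.run() would fail here.
    # Run it in a worker thread instead, which gets its own fresh event loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(
            asyncio.run, _async_load_data(data_file, table_ddl_file)
        )
        return future.result()
```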
Correctly retrieve raw database connection

Correct the logic for retrieving the raw DBAPI connection from a SQLAlchemy
session to ensure compatibility with modern versions (1.4+) by accessing
connection_proxy.dbapi_connection directly.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py [885-894]

 try:
-    # SQLAlchemy 1.4+ / 2.0
-    raw_conn = connection_proxy.connection.dbapi_connection
+    # SQLAlchemy 1.4+
+    raw_conn = connection_proxy.dbapi_connection
 except AttributeError:
     try:
-        # 旧版本或某些驱动
+        # Older SQLAlchemy versions
         raw_conn = connection_proxy.connection
     except AttributeError:
-        # 最后的尝试
+        # Fallback for other cases
         raw_conn = session.get_bind().raw_connection()

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies an improper attribute access pattern for retrieving the raw DBAPI connection and proposes a more robust, version-compatible approach that improves code clarity and correctness.

Medium
Make method async

Add the async keyword to the aload_document method to match the base class's
abstract method signature.

packages/gptdb-ext/src/gptdb_ext/storage/full_text/opensearch.py [23-25]

-def aload_document(
+async def aload_document(
     self, chunks: List[Chunk], file_id: Optional[str] = None
 ) -> List[str]:

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the method signature should be async to match the base class, which is important for correctness in an asynchronous framework.

Medium
Reassign replaceAll result
Suggestion Impact: The catch block was updated to assign the result of replaceAll back to message, ensuring the newline replacement takes effect.

code diff:

             } catch {
-              message.replaceAll('\\n', '\n');
+              message = message.replaceAll('\\n', '\n');
             }

Fix a bug in the catch block by reassigning the result of message.replaceAll to
correctly apply newline replacements.

web/hooks/use-chat.ts [95-97]

 } catch {
-  message.replaceAll('\\n', '\n');
+  message = message.replaceAll('\\n', '\n');
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: This suggestion correctly identifies a bug where string immutability is overlooked, causing newline characters not to be replaced, which affects message rendering.

Low
Security
Fix Cypher injection vulnerability

Fix a Cypher injection vulnerability by replacing f-string formatting with
parameterized queries when building the match clauses for nodes and
relationships.

packages/gptdb-ext/src/gptdb_ext/datasource/conn_neo4j.py [208-240]

 # Check if label/relationship exists first to avoid warnings
 if table_type == "node":
     # Remove _node suffix if present
     label_name = table_name.replace("_node", "")
     check_result = session.run("CALL db.labels()")
     existing_labels = [record["label"] for record in check_result]
     if label_name not in existing_labels:
         return []
 
     # Get properties for node label
-    query = f"""
-    MATCH (n:`{label_name}`)
+    query = """
+    MATCH (n)
+    WHERE $label_name IN labels(n)
     WITH n LIMIT 100
     UNWIND keys(n) AS key
     RETURN DISTINCT key AS property
     """
+    params = {"label_name": label_name}
 else:
     # Remove _relationship suffix if present
     rel_name = table_name.replace("_relationship", "")
     check_result = session.run("CALL db.relationshipTypes()")
     existing_rels = [record["relationshipType"] for record in check_result]
     if rel_name not in existing_rels:
         return []
 
     # Get properties for relationship type
-    query = f"""
-    MATCH ()-[r:`{rel_name}`]->()
+    query = """
+    MATCH ()-[r]->()
+    WHERE type(r) = $rel_name
     WITH r LIMIT 100
     UNWIND keys(r) AS key
     RETURN DISTINCT key AS property
     """
+    params = {"rel_name": rel_name}
 
-result = session.run(query)
+result = session.run(query, **params)

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a Cypher injection vulnerability and provides a robust, parameterized query that mitigates the risk, which is a critical security improvement.

High
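
For reference, a minimal sketch of the parameterized pattern with the official `neo4j` Python driver; the connection details and helper name are illustrative, not taken from the PR. Labels and relationship types cannot be bound as query parameters inside a `MATCH` pattern, which is why the filter moves into a `WHERE` clause.

```python
from neo4j import GraphDatabase

# Placeholder connection details; adjust for the actual deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))


def node_properties(label_name: str) -> list:
    """Return distinct property keys for nodes carrying the given label."""
    query = """
    MATCH (n)
    WHERE $label_name IN labels(n)
    WITH n LIMIT 100
    UNWIND keys(n) AS key
    RETURN DISTINCT key AS property
    """
    with driver.session() as session:
        # The label value travels as a bound parameter, never via f-strings.
        result = session.run(query, label_name=label_name)
        return [record["property"] for record in result]
```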
Correct potentially insecure repository URL

Correct the default repository URL from khulnasoft-bot to the official
khulnasoft organization to prevent potential security risks.

packages/gptdb-core/src/gptdb/util/gptdbs/base.py [16-18]

 DEFAULT_REPO_MAP = {
-    "khulnasoft/gptdbs": "https://github.com/khulnasoft-bot/gptdbs.git",
+    "khulnasoft/gptdbs": "https://github.com/khulnasoft/gptdbs.git",
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential security risk where a typo in a repository URL could lead to fetching code from an untrusted source, which is a critical issue.

Medium
Correct potentially insecure plugin URL

Correct the default plugin repository URL from khulnasoft-bot to the official
khulnasoft organization to mitigate security risks from untrusted plugins.

packages/gptdb-serve/src/gptdb_serve/agent/hub/model/model.py [30-37]

 class PluginHubParam(BaseModel):
     channel: Optional[str] = Field("git", description="Plugin storage channel")
     url: Optional[str] = Field(
-        "https://github.com/khulnasoft-bot/GPT-DB-Plugins.git",
+        "https://github.com/khulnasoft/GPT-DB-Plugins.git",
         description="Plugin storage url",
     )
     branch: Optional[str] = Field(

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential security risk where a typo in a plugin repository URL could lead to fetching and executing untrusted code, which is a critical issue.

Medium
High-level
Refactor the data loading logic

The suggestion proposes abstracting the new benchmark data loading logic, which
is currently tightly coupled to a specific GitHub repository and file formats
(SQLite, JSON/JSONL), behind a more generic, format-agnostic interface. This
would improve modularity and make future extensions easier.

Examples:

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py [351-661]
packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/falcon_file_parse.py [183-338]

Solution Walkthrough:

Before:

class BenchmarkDataManager:
    async def load_from_github(self, repo_url: str, data_dirs: List[str]):
        # ... discovers and merges .sqlite files specifically
        repo_dir = await self._download_repo_contents(repo_url)
        sqlite_files = self._discover_sqlite_files(repo_dir, data_dirs)
        await self._merge_sqlite_databases(sqlite_files)

    async def load_file_from_github(self, file_name: str):
        # ... logic to find and parse .json or .jsonl files
        if chosen_ext == ".json":
            return await self._parse_json_file(...)
        elif chosen_ext == ".jsonl":
            return await self._parse_jsonl_file(...)

class FalconFileParseService:
    def __init__(self):
        self.benchmark_manager = get_benchmark_manager()

After:

# New abstraction
class IDataSource(ABC):
    @abstractmethod
    async def load_data(self, config): ...

class GitHubFalconSource(IDataSource):
    async def load_data(self, config):
        # Contains logic for downloading repo, finding files,
        # and merging/parsing them.
        ...

class BenchmarkDataManager:
    def __init__(self):
        self._sources = {"github_falcon": GitHubFalconSource()}
    
    async def load_data(self, source_name, config):
        # Delegates to the appropriate source loader
        source = self._sources.get(source_name)
        await source.load_data(config)
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the new data loading logic is complex and tightly coupled to specific sources and formats, and proposing an abstraction layer is a valid architectural improvement.

Medium
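
A compact, runnable sketch of the proposed abstraction; class and method names mirror the walkthrough above and are illustrative only.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class IDataSource(ABC):
    """Format-agnostic benchmark data loader."""

    @abstractmethod
    async def load_data(self, config: Dict[str, Any]) -> None: ...


class GitHubFalconSource(IDataSource):
    async def load_data(self, config: Dict[str, Any]) -> None:
        # Repo download, file discovery, and merge/parse logic would live here.
        print(f"loading benchmark data from {config.get('repo_url')}")


class BenchmarkDataManager:
    def __init__(self) -> None:
        self._sources: Dict[str, IDataSource] = {"github_falcon": GitHubFalconSource()}

    async def load_data(self, source_name: str, config: Dict[str, Any]) -> None:
        # Delegate to the registered source; new formats only need a new class.
        source = self._sources.get(source_name)
        if source is None:
            raise ValueError(f"Unknown data source: {source_name}")
        await source.load_data(config)


if __name__ == "__main__":
    manager = BenchmarkDataManager()
    asyncio.run(
        manager.load_data("github_falcon", {"repo_url": "https://github.com/example/repo.git"})
    )
```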
General
Use a robust JSONPath library
Suggestion Impact: The commit removed the manual JSONPath parsing (string splitting and dict/list traversal) and replaced it with jsonpath-ng parsing and evaluation (parse/find), returning the first match value and logging warnings on errors. It also added an import of jsonpath-ng.

code diff:

@@ -7,6 +7,7 @@
 from typing import Any, Dict, Optional, Union
 
 import aiohttp
+from jsonpath_ng import parse as jsonpath_parse
 from gptdb_serve.evaluate.service.benchmark.models import (
     AgentApiConfig,
     AgentCompletionRequest,
@@ -38,32 +39,17 @@
         if not json_path:
             return response_data
 
-        # Remove leading $. if present
-        path = json_path.lstrip("$.")
-
-        # Split path by dots and brackets
-        parts = path.replace("[", ".").replace("]", "").split(".")
-
-        current = response_data
-        for part in parts:
-            if not part:
-                continue
-
-            try:
-                if isinstance(current, dict):
-                    current = current.get(part)
-                elif isinstance(current, list):
-                    index = int(part)
-                    current = current[index]
-                else:
-                    return None
-
-                if current is None:
-                    return None
-            except (KeyError, IndexError, ValueError, TypeError):
+        try:
+            jsonpath_expr = jsonpath_parse(json_path)
+            matches = jsonpath_expr.find(response_data)
+            if matches:
+                # Return the value of the first match
+                return matches[0].value
+            else:
                 return None
-
-        return current
+        except Exception as e:
+            logger.warning(f"Error parsing JSONPath '{json_path}': {e}")
+            return None

Replace the fragile, custom JSONPath parsing implementation with the jsonpath-ng
library to ensure robust and correct handling of JSONPath expressions.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/task/benchmark_agent_task.py [27-67]

 @staticmethod
 def parse_json_path(response_data: Any, json_path: str) -> Any:
     """Parse response using JSON path expression.
 
     Args:
         response_data: The response data (dict or list)
         json_path: JSON path expression (e.g., "$.data.content")
 
     Returns:
         Extracted value or None if path not found
     """
     if not json_path:
         return response_data
+    
+    try:
+        from jsonpath_ng import parse
+    except ImportError:
+        raise ImportError(
+            "jsonpath-ng is not installed. Please install it with "
+            "`pip install jsonpath-ng`"
+        )
 
-    # Remove leading $. if present
-    path = json_path.lstrip("$.")
+    try:
+        jsonpath_expression = parse(json_path)
+        matches = jsonpath_expression.find(response_data)
+        if matches:
+            return matches[0].value
+        return None
+    except Exception:
+        logger.warning(f"Failed to parse JSON path: {json_path}")
+        return None
 
-    # Split path by dots and brackets
-    parts = path.replace("[", ".").replace("]", "").split(".")
-
-    current = response_data
-    for part in parts:
-        if not part:
-            continue
-
-        try:
-            if isinstance(current, dict):
-                current = current.get(part)
-            elif isinstance(current, list):
-                index = int(part)
-                current = current[index]
-            else:
-                return None
-
-            if current is None:
-                return None
-        except (KeyError, IndexError, ValueError, TypeError):
-            return None
-
-    return current
-

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the manual JSONPath parsing is brittle and replaces it with a robust, standard library, significantly improving the code's correctness and reliability.

Medium
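
For reference, a minimal usage sketch of `jsonpath-ng`; the sample payload and path are illustrative.

```python
from jsonpath_ng import parse  # pip install jsonpath-ng

response_data = {"data": {"content": "hello", "items": [{"id": 1}, {"id": 2}]}}

# Dotted keys and list indexing are both handled by the library.
matches = parse("$.data.items[1].id").find(response_data)
value = matches[0].value if matches else None
print(value)  # -> 2
```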
Warn on ambiguous file match

Add a warning log when multiple candidate files are found during a file search
to inform the user which file is being used.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py [428-459]

 candidate_paths: List[str] = []
 # Prefer direct path resolution using the relative path
 direct_path = os.path.join(repo_dir, normalized)
 if os.path.isfile(direct_path):
     ext = os.path.splitext(direct_path.lower())[1]
     if not requested_ext:
         if ext in allowed_exts:
             candidate_paths.append(direct_path)
     elif ext == requested_ext:
         candidate_paths.append(direct_path)
 # If not found, recursively search by filename match
 if not candidate_paths:
     target_name = os.path.basename(normalized)
     for root, _, files in os.walk(repo_dir):
         for f in files:
             if f == target_name:
                 full = os.path.join(root, f)
                 ext = os.path.splitext(f.lower())[1]
                 if not requested_ext:
                     if ext in allowed_exts:
                         candidate_paths.append(full)
                 elif ext == requested_ext:
                     candidate_paths.append(full)
 if not candidate_paths:
     raise FileNotFoundError(f"File not found: {file_name}")
+if len(candidate_paths) > 1:
+    logger.warning(f"Multiple files match '{file_name}', using '{candidate_paths[0]}'")
 chosen = sorted(candidate_paths, key=lambda p: (len(p), p))[0]

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5

__

Why: The suggestion improves user experience by adding a warning when multiple files match a search, preventing silent and potentially incorrect file selection, which enhances debuggability.

Low


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docker/examples/dashboard/test_case_sqlite_data.py (1)

118-120: Fix mismatched and non-numeric user_id assignment.

users entries don’t include a numeric id, so user_id = random.choice(users)[0] assigns the name string, and user_name is picked from a different random user, breaking referential integrity. Pick a single user tuple and map fields consistently (e.g., name/email), or store IDs from inserts.

💡 Proposed fix (consistent user selection)
-        user_id = random.choice(users)[0]
-        user_name = random.choice(users)[1]
+        user = random.choice(users)
+        user_id = user[0]   # If you intend to store name as ID, rename column or adjust schema
+        user_name = user[1]

If user_id is meant to reference the autoincrement user.id, capture those IDs at insert time (e.g., cursor.lastrowid) and store them in a separate list instead of using the name tuple.
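
A minimal sqlite3 sketch of that second option, capturing the autoincrement IDs at insert time; the table layout here is illustrative rather than the actual test schema.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, email TEXT)"
)

seed_users = [("John", "john@example.com"), ("Mary", "mary@example.com")]
user_ids = []
for name, email in seed_users:
    cursor.execute("INSERT INTO user (name, email) VALUES (?, ?)", (name, email))
    user_ids.append(cursor.lastrowid)  # capture the generated primary key

# Later rows reference one consistent user record instead of mixing tuples.
user_id = random.choice(user_ids)
user_name = cursor.execute(
    "SELECT name FROM user WHERE id = ?", (user_id,)
).fetchone()[0]
```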

🤖 Fix all issues with AI agents
In @.github/ISSUE_TEMPLATE/bug-report.yml:
- Around line 87-93: Update the four installation links in
.github/ISSUE_TEMPLATE/bug-report.yml so they point to valid, non-404
documentation pages; specifically replace the URLs used for "Installation From
Source", "Docker Installation", "Docker Compose Installation" and "Cluster
Installation" with the correct db-gpt.readthedocs.io paths (ensure Docker and
Docker Compose are different URLs), verify each URL opens successfully, and
update the corresponding label entries in the same file so the visible link text
and destination match the corrected docs.

In `@demo_selenium.py`:
- Around line 1-4: demo_selenium.py imports selenium (from selenium import
webdriver) but selenium is only declared in
packages/gptdb-sandbox/pyproject.toml; fix by either moving demo_selenium.py
into the packages/gptdb-sandbox/ folder so it lives with the declared
dependency, or add "selenium>=4.0.0" to the root pyproject.toml dependencies so
imports like webdriver and Options resolve when the root package is installed.
- Around line 6-9: Remove the deprecated options.headless assignment (Options
object defaults to non-headless) and, if you ever need headless in future, use
options.add_argument("-headless") to enable it; also move the driver
instantiation and any navigation/screenshot logic that currently runs at module
import into a main guard by wrapping the code that creates Options(), calls
webdriver.Firefox(...), and performs actions inside if __name__ == "__main__":
so imports no longer start the browser automatically.

In `@docker/compose_examples/conf/ha-model-cluster.toml`.:
- Line 6: Replace the hard-coded encrypt_key default ("your_secret_key") with a
required value sourced from an environment variable (e.g. ENCRYPT_KEY) and
update the config loading logic to fail fast if that env var is missing: remove
the predictable fallback for the encrypt_key key, read the key from process/env
in your config loader (or equivalent config parsing function), and throw/exit
with a clear error when ENCRYPT_KEY is not provided so deployments cannot
silently use a weak default.

In `@docker/compose_examples/conf/ha-webserver.toml`.:
- Line 6: The config currently hardcodes encrypt_key = "your_secret_key"; change
this to read the encryption key from an environment variable (e.g. ENCRYPT_KEY)
instead of a literal string, update any loader/initialization code that reads
encrypt_key to validate the env var and fail fast if it's missing, and ensure
the default placeholder is removed from the repo and documented as an env
requirement; reference the encrypt_key setting in the config and the code path
that loads it when making the change.

In `@docker/examples/my.cnf`.:
- Around line 21-37: The config sets
default-authentication-plugin=mysql_native_password which forces the weaker
legacy auth; change this to be opt-in by commenting out or removing the
default-authentication-plugin=mysql_native_password line so the server uses the
stronger default (caching_sha2_password) unless explicitly enabled for legacy
clients; update the configuration that contains the
default-authentication-plugin setting (and any duplicate occurrences) to be
commented out and add a short comment explaining it's opt-in for legacy clients.

In `@docker/examples/sqls/case_1_student_manager_sqlite.sql`.:
- Around line 15-22: The current PRIMARY KEY on the scores table (PRIMARY KEY
(student_id, course_id)) prevents storing multiple semesters for the same
student/course; update the scores table definition (table name: scores, columns:
student_id, course_id, semester) to include semester in the composite primary
key (e.g., PRIMARY KEY (student_id, course_id, semester)) or alternatively add a
surrogate id column and make that the PK while keeping student_id, course_id,
semester as a UNIQUE constraint to allow multiple semester rows without
conflicts.

In `@docker/examples/sqls/case_2_ecom_sqlite.sql`.:
- Around line 25-35: The seed data in the INSERT INTO users statement contains
real-looking emails; update the user_email values to use reserved example
domains (e.g., user@example.com or user@example.org) to avoid PII. Edit the
INSERT INTO users (... user_email ...) VALUES clause and replace each gmail
address (e.g., 'john@gmail.com', 'mary@gmail.com', etc.) with non-real addresses
(e.g., 'user1@example.com', 'user2@example.com') while keeping the rest of the
columns (user_id, user_name, registration_date, user_country) unchanged.

In `@docker/examples/sqls/test_case_info_sqlite.sql`.:
- Line 17: The SQL in the VALUES tuple uses MySQL's YEAR(orders.order_date)
which SQLite doesn't support; update the SELECT in that tuple to use SQLite's
strftime('%Y', orders.order_date) and compare to the string '2023' (e.g.
strftime('%Y', orders.order_date) = '2023') so the expected_sql in the tuple
(the one selecting products.product_name with JOIN orders ...) is
SQLite-compatible.

In `@docker/examples/sqls/user_config_mysql.sql`.:
- Around line 1-2: The SQL snippet updates mysql.user to set Host='%' for
User='root', which enables remote root access and is insecure; replace this
pattern by keeping root bound to localhost (do not change Host='%' for
User='root'), remove direct UPDATE mysql.user calls, and instead create a
dedicated application user via CREATE USER and GRANT with only the necessary
privileges for remote access, then FLUSH PRIVILEGES; locate the statements
UPDATE user SET Host='%' WHERE User='root' and replace them with the CREATE USER
/ GRANT flow and avoid modifying the root account.

In `@Dockerfile`:
- Around line 24-26: The Dockerfile currently hardcodes the VNC password using
the RUN line that calls x11vnc -storepasswd 123456 /root/.vnc/passwd; remove
that hardcoded invocation and instead create a runtime setup (entrypoint) that
reads a VNC password from an environment variable (e.g., VNC_PASSWORD) or a
secret file (e.g., /run/secrets/vnc_password), ensures /root/.vnc exists, stores
the password with x11vnc -storepasswd using the provided value, and tightens
file permissions on /root/.vnc/passwd; update the Dockerfile to copy the
entrypoint script and set it as ENTRYPOINT/CMD and keep a safe fallback behavior
(error or random password) when no password is provided.
🟡 Minor comments (13)
docker/examples/sqls/case_1_student_manager_sqlite.sql.-1-7 (1)

1-7: Enable FK enforcement in SQLite.

SQLite does not enforce foreign keys unless PRAGMA foreign_keys=ON is set. Add it at the top of the script to avoid silent integrity violations.

🔧 Proposed fix
+PRAGMA foreign_keys = ON;
+
 CREATE TABLE students (
     student_id INTEGER PRIMARY KEY,
     student_name VARCHAR(100),
     major VARCHAR(100),
     year_of_enrollment INTEGER,
     student_age INTEGER
 );
docker/examples/sqls/test_case.md.-79-87 (1)

79-87: Database compatibility issue: YEAR() function is MySQL-specific.

The YEAR(orders.order_date) function works in MySQL but not in SQLite or Vertica. Since this PR includes test data for multiple database engines (MySQL, SQLite, Vertica), this query won't work consistently across all targets.

  • SQLite: Use strftime('%Y', orders.order_date) = '2023'
  • Vertica: Use EXTRACT(YEAR FROM orders.order_date) = 2023

Consider either documenting this as MySQL-only or providing engine-specific variants.

docker/examples/sqls/case_3_order_wide_table_sqlite_wide.sql.-1-5 (1)

1-5: Missing PRIMARY KEY constraint.

The order_wide_table lacks a PRIMARY KEY. For an order table, order_id should typically be the primary key to ensure uniqueness and enable efficient lookups.

Suggested fix
 CREATE TABLE order_wide_table (
 
     -- order_base
-    order_id TEXT, -- 订单ID
+    order_id TEXT PRIMARY KEY, -- 订单ID
     order_no TEXT, -- 订单编号
docker/examples/sqls/case_2_ecom_sqlite.sql.-1-7 (1)

1-7: Enable SQLite foreign key enforcement.
SQLite ignores FK constraints unless PRAGMA foreign_keys = ON; is set, which can silently allow invalid orders rows when this dataset is extended.

✅ Proposed fix
+PRAGMA foreign_keys = ON;
+
 CREATE TABLE users (
     user_id INTEGER PRIMARY KEY,
     user_name VARCHAR(100),
     user_email VARCHAR(100),

Also applies to: 15-23

docker/compose_examples/conf/ha-model-cluster.toml.-4-4 (1)

4-4: Clarify the log level configuration.
system.log_level = "INFO" conflicts with [log].level = "DEBUG". Align them or document precedence to avoid unexpected logging behavior.

🛠️ Example alignment
-[log]
-level = "DEBUG"
+[log]
+level = "INFO"

Also applies to: 64-64

docker/compose_examples/conf/ha-model-cluster.toml.-13-21 (1)

13-21: Use consistent environment variable defaults in the TOML file.

This file inconsistently handles environment variables: GPTDB_LANG (line 3) and WORKER_TYPE (line 28) have defaults, but CONTROLLER_ADDR and the MYSQL_* database fields lack defaults. For consistency with other configuration files (e.g., configs/gptdb-proxy-siliconflow-mysql.toml), add safe defaults like:

controller_addr = "${env:CONTROLLER_ADDR:-http://localhost:8000}"
host = "${env:MYSQL_HOST:-localhost}"
port = "${env:MYSQL_PORT:-3306}"
database = "${env:MYSQL_DATABASE:-gptdb}"
user = "${env:MYSQL_USER:-root}"
password = "${env:MYSQL_PASSWORD:-aa123456}"

While the accompanying ha-cluster-docker-compose.yml does provide these values, the TOML files should be self-contained for portability and clarity.

Also applies to: lines 55-61

.pre-commit-config.yaml-1-1 (1)

1-1: Fix line endings to satisfy YAMLlint.

YAMLlint reports CRLF line endings here; converting this file to LF will clear the lint error and avoid CI/pre-commit failures.

docs/docs/changelog/Released_V0.5.0.md-261-265 (1)

261-265: Fix markdownlint MD034: bare URLs in Appendix.

Convert bare URLs to Markdown links to satisfy lint and improve readability.

✅ Suggested update
-- GPT-DB framework: https://github.com/khulnasoft 
-- Text2SQL fine tuning: https://github.com/khulnasoft-bot/GPT-DB-Hub 
-- GPT-DB-Web : https://github.com/khulnasoft-bot/GPT-DB-Web 
-- official English documentation: http://docs.gptdb.site/docs/overview 
-- official Chinese documentation: https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx
+- GPT-DB framework: [khulnasoft](https://github.com/khulnasoft)
+- Text2SQL fine tuning: [GPT-DB-Hub](https://github.com/khulnasoft-bot/GPT-DB-Hub)
+- GPT-DB-Web : [GPT-DB-Web](https://github.com/khulnasoft-bot/GPT-DB-Web)
+- official English documentation: [Overview](http://docs.gptdb.site/docs/overview)
+- official Chinese documentation: [概览](https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx)
docs/blog/2025-03-24-gptdb-v0.7.0-release.md-432-432 (1)

432-432: Address markdownlint: bare email in contributors list.

The email triggers MD034; format it as a mailto link or remove it.

✅ Suggested update
-**283569391@qq.com, `@15089677014`, `@Aries-ckt`, `@FOkvj`, `@Jant1L`, `@SonglinLyu`, `@TenYearOldJAVA`, `@Weaxs`, `@cinjoseph`, `@csunny`, `@damonqin`, `@dusx1981`, `@fangyinc`, `@geebytes`, `@haawha`, `@utopia2077`, `@vnicers`, `@xuxl2024`, `@yhjun1026`, `@yunfeng1993`, `@yyhhyyyyyy` and tam**
+**[283569391@qq.com](mailto:283569391@qq.com), `@15089677014`, `@Aries-ckt`, `@FOkvj`, `@Jant1L`, `@SonglinLyu`, `@TenYearOldJAVA`, `@Weaxs`, `@cinjoseph`, `@csunny`, `@damonqin`, `@dusx1981`, `@fangyinc`, `@geebytes`, `@haawha`, `@utopia2077`, `@vnicers`, `@xuxl2024`, `@yhjun1026`, `@yunfeng1993`, `@yyhhyyyyyy` and tam**
docs/docs/gptdbs/introduction.md-3-3 (1)

3-3: Update repository links: khulnasoft-bot/gptdbs is not publicly accessible (404).

The repository at https://github.com/khulnasoft-bot/gptdbs returns 404 Not Found. This affects the links on lines 3, 26-27, and 38. If this is an internal or renamed repository, verify the correct public path. Public references to awel-flow-simple-streaming-chat point to eosphoros-ai/dbgpts or eosphoros-ai/DB-GPT instead.

README.md-225-227 (1)

225-227: Fix the disclaimer link filename.

DISCKAIMER.md looks misspelled and will likely 404.

📝 Suggested fix
-📖 **[Disclaimer](./DISCKAIMER.md)** - Please read our disclaimer for important information about the project.
+📖 **[Disclaimer](./DISCLAIMER.md)** - Please read our disclaimer for important information about the project.
README.md-150-160 (1)

150-160: Replace the replacement‑character headings.

The glyphs will render incorrectly; swap to real emoji or plain text.

📝 Suggested fix
-### � Privacy and Security
+### Privacy and Security
@@
-### � Supported Data Sources
+### Supported Data Sources
README.md-74-76 (1)

74-76: Update "Latest Release" section to reflect the current version.

The README claims "Latest Release: V0.7.0", but as of January 2026, the latest published DB-GPT release is v0.7.4 (October 24, 2025). Update the version number and release notes link accordingly.

🧹 Nitpick comments (16)
docker/examples/sqls/case_2_ecom_mysql.sql. (2)

1-2: Prefer utf8mb4 and make DB creation idempotent.

Using MySQL utf8 limits characters, and re-running the script can fail if the DB already exists. Consider utf8mb4 and IF NOT EXISTS for a more robust example.

♻️ Proposed change
-create database case_2_ecom character set utf8;
+create database if not exists case_2_ecom
+  character set utf8mb4
+  collate utf8mb4_0900_ai_ci;
 use case_2_ecom;

12-16: Use DECIMAL for prices to avoid rounding artifacts.

FLOAT can introduce precision errors in money values. DECIMAL(10,2) is the typical choice for currency.

♻️ Proposed change
 CREATE TABLE products (
     product_id INT PRIMARY KEY,
     product_name VARCHAR(100) COMMENT '商品名称',
-    product_price FLOAT COMMENT '商品价格'
+    product_price DECIMAL(10,2) COMMENT '商品价格'
 ) COMMENT '商品信息表';
docker/examples/sqls/test_case_info_sqlite.sql. (1)

14-14: Non-standard GROUP BY usage may cause issues in strict SQL environments.

The expected_sql groups by students.student_id but selects students.student_name, which is not in the GROUP BY clause. While SQLite is permissive and allows this, it's non-standard SQL and could produce unexpected results if a student_id maps to multiple names (though unlikely given it's a primary key).

This is acceptable for test data but worth noting for consistency with stricter SQL modes.

docker/compose_examples/conf/ha-webserver.toml. (3)

13-13: Consider adding a default value or documenting that CONTROLLER_ADDR is required.

Unlike other environment variables in this file (e.g., GPTDB_LANG, EMBEDDING_MODEL_NAME), controller_addr has no fallback default. If CONTROLLER_ADDR is unset, this may cause unclear startup failures.

💡 Option: Add a placeholder default or comment
-controller_addr = "${env:CONTROLLER_ADDR}"
+# Required: Set CONTROLLER_ADDR environment variable
+controller_addr = "${env:CONTROLLER_ADDR}"

Or, if a sensible default exists:

-controller_addr = "${env:CONTROLLER_ADDR}"
+controller_addr = "${env:CONTROLLER_ADDR:-http://controller:8000}"

21-21: Minor formatting inconsistency.

Missing space after = on this line, while other lines use the key = value format consistently.

✏️ Suggested fix
-password ="${env:MYSQL_PASSWORD}"
+password = "${env:MYSQL_PASSWORD}"

27-28: Redundant log level configuration.

log_level is already set to "INFO" in the [system] section (line 4). Having it in both places could cause confusion about which takes precedence or lead to inconsistencies if only one is updated.

docker/examples/sqls/test_case.md. (2)

33-45: Minor inconsistency: Missing "SQL:" label.

Other queries have an explicit "SQL:" label before the code block (see lines 13, 25, 55), but Q3 is missing this label for consistency.

Suggested fix
 查询哪些学生在 "2023年春季" 学期的课程学分总和超过2学分

+SQL:
 ```sql
 SELECT students.student_name

57-62: Potential strict SQL mode issue: GROUP BY column mismatch.

The query selects users.user_name but groups by users.user_id. While this works in MySQL with default settings, it may fail in strict SQL mode (ONLY_FULL_GROUP_BY) or in other databases where all non-aggregated SELECT columns must appear in GROUP BY.

The same pattern appears in Q2 (lines 68-73) with products.product_name vs products.product_id.

Suggested fix for stricter SQL compliance
-SELECT users.user_name, COUNT(orders.order_id) as order_count
-FROM users
-JOIN orders ON users.user_id = orders.user_id
-GROUP BY users.user_id;
+SELECT users.user_name, COUNT(orders.order_id) as order_count
+FROM users
+JOIN orders ON users.user_id = orders.user_id
+GROUP BY users.user_id, users.user_name;
demo_selenium.py (2)

15-15: Prefer explicit waits over time.sleep().

time.sleep(5) is fragile—it either waits too long or not long enough. Use WebDriverWait with expected conditions for reliable synchronization.

♻️ Suggested improvement
+from selenium.webdriver.support.ui import WebDriverWait
+from selenium.webdriver.support import expected_conditions as EC
+from selenium.webdriver.common.by import By
...
-    time.sleep(5)  # পৃষ্ঠা লোড হওয়ার জন্য অপেক্ষা করুন
+    # Wait until the page title contains "Python"
+    WebDriverWait(driver, 10).until(EC.title_contains("Python"))

20-22: Hardcoded path assumes root user on Linux.

/root/screenshot.png is not portable and requires root privileges. Consider using a relative path or a configurable location.

♻️ Suggested improvement
-    screenshot_path = "/root/screenshot.png"
+    screenshot_path = "screenshot.png"  # Or use tempfile/pathlib for cross-platform support
docker/examples/sqls/case_1_student_manager_mysql.sql. (1)

1-1: Use utf8mb4 and IF NOT EXISTS for broader Unicode support and repeatable setup.

MySQL's utf8 is a deprecated alias for utf8mb3, which supports only BMP characters (U+0000..U+FFFF) and cannot store emoji or supplementary Unicode characters. Use utf8mb4 (full UTF-8) instead. Additionally, IF NOT EXISTS makes the script reusable without manual deletion on rerun.

♻️ Recommended change
-create database case_1_student_manager character set utf8;
+create database if not exists case_1_student_manager character set utf8mb4;
docs/docs/modules/benchmark.md (1)

81-82: Prefer HTTPS for docs links if supported.
Switching to HTTPS avoids mixed-content warnings and improves security for readers. Please verify the endpoint supports HTTPS and update accordingly.

🔧 Suggested update (if HTTPS is supported)
-  - Method 1: Configure via configuration file. Reference: [ProxyModel Configuration](http://gpt.khulnasoft.com/docs/next/installation/advanced_usage/More_proxyllms)
-  - Method 2: Configure via product page. Reference: [Models](http://gpt.khulnasoft.com/docs/next/application/llms)
+  - Method 1: Configure via configuration file. Reference: [ProxyModel Configuration](https://gpt.khulnasoft.com/docs/next/installation/advanced_usage/More_proxyllms)
+  - Method 2: Configure via product page. Reference: [Models](https://gpt.khulnasoft.com/docs/next/application/llms)
docs/docs/faq/kbqa.md (1)

56-56: Optional: tighten repetitive phrasing.
Consider rewording to avoid repeated “If you want to …” in adjacent sentences for readability.

docs/docs/installation/model_service/cluster.md (1)

21-21: Optional: remove leading $ in code blocks for copy-paste.
This avoids MD014 lint noise and makes commands directly runnable.

Dockerfile (1)

21-23: Avoid masking install failures and clean apt lists.

|| true hides real errors and leaves apt cache in the layer. Consider gating the install behind a build arg and cleaning up.

♻️ Suggested refactor
-# 安装 NVIDIA GPU 工具(如 nvidia-smi)
-RUN apt-get update && apt-get install -y nvidia-utils-525 || true
+ARG INSTALL_NVIDIA_UTILS=false
+RUN if [ "$INSTALL_NVIDIA_UTILS" = "true" ]; then \
+      apt-get update && \
+      apt-get install -y --no-install-recommends nvidia-utils-525 && \
+      rm -rf /var/lib/apt/lists/*; \
+    fi
Makefile (1)

1-33: Declare phony targets to avoid filename collisions.

This prevents a file named like a target (e.g., setup) from blocking the rule.

🧹 Suggested fix
 .DEFAULT_GOAL := help
+.PHONY: setup testenv fmt fmt-check test test-% all-test mypy \
+	build build-% all-build docker-% all-docker publish publish-test publish-% \
+	clean clean-dist help

Comment on lines 87 to 93
[Installation From Source](https://db-gpt.readthedocs.io/en/latest/getting_started/install/deploy/deploy.html)
- label: >
[Docker Installation](https://gpt-db.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
[Docker Installation](https://db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
- label: >
[Docker Compose Installation](https://gpt-db.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
[Docker Compose Installation](https://db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
- label: >
[Cluster Installation](https://gpt-db.readthedocs.io/en/latest/getting_started/install/llm/cluster/model_cluster.html)
[Cluster Installation](https://db-gpt.readthedocs.io/en/latest/getting_started/install/llm/cluster/model_cluster.html)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's examine the file to confirm the lines and content
echo "=== Checking file content at lines 87-93 ==="
sed -n '85,95p' .github/ISSUE_TEMPLATE/bug-report.yml
echo ""

# Check the URLs for accessibility
echo "=== Checking URL accessibility ==="
echo "1. Installation From Source URL..."
curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" "https://db-gpt.readthedocs.io/en/latest/getting_started/install/deploy/deploy.html"

echo "2. Docker Installation URL..."
curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" "https://db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html"

echo "3. Cluster Installation URL..."
curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" "https://db-gpt.readthedocs.io/en/latest/getting_started/install/llm/cluster/model_cluster.html"

Repository: khulnasoft/GPT-DB

Length of output: 930


All installation documentation links return 404 errors and must be corrected before merging.

The four URLs in the installation options are inaccessible:

  • Installation From Source: 404
  • Docker Installation: 404
  • Docker Compose Installation: 404
  • Cluster Installation: 404

Additionally, Docker Installation and Docker Compose Installation both point to the same URL (db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html), which needs to be corrected. Verify the correct documentation URLs for the db-gpt.readthedocs.io domain before this change can be merged.

🤖 Prompt for AI Agents
In @.github/ISSUE_TEMPLATE/bug-report.yml around lines 87 - 93, Update the four
installation links in .github/ISSUE_TEMPLATE/bug-report.yml so they point to
valid, non-404 documentation pages; specifically replace the URLs used for
"Installation From Source", "Docker Installation", "Docker Compose Installation"
and "Cluster Installation" with the correct db-gpt.readthedocs.io paths (ensure
Docker and Docker Compose are different URLs), verify each URL opens
successfully, and update the corresponding label entries in the same file so the
visible link text and destination match the corrected docs.

Comment on lines +1 to +4
import time

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if pyproject.toml exists and search for selenium
echo "=== Checking for pyproject.toml ==="
if [ -f "pyproject.toml" ]; then
  echo "pyproject.toml found. Searching for selenium..."
  grep -i "selenium" pyproject.toml || echo "selenium not found in pyproject.toml"
else
  echo "pyproject.toml not found in repo root"
fi

echo -e "\n=== Checking for requirements files ==="
find . -maxdepth 2 -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "setup.cfg" \) 2>/dev/null | while read file; do
  echo "Found: $file"
  grep -i "selenium" "$file" || echo "  - selenium not found in $file"
done

echo -e "\n=== Checking if demo_selenium.py exists ==="
if [ -f "demo_selenium.py" ]; then
  echo "demo_selenium.py found. First 10 lines:"
  head -10 demo_selenium.py
else
  echo "demo_selenium.py not found in repo root"
fi

Repository: khulnasoft/GPT-DB

Length of output: 572


🏁 Script executed:

#!/bin/bash
# Read the pyproject.toml to see all dependency declarations
echo "=== Full pyproject.toml content ==="
cat pyproject.toml

Repository: khulnasoft/GPT-DB

Length of output: 2235


🏁 Script executed:

#!/bin/bash
# Find demo_selenium.py location
echo "=== Location of demo_selenium.py ==="
find . -name "demo_selenium.py" -type f

echo -e "\n=== Checking workspace members for pyproject.toml files with selenium ==="
for dir in packages/gptdb-app packages/gptdb-client packages/gptdb-core packages/gptdb-ext packages/gptdb-serve packages/gptdb-sandbox; do
  if [ -d "$dir" ]; then
    if [ -f "$dir/pyproject.toml" ]; then
      echo "Checking $dir/pyproject.toml for selenium..."
      grep -i "selenium" "$dir/pyproject.toml" && echo "  Found in $dir" || echo "  Not found in $dir"
    fi
  fi
done

Repository: khulnasoft/GPT-DB

Length of output: 788


🏁 Script executed:

#!/bin/bash
# Check gptdb-sandbox dependencies
echo "=== gptdb-sandbox/pyproject.toml dependencies section ==="
grep -A 20 "dependencies" packages/gptdb-sandbox/pyproject.toml | head -30

echo -e "\n=== Check if demo_selenium.py imports gptdb packages ==="
grep -E "^from gptdb|^import gptdb" ./demo_selenium.py || echo "No gptdb imports found"

echo -e "\n=== Full demo_selenium.py content ==="
cat ./demo_selenium.py

Repository: khulnasoft/GPT-DB

Length of output: 1594


demo_selenium.py at the repository root depends on selenium, which is only declared in the gptdb-sandbox workspace member.

The selenium>=4.0.0 dependency is declared in packages/gptdb-sandbox/pyproject.toml, not in the root pyproject.toml. Since demo_selenium.py is located at the repository root and imports selenium directly, the dependency will not be available if the root package is installed standalone. Either move this script into packages/gptdb-sandbox/ where the dependency is declared, or add selenium to the root project dependencies.

🤖 Prompt for AI Agents
In `@demo_selenium.py` around lines 1 - 4, demo_selenium.py imports selenium (from
selenium import webdriver) but selenium is only declared in
packages/gptdb-sandbox/pyproject.toml; fix by either moving demo_selenium.py
into the packages/gptdb-sandbox/ folder so it lives with the declared
dependency, or add "selenium>=4.0.0" to the root pyproject.toml dependencies so
imports like webdriver and Options resolve when the root package is installed.

Comment on lines +6 to +9
options = Options()
options.headless = False # অবশ্যই False হতে হবে, যাতে VNC ডেস্কটপে ব্রাউজারের কার্যকলাপ দেখা যায়

driver = webdriver.Firefox(options=options)

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd demo_selenium.py

Repository: khulnasoft/GPT-DB

Length of output: 77


🏁 Script executed:

cat -n demo_selenium.py

Repository: khulnasoft/GPT-DB

Length of output: 909


🌐 Web query:

Selenium 4 Firefox headless mode options.headless deprecated

💡 Result:

In Selenium 4.8.0, the convenience API (options.headless = True, options.setHeadless(true), etc.) was deprecated (and then removed in later Selenium versions) in favor of passing the headless mode via browser arguments. (selenium.dev)

Firefox (GeckoDriver): what to use now

Python

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

opts = Options()
opts.add_argument("-headless")   # or "--headless"
driver = webdriver.Firefox(options=opts)

(Using options.headless = True is what triggers the deprecation warning.) (selenium.dev)

Java

FirefoxOptions options = new FirefoxOptions();
options.addArguments("-headless");   // or "--headless"
WebDriver driver = new FirefoxDriver(options);

(stackoverflow.com)

Important note: --headless=new is Chromium-only

--headless=new is for Chromium’s “new headless” mode; Firefox doesn’t use --headless=new. (selenium.dev)



Deprecated headless attribute and module-level execution.

  1. options.headless is deprecated in Selenium 4.8.0+. Since the code intends non-headless mode (visible browser), simply remove this line—non-headless is the default. If you explicitly need to set it in future versions, use options.add_argument("-headless") only to enable headless mode.

  2. Module-level execution means the driver instantiation, navigation, and screenshot all run on import. Wrap the execution logic in if __name__ == "__main__":.
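
Putting both points together, a minimal sketch with an explicit wait in place of the sleep; the URL, wait condition, and screenshot path are illustrative.

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def main() -> None:
    options = Options()  # non-headless is the default in Selenium 4
    driver = webdriver.Firefox(options=options)
    try:
        driver.get("https://www.python.org")
        # Explicit wait instead of time.sleep(): block until the title loads.
        WebDriverWait(driver, 10).until(EC.title_contains("Python"))
        driver.save_screenshot("screenshot.png")
    finally:
        driver.quit()


if __name__ == "__main__":
    main()  # nothing runs on import
```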

🤖 Prompt for AI Agents
In `@demo_selenium.py` around lines 6 - 9, Remove the deprecated options.headless
assignment (Options object defaults to non-headless) and, if you ever need
headless in future, use options.add_argument("-headless") to enable it; also
move the driver instantiation and any navigation/screenshot logic that currently
runs at module import into a main guard by wrapping the code that creates
Options(), calls webdriver.Firefox(...), and performs actions inside if __name__
== "__main__": so imports no longer start the browser automatically.

language = "${env:GPTDB_LANG:-en}"
log_level = "INFO"
api_keys = []
encrypt_key = "your_secret_key"

⚠️ Potential issue | 🟠 Major

Avoid hard‑coded encryption key default.
"your_secret_key" is a predictable fallback and could lead to weak encryption in real deployments. Make this required via env (or fail fast).

🔒 Proposed fix
-encrypt_key = "your_secret_key"
+encrypt_key = "${env:ENCRYPT_KEY}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
encrypt_key = "your_secret_key"
encrypt_key = "${env:ENCRYPT_KEY}"
🤖 Prompt for AI Agents
In `@docker/compose_examples/conf/ha-model-cluster.toml`. at line 6, Replace the
hard-coded encrypt_key default ("your_secret_key") with a required value sourced
from an environment variable (e.g. ENCRYPT_KEY) and update the config loading
logic to fail fast if that env var is missing: remove the predictable fallback
for the encrypt_key key, read the key from process/env in your config loader (or
equivalent config parsing function), and throw/exit with a clear error when
ENCRYPT_KEY is not provided so deployments cannot silently use a weak default.
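
A minimal sketch of the fail-fast lookup on the loader side; the ENCRYPT_KEY variable name and the helper are illustrative, since the actual config-loading code path may differ.

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or refuse to start."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Required environment variable {name} is not set; "
            "refusing to start with a default encryption key."
        )
    return value


encrypt_key = require_env("ENCRYPT_KEY")
```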

language = "${env:GPTDB_LANG:-en}"
log_level = "INFO"
api_keys = []
encrypt_key = "your_secret_key"

⚠️ Potential issue | 🟠 Major

Hardcoded encryption key is a security risk.

The placeholder "your_secret_key" should be replaced with an environment variable reference to avoid accidental deployment with a weak, publicly-known key. Secrets should never be committed to version control.

🔒 Suggested fix
-encrypt_key = "your_secret_key"
+encrypt_key = "${env:GPTDB_ENCRYPT_KEY}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
encrypt_key = "your_secret_key"
encrypt_key = "${env:GPTDB_ENCRYPT_KEY}"
🤖 Prompt for AI Agents
In `@docker/compose_examples/conf/ha-webserver.toml`. at line 6, The config
currently hardcodes encrypt_key = "your_secret_key"; change this to read the
encryption key from an environment variable (e.g. ENCRYPT_KEY) instead of a
literal string, update any loader/initialization code that reads encrypt_key to
validate the env var and fail fast if it's missing, and ensure the default
placeholder is removed from the repo and documented as an env requirement;
reference the encrypt_key setting in the config and the code path that loads it
when making the change.

Comment on lines +15 to +22
CREATE TABLE scores (
student_id INTEGER,
course_id INTEGER,
score INTEGER,
semester VARCHAR(50),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)

⚠️ Potential issue | 🟠 Major

Primary key blocks multi-semester scores for the same student/course.

With semester present, (student_id, course_id) as the PK prevents storing multiple semesters for a repeated course. Consider including semester in the PK (or adding a surrogate key) to avoid insert conflicts later.

🐛 Proposed fix (include semester in the PK)
 CREATE TABLE scores (
     student_id INTEGER,
     course_id INTEGER,
     score INTEGER,
     semester VARCHAR(50),
-    PRIMARY KEY (student_id, course_id),
+    PRIMARY KEY (student_id, course_id, semester),
     FOREIGN KEY (student_id) REFERENCES students(student_id),
     FOREIGN KEY (course_id) REFERENCES courses(course_id)
 );
🤖 Prompt for AI Agents
In `@docker/examples/sqls/case_1_student_manager_sqlite.sql`. around lines 15 -
22, The current PRIMARY KEY on the scores table (PRIMARY KEY (student_id,
course_id)) prevents storing multiple semesters for the same student/course;
update the scores table definition (table name: scores, columns: student_id,
course_id, semester) to include semester in the composite primary key (e.g.,
PRIMARY KEY (student_id, course_id, semester)) or alternatively add a surrogate
id column and make that the PK while keeping student_id, course_id, semester as
a UNIQUE constraint to allow multiple semester rows without conflicts.

Comment on lines +25 to +35
INSERT INTO users (user_id, user_name, user_email, registration_date, user_country) VALUES
(1, 'John', 'john@gmail.com', '2020-01-01', 'USA'),
(2, 'Mary', 'mary@gmail.com', '2021-01-01', 'UK'),
(3, 'Bob', 'bob@gmail.com', '2020-01-01', 'USA'),
(4, 'Alice', 'alice@gmail.com', '2021-01-01', 'UK'),
(5, 'Charlie', 'charlie@gmail.com', '2020-01-01', 'USA'),
(6, 'David', 'david@gmail.com', '2021-01-01', 'UK'),
(7, 'Eve', 'eve@gmail.com', '2020-01-01', 'USA'),
(8, 'Frank', 'frank@gmail.com', '2021-01-01', 'UK'),
(9, 'Grace', 'grace@gmail.com', '2020-01-01', 'USA'),
(10, 'Helen', 'helen@gmail.com', '2021-01-01', 'UK');

⚠️ Potential issue | 🟠 Major

Avoid real-looking PII in seed emails.
These Gmail-style addresses can map to real people; prefer reserved example domains to eliminate privacy/compliance risk.

✅ Proposed fix
-INSERT INTO users (user_id, user_name, user_email, registration_date, user_country) VALUES
-(1, 'John', 'john@gmail.com', '2020-01-01', 'USA'),
-(2, 'Mary', 'mary@gmail.com', '2021-01-01', 'UK'),
-(3, 'Bob', 'bob@gmail.com', '2020-01-01', 'USA'),
-(4, 'Alice', 'alice@gmail.com', '2021-01-01', 'UK'),
-(5, 'Charlie', 'charlie@gmail.com', '2020-01-01', 'USA'),
-(6, 'David', 'david@gmail.com', '2021-01-01', 'UK'),
-(7, 'Eve', 'eve@gmail.com', '2020-01-01', 'USA'),
-(8, 'Frank', 'frank@gmail.com', '2021-01-01', 'UK'),
-(9, 'Grace', 'grace@gmail.com', '2020-01-01', 'USA'),
-(10, 'Helen', 'helen@gmail.com', '2021-01-01', 'UK');
+INSERT INTO users (user_id, user_name, user_email, registration_date, user_country) VALUES
+(1, 'John', 'john@example.com', '2020-01-01', 'USA'),
+(2, 'Mary', 'mary@example.com', '2021-01-01', 'UK'),
+(3, 'Bob', 'bob@example.com', '2020-01-01', 'USA'),
+(4, 'Alice', 'alice@example.com', '2021-01-01', 'UK'),
+(5, 'Charlie', 'charlie@example.com', '2020-01-01', 'USA'),
+(6, 'David', 'david@example.com', '2021-01-01', 'UK'),
+(7, 'Eve', 'eve@example.com', '2020-01-01', 'USA'),
+(8, 'Frank', 'frank@example.com', '2021-01-01', 'UK'),
+(9, 'Grace', 'grace@example.com', '2020-01-01', 'USA'),
+(10, 'Helen', 'helen@example.com', '2021-01-01', 'UK');
🤖 Prompt for AI Agents
In `@docker/examples/sqls/case_2_ecom_sqlite.sql`. around lines 25 - 35, The seed
data in the INSERT INTO users statement contains real-looking emails; update the
user_email values to use reserved example domains (e.g., user@example.com or
user@example.org) to avoid PII. Edit the INSERT INTO users (... user_email ...)
VALUES clause and replace each gmail address (e.g., 'john@gmail.com',
'mary@gmail.com', etc.) with non-real addresses (e.g., 'user1@example.com',
'user2@example.com') while keeping the rest of the columns (user_id, user_name,
registration_date, user_country) unchanged.

('学校管理系统', '测试SQL助手的联合查询,条件查询和排序功能', '查询哪些学生在2023年秋季学期的课程学分总和超过15', 'SELECT students.student_name FROM students JOIN scores ON students.student_id = scores.student_id JOIN courses ON scores.course_id = courses.course_id WHERE scores.semester = ''2023年秋季'' GROUP BY students.student_id HAVING SUM(courses.credit) > 15;', '返回在2023年秋季学期的课程学分总和超过15的学生的姓名'),
('电商系统', '测试SQL助手的数据聚合和分组功能', '查询每个用户的总订单数量', 'SELECT users.user_name, COUNT(orders.order_id) as order_count FROM users JOIN orders ON users.user_id = orders.user_id GROUP BY users.user_id;', '返回每个用户的总订单数量'),
('电商系统', '测试SQL助手的数据聚合和分组功能', '查询每种商品的总销售额', 'SELECT products.product_name, SUM(products.product_price * orders.quantity) as total_sales FROM products JOIN orders ON products.product_id = orders.product_id GROUP BY products.product_id;', '返回每种商品的总销售额'),
('电商系统', '测试SQL助手的数据聚合和分组功能', '查询2023年最受欢迎的商品(订单数量最多的商品)', 'SELECT products.product_name FROM products JOIN orders ON products.product_id = orders.product_id WHERE YEAR(orders.order_date) = 2023 GROUP BY products.product_id ORDER BY COUNT(orders.order_id) DESC LIMIT 1;', '返回2023年最受欢迎的商品(订单数量最多的商品)的名称');

⚠️ Potential issue | 🟠 Major

SQLite incompatibility: YEAR() function is not supported.

The expected_sql on this line uses YEAR(orders.order_date), which is a MySQL function and not valid in SQLite. SQLite requires strftime('%Y', orders.order_date) for extracting the year from a date.

Since this file is specifically for SQLite (test_case_info_sqlite.sql), the expected SQL should use SQLite-compatible syntax.

Proposed fix
-('电商系统', '测试SQL助手的数据聚合和分组功能', '查询2023年最受欢迎的商品(订单数量最多的商品)', 'SELECT products.product_name FROM products JOIN orders ON products.product_id = orders.product_id WHERE YEAR(orders.order_date) = 2023 GROUP BY products.product_id ORDER BY COUNT(orders.order_id) DESC LIMIT 1;', '返回2023年最受欢迎的商品(订单数量最多的商品)的名称');
+('电商系统', '测试SQL助手的数据聚合和分组功能', '查询2023年最受欢迎的商品(订单数量最多的商品)', 'SELECT products.product_name FROM products JOIN orders ON products.product_id = orders.product_id WHERE strftime(''%Y'', orders.order_date) = ''2023'' GROUP BY products.product_id ORDER BY COUNT(orders.order_id) DESC LIMIT 1;', '返回2023年最受欢迎的商品(订单数量最多的商品)的名称');
🤖 Prompt for AI Agents
In `@docker/examples/sqls/test_case_info_sqlite.sql`, at line 17: the SQL in the
VALUES tuple uses MySQL's YEAR(orders.order_date) which SQLite doesn't support;
update the SELECT in that tuple to use SQLite's strftime('%Y',
orders.order_date) and compare to the string '2023' (e.g. strftime('%Y',
orders.order_date) = '2023') so the expected_sql in the tuple (the one selecting
products.product_name with JOIN orders ...) is SQLite-compatible.
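
A quick local check illustrates the difference (a sketch, assuming the sqlite3 CLI is available; not part of this PR): strftime extracts the year, while YEAR() fails because SQLite does not define that function.

# Prints 2023: strftime is the SQLite way to extract a year from a date string.
sqlite3 :memory: "SELECT strftime('%Y', '2023-06-15');"

# Fails with "no such function: YEAR" because YEAR() is MySQL-specific.
sqlite3 :memory: "SELECT YEAR('2023-06-15');" || echo "YEAR() is not available in SQLite"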

Comment on lines +1 to +2
USE mysql;
UPDATE user SET Host='%' WHERE User='root';

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and view its full content
find . -type f -name "user_config_mysql.sql" -o -path "*/docker/examples/sqls/user_config_mysql.sql"

Repository: khulnasoft/GPT-DB

Length of output: 105


🏁 Script executed:

# Get a more direct check
cat docker/examples/sqls/user_config_mysql.sql 2>/dev/null || echo "File not found or error reading"

Repository: khulnasoft/GPT-DB

Length of output: 133


🌐 Web query:

MySQL 8.0 best practices for changing user host, ALTER USER vs UPDATE mysql.user deprecated

💡 Result:

Best practice in MySQL 8.0: change the host with RENAME USER, not ALTER USER, and don’t UPDATE mysql.user

Key point: In MySQL, an “account” is the tuple 'user'@'host'. Changing the host means changing the account name, which you do with RENAME USER (or by creating a new account), not ALTER USER. RENAME USER can rename both the user and host parts, e.g. 'jeffrey'@'localhost' → 'jeffrey'@'127.0.0.1'. [2]

Recommended options

  1. Rename the account (best when you truly want to “move” it):
RENAME USER 'app'@'oldhost' TO 'app'@'newhost';

This keeps the account’s privileges under the new name. Be aware of definer/orphan-object considerations noted in the manual. [2]

  2. Create a second account (best when you need both old and new host to work during a transition):
CREATE USER 'app'@'newhost' IDENTIFIED BY 'new_password';
-- then re-apply privileges:
SHOW GRANTS FOR 'app'@'oldhost';
-- run the shown GRANT statements but targeting 'app'@'newhost'

(There isn’t a single “clone user” statement; you reapply grants.)

Why not UPDATE mysql.user?

Directly modifying grant tables with INSERT/UPDATE/DELETE is not recommended; changes won’t affect privilege checking until you reload privileges (for example via FLUSH PRIVILEGES), and it’s generally considered risky compared to account-management statements. [1] Also, some managed platforms explicitly disallow modifying mysql schema grant tables directly and require CREATE USER/GRANT/REVOKE etc. [3]

Summary:

  • Change host: use RENAME USER (or create a new 'user'@'host'). [2]
  • Do not change host with ALTER USER (it’s for account properties, not renaming).
  • Avoid UPDATE mysql.user; use account-management statements instead. [1][3]

Sources: [1] [2] [3]


🌐 Web query:

MySQL Host='%' security implications remote root access

💡 Result:

In MySQL, the Host part of an account controls where that username may authenticate from. Setting root to Host='%' (that is, root@'%') means “root can log in from any host” (the % wildcard matches any host) and it is the least-specific host match. [1][2]

Security implications of root@'%'

  • Network-reachable superuser: If MySQL is reachable over the network (intentionally or by mistake), you’ve exposed the most powerful account to the internet/LAN, making it a prime target for password spraying/brute force and credential reuse attacks. [1][3]
  • Full compromise blast radius: Compromising root typically implies full control of data (read/modify/delete), privilege changes (create users, grant rights), and potentially host-level impact depending on enabled features/privileges and server hardening.
  • Misconfiguration amplification: Even if you “intend” MySQL to be private, real-world failures happen (cloud security group mistake, port-forward, VPN misroute). root@'%' turns those into immediate high-severity incidents. Guidance and tooling commonly flag “root can login remotely” as a security finding. [4]

Recommended safer patterns

  • Do not allow remote root: Keep root restricted to local connections (for example root@localhost) and administer remotely via SSH/VPN, or a bastion. [3][4]
  • Use a separate admin account, restricted by source IP/CIDR (instead of %), and grant only what’s needed. MySQL supports IP/CIDR (or netmask) host specifications for accounts. [1][2]
  • Add network controls: Bind MySQL to a private interface, and firewall/security-group allowlist only trusted admin/application IPs. [3]

Quick check (find remote-capable root accounts):

SELECT user, host FROM mysql.user WHERE user='root';

If you see root@'%' (or any non-local host), treat it as high risk and remove/lock it unless you have a very specific, well-defended need. [1][3]

Sources: MySQL account name/wildcard rules and meaning of % [1], MySQL connection matching and % being “any host”/least-specific [2], remote access hardening guidance incl. disabling remote root [3], MySQL security advisor item “Root Account Can Login Remotely” [4].


Avoid enabling remote root access by default.

Host='%' allows root login from any host. This is a significant security risk and is flagged by security scanners. Even in examples, this pattern should not be normalized. Additionally, direct UPDATE mysql.user is not recommended in modern MySQL; use CREATE USER and GRANT instead.

Keep root restricted to localhost and create a dedicated application user for remote access:

Safer alternative
 USE mysql;
-UPDATE user SET Host='%' WHERE User='root';
+-- Keep root local; create a dedicated user for remote access
+CREATE USER IF NOT EXISTS 'app_user'@'%' IDENTIFIED BY 'change_me';
+GRANT ALL PRIVILEGES ON *.* TO 'app_user'@'%' WITH GRANT OPTION;
 FLUSH PRIVILEGES;
🤖 Prompt for AI Agents
In `@docker/examples/sqls/user_config_mysql.sql`, around lines 1-2: the SQL
snippet updates mysql.user to set Host='%' for User='root', which enables remote
root access and is insecure; replace this pattern by keeping root bound to
localhost (do not change Host='%' for User='root'), remove direct UPDATE
mysql.user calls, and instead create a dedicated application user via CREATE
USER and GRANT with only the necessary privileges for remote access, then FLUSH
PRIVILEGES; locate the statements UPDATE user SET Host='%' WHERE User='root' and
replace them with the CREATE USER / GRANT flow and avoid modifying the root
account.

Comment on lines +24 to +26
# 设置 VNC 密码
RUN mkdir -p /root/.vnc && \
x11vnc -storepasswd 123456 /root/.vnc/passwd

⚠️ Potential issue | 🔴 Critical

Remove the hardcoded VNC password (security risk).

A fixed password in the image makes every container trivially accessible. Make the password runtime‑configurable (env or secret) and avoid baking it into the image.

🔒 Suggested direction (example)
-# 设置 VNC 密码
-RUN mkdir -p /root/.vnc && \
-    x11vnc -storepasswd 123456 /root/.vnc/passwd
+ARG VNC_PASSWORD
+RUN test -n "$VNC_PASSWORD"
+RUN mkdir -p /root/.vnc && \
+    x11vnc -storepasswd "$VNC_PASSWORD" /root/.vnc/passwd
🤖 Prompt for AI Agents
In `@Dockerfile`, around lines 24-26: the Dockerfile currently hardcodes the VNC
password using the RUN line that calls x11vnc -storepasswd 123456
/root/.vnc/passwd; remove that hardcoded invocation and instead create a runtime
setup (entrypoint) that reads a VNC password from an environment variable (e.g.,
VNC_PASSWORD) or a secret file (e.g., /run/secrets/vnc_password), ensures
/root/.vnc exists, stores the password with x11vnc -storepasswd using the
provided value, and tightens file permissions on /root/.vnc/passwd; update the
Dockerfile to copy the entrypoint script and set it as ENTRYPOINT/CMD and keep a
safe fallback behavior (error or random password) when no password is provided.
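
One possible shape for that entrypoint is sketched below; the VNC_PASSWORD variable, the /run/secrets/vnc_password path, and the fail-closed behavior are assumptions drawn from the comment above, not code in this PR.

#!/bin/sh
# Hypothetical entrypoint: resolve the VNC password at runtime instead of baking it in.
set -eu

# Prefer a mounted secret, then fall back to an environment variable.
if [ -f /run/secrets/vnc_password ]; then
    VNC_PASSWORD="$(cat /run/secrets/vnc_password)"
fi

if [ -z "${VNC_PASSWORD:-}" ]; then
    echo "VNC_PASSWORD is not set; refusing to start with a default password" >&2
    exit 1
fi

mkdir -p /root/.vnc
x11vnc -storepasswd "$VNC_PASSWORD" /root/.vnc/passwd
chmod 600 /root/.vnc/passwd

exec "$@"   # hand off to the image's original command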

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@docs/blog/2025-03-24-gptdb-v0.7.0-release.md`:
- Line 68: Update the broken documentation links in the release blog post
(docs/blog/2025-03-24-gptdb-v0.7.0-release.md) to point to publicly accessible
URLs: either replace the unreachable http://gpt.khulnasoft.com/**...** links
(e.g. /docs/next/quickstart,
/docs/next/config-reference/app/config_chatdashboardconfig_2480d0,
/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9,
/docs/next/installation/docker-build-guide, /docs/next/installation/docker) with
their live equivalents on the currently deployed docs site or temporarily revert
them to the previous working URLs; verify each updated link returns 200 OK
before merging and update all occurrences listed in the comment.
- Line 426: The link labeled "概览" currently points to
https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx which appears
inaccessible/404; verify whether the correct public URL for the Chinese
documentation exists and is published, and then update the link target in
docs/blog/2025-03-24-gptdb-v0.7.0-release.md to the working URL (or replace with
an archived URL or mirror); if the doc requires authentication, either remove
the broken link or add a note beside "概览" indicating that authentication is
required and provide instructions or an alternative public resource.

In `@README.md`:
- Line 150: Replace the malformed character in the header string "### � Privacy
and Security" with the intended emoji or plain text (for example "### 🔒 Privacy
and Security" or "### Privacy and Security"), update the README.md header
accordingly, and ensure the file is saved in UTF-8 encoding so the emoji renders
correctly across systems.
- Around line 225-228: The README has a broken link due to a filename typo:
change the link target "DISCKAIMER.md" to the correct "DISCLAIMER.md" in the
Markdown link (the line containing the "Disclaimer" link) so the reference
points to the correct file name.
- Line 158: The header "### � Supported Data Sources" contains a malformed
Unicode character; update that header (the string "### � Supported Data
Sources") to use the intended character (e.g., replace "�" with a proper emoji
like "🔌" or "🔍", or remove it entirely so it reads "### Supported Data
Sources") and save the README.md with UTF-8 encoding to prevent future encoding
issues.

**And all Chat scenarios in original GPT-DB now have deep thinking capabilities.**

For quick usage reference: [http://docs.gptdb.cn/docs/next/quickstart](http://docs.gptdb.cn/docs/next/quickstart)
For quick usage reference: [http://gpt.khulnasoft.com/docs/next/quickstart](http://gpt.khulnasoft.com/docs/next/quickstart)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's read the file to see all the URL changes
cat -n docs/blog/2025-03-24-gptdb-v0.7.0-release.md | sed -n '65,75p;160,170p;225,235p;325,335p;438,445p'

Repository: khulnasoft/GPT-DB

Length of output: 2809


🏁 Script executed:

# Test the accessibility of the documented URLs
urls=(
  "http://gpt.khulnasoft.com/docs/next/quickstart"
  "http://gpt.khulnasoft.com/docs/next/config-reference/app/config_chatdashboardconfig_2480d0"
  "http://gpt.khulnasoft.com/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9"
  "http://gpt.khulnasoft.com/docs/next/installation/docker-build-guide"
  "http://gpt.khulnasoft.com/docs/next/installation/docker"
)

for url in "${urls[@]}"; do
  http_code=$(curl -s -o /dev/null -w "%{http_code}" -L "$url" 2>&1)
  echo "$url: HTTP $http_code"
done

Repository: khulnasoft/GPT-DB

Length of output: 461


All new documentation URLs are currently inaccessible—HTTP connection failures on all tested links.

The migrated documentation URLs across lines 68, 165, 167, 231, 328, 441–442 all fail to connect (HTTP 000). These include:

  • http://gpt.khulnasoft.com/docs/next/quickstart
  • http://gpt.khulnasoft.com/docs/next/config-reference/app/config_chatdashboardconfig_2480d0
  • http://gpt.khulnasoft.com/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9
  • http://gpt.khulnasoft.com/docs/next/installation/docker-build-guide
  • http://gpt.khulnasoft.com/docs/next/installation/docker

Ensure the new documentation domain is fully deployed and publicly accessible before merging, or temporarily revert to working URLs if the new domain is not yet ready.

🤖 Prompt for AI Agents
In `@docs/blog/2025-03-24-gptdb-v0.7.0-release.md`, at line 68: update the broken
documentation links in the release blog post
(docs/blog/2025-03-24-gptdb-v0.7.0-release.md) to point to publicly accessible
URLs: either replace the unreachable http://gpt.khulnasoft.com/**...** links
(e.g. /docs/next/quickstart,
/docs/next/config-reference/app/config_chatdashboardconfig_2480d0,
/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9,
/docs/next/installation/docker-build-guide, /docs/next/installation/docker) with
their live equivalents on the currently deployed docs site or temporarily revert
them to the previous working URLs; verify each updated link returns 200 OK
before merging and update all occurrences listed in the comment.

**Chinese**

[概览](https://www.yuque.com/eosphoros/gptdb-docs/bex30nsv60ru0fmx)
[概览](https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Chinese documentation URL accessibility

url="https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx"
echo "Testing URL: $url"

# Use curl with timeout and follow redirects
response=$(curl -s -w "\n%{http_code}" -L --connect-timeout 5 --max-time 10 "$url" 2>&1)
http_code=$(echo "$response" | tail -n1)
body=$(echo "$response" | head -n-1)

echo "HTTP Status Code: $http_code"

# Check if response is valid
if [ "$http_code" -eq 200 ]; then
  echo "Result: URL is accessible"
elif [ "$http_code" -eq 301 ] || [ "$http_code" -eq 302 ] || [ "$http_code" -eq 303 ] || [ "$http_code" -eq 307 ] || [ "$http_code" -eq 308 ]; then
  echo "Result: URL redirects (Status: $http_code)"
elif [ "$http_code" -eq 404 ]; then
  echo "Result: URL not found (404)"
elif [ "$http_code" -eq 403 ]; then
  echo "Result: Access forbidden (403)"
else
  echo "Result: Unexpected status code ($http_code)"
fi

Repository: khulnasoft/GPT-DB

Length of output: 178


🌐 Web query:

Yuque documentation URL https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx accessibility

💡 Result:

  • Direct access from here failed: the specific doc URL timed out when fetched, so its public accessibility can’t be confirmed programmatically. [1]
  • The parent repo URL https://www.yuque.com/khulnasoft/gptdb-docs returned 404 Not Found, which commonly indicates the knowledge base/repo slug isn’t publicly reachable (deleted/renamed) or is access-restricted. [2]

What to do to verify/access it

  1. Open the URL in a normal browser session. If prompted, sign in to Yuque and request permission from the repo owner.
  2. If you have access but need non-interactive retrieval, use Yuque’s API with a Personal Access Token (common requirement for SDK/API access). [3][4]

Sources: [1] [2] [3] [4]


The Chinese documentation URL appears to be inaccessible or broken.

The parent documentation repository at https://www.yuque.com/khulnasoft/gptdb-docs returns a 404 error, and the specific URL at line 426 times out when accessed. This indicates the documentation may be deleted, renamed, or access-restricted. Verify that the correct URL has been published and is publicly accessible, or confirm whether authentication is required.

🤖 Prompt for AI Agents
In `@docs/blog/2025-03-24-gptdb-v0.7.0-release.md`, at line 426: the link labeled
"概览" currently points to
https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx which appears
inaccessible/404; verify whether the correct public URL for the Chinese
documentation exists and is published, and then update the link target in
docs/blog/2025-03-24-gptdb-v0.7.0-release.md to the working URL (or replace with
an archived URL or mirror); if the doc requires authentication, either remove
the broken link or add a note beside "概览" indicating that authentication is
required and provide instructions or an alternative public resource.

- **API Models:** Wenxin, Tongyi, Zhipu, and other commercial models

- **Fine-tuning Framework**: Model fine-tuning is an indispensable capability for any enterprise to implement in vertical and niche domains. GPT-DB provides a complete fine-tuning framework that integrates seamlessly with the GPT-DB project. In recent fine-tuning efforts, an accuracy rate based on the Spider dataset has been achieved at 82.5%.
### � Privacy and Security

⚠️ Potential issue | 🔴 Critical

Fix malformed character (encoding issue).

Line 150 contains a malformed character "�" that appears to be a Unicode/emoji rendering issue. This should be replaced with the intended emoji or icon.

🔧 Suggested fix
-### � Privacy and Security
+### 🔒 Privacy and Security
🤖 Prompt for AI Agents
In `@README.md`, at line 150: replace the malformed character in the header string
"### � Privacy and Security" with the intended emoji or plain text (for example
"### 🔒 Privacy and Security" or "### Privacy and Security"), update the
README.md header accordingly, and ensure the file is saved in UTF-8 encoding so
the emoji renders correctly across systems.

- **Secure Architecture** - End-to-end security framework

- **Data Sources**: Integrating various data sources to seamlessly connect production business data to the core capabilities of GPT-DB.
### � Supported Data Sources

⚠️ Potential issue | 🔴 Critical

Fix malformed character (encoding issue).

Line 158 contains a malformed character "�" that appears to be a Unicode/emoji rendering issue. This should be replaced with the intended emoji or icon.

🔧 Suggested fix
-### � Supported Data Sources
+### 📊 Supported Data Sources
🤖 Prompt for AI Agents
In `@README.md`, at line 158: the header "### � Supported Data Sources" contains a
malformed Unicode character; update that header (the string "### � Supported
Data Sources") to use the intended character (e.g., replace "�" with a proper
emoji like "🔌" or "🔍", or remove it entirely so it reads "### Supported Data
Sources") and save the README.md with UTF-8 encoding to prevent future encoding
issues.
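
To confirm no U+FFFD replacement characters remain after the headers are fixed (a sketch; the --include filter and the message text are assumptions, not part of this PR), a grep over the Markdown files works:

# U+FFFD is encoded as EF BF BD in UTF-8; list any Markdown lines that still contain it.
grep -rn $'\xef\xbf\xbd' --include='*.md' . || echo "No replacement characters found"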

Comment on lines +225 to +228
## ⚠️ Disclaimer

📖 **[Disclaimer](./DISCKAIMER.md)** - Please read our disclaimer for important information about the project.


⚠️ Potential issue | 🔴 Critical

Fix typo in disclaimer filename.

Line 227 contains a typo: "DISCKAIMER.md" should be "DISCLAIMER.md". This will result in a broken link.

🔧 Proposed fix
-📖 **[Disclaimer](./DISCKAIMER.md)** - Please read our disclaimer for important information about the project.
+📖 **[Disclaimer](./DISCLAIMER.md)** - Please read our disclaimer for important information about the project.
🤖 Prompt for AI Agents
In `@README.md`, around lines 225-228: the README has a broken link due to a
filename typo: change the link target "DISCKAIMER.md" to the correct
"DISCLAIMER.md" in the Markdown link (the line containing the "Disclaimer" link)
so the reference points to the correct file name.
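
A small link check can guard against this class of typo (a sketch; it only covers relative ./*.md links in README.md, and the file names are taken from the comment above):

# Extract relative .md link targets from README.md and verify that each file exists.
grep -o '](\./[^)]*\.md)' README.md | sed 's/](\(.*\))/\1/' | while read -r target; do
  if [ ! -f "$target" ]; then
    echo "Broken link target in README.md: $target" >&2
  fi
done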

- Fix is_exist method to return False on exception instead of True to avoid false positives
- Fix string replace assignment in use-chat.ts to properly apply newline replacements
- Remove inefficient database cleanup logic in benchmark_data_manager.py
- Replace custom JSONPath implementation with jsonpath-ng library for robustness
- Pass system_app context to get_benchmark_manager() calls
- Refactor Neo4j delete_graph method to use single DETACH DELETE query for better performance
- Update HTTP-Referer header URL to use correct khulnasoft-bot repository