
Deploy/dev #96

Open

FortiShield wants to merge 4 commits into khulnasoft:main from FortiShield:deploy/dev

Conversation

@FortiShield
Collaborator

@FortiShield FortiShield commented Jan 23, 2026

User description

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Snapshots:

Include snapshots for easier review.

Checklist:

  • My code follows the style guidelines of this project
  • I have already rebased the commits and made the commit messages conform to the project standard.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • Any dependent changes have been merged and published in downstream modules

PR Type

Enhancement, Bug fix, Formatting


Description

  • Agent-based benchmarking support: Added comprehensive agent API evaluation framework with BenchmarkAgentTask, response parsing strategies, and HTTP method configuration

  • Neo4j graph database integration: Implemented Neo4j connector, store adapter, and community store adapter for graph operations and schema introspection

  • Benchmark data loading refactoring: Migrated from CSV to SQLite merge functionality with GitHub Falcon repository support and JSON/JSONL parsing

  • File parsing service enhancements: Added Excel and GitHub Falcon file parsers with standardized result models and Excel output support

  • Naming consistency updates: Renamed DbGpts* classes to GptDbs* and updated repository references from eosphoros to khulnasoft organization

  • Full text search improvements: Enhanced Elasticsearch with metadata filtering and added full text search interface to base storage class

  • Environment variable resolution: Added support for resolving database passwords from environment variables in connector manager

  • API enhancements: Added app_code parameter to chat completion and HTTP request bodies for better app identification

  • Code quality improvements: Comprehensive import statement reorganization across 50+ files to follow PEP 8 style guidelines, improved error logging with traceback information, and null/boolean value normalization in data comparison

  • Documentation updates: Updated documentation URL references and package names in examples


Diagram Walkthrough

flowchart LR
  A["Benchmark Service"] -->|"Agent Config"| B["BenchmarkAgentTask"]
  A -->|"File Type"| C["File Parsers"]
  C -->|"Excel"| D["ExcelFileParseService"]
  C -->|"GitHub"| E["FalconFileParseService"]
  F["Neo4j Connector"] -->|"Graph Ops"| G["Neo4jStore"]
  G -->|"Community"| H["Neo4jStoreAdapter"]
  I["Elasticsearch"] -->|"Full Text"| J["Metadata Filtering"]
  K["Connector Manager"] -->|"Env Vars"| L["Password Resolution"]
  M["API Endpoints"] -->|"app_code"| N["Chat Completion"]

File Walkthrough

Relevant files
Enhancement
22 files
benchmark_data_manager.py
Refactor benchmark data loading from CSV to SQLite merge 

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py

  • Refactored CSV import to SQLite merge functionality, replacing CSV
    parsing with SQLite database merging
  • Added new Pydantic models (FileLoadResult, SqlFileItem,
    GoldenSqlListResult) for unified file loading results
  • Implemented load_file_from_github() method supporting JSON/JSONL file
    parsing with multiple encoding support
  • Added _parse_json_file() and _parse_jsonl_file() helper methods for
    flexible JSON data parsing
  • Enhanced GitHub repository download with branch URL support and
    improved error handling
  • Replaced async_after_start() with load_data() method and removed table
    mapping functionality
+520/-328
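
As a rough illustration of the JSON/JSONL row loading described above, here is a minimal sketch of a line-by-line JSONL parser with failure tracking; the helper name and result fields are assumptions mirroring this summary, not the PR's actual code.

import json
from typing import Any, Dict, List


def parse_jsonl_file(path: str, encoding: str = "utf-8") -> Dict[str, Any]:
    """Parse a .jsonl file line by line, collecting failures instead of aborting."""
    rows: List[Dict[str, Any]] = []
    failures: List[Dict[str, Any]] = []
    with open(path, "r", encoding=encoding) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                rows.append({"line_no": line_no, "data": json.loads(line)})
            except json.JSONDecodeError as e:
                failures.append({"line_no": line_no, "error": str(e)})
    return {
        "type": "jsonl",
        "rows": rows,
        "count": len(rows),
        "failed_count": len(failures),
        "failures": failures,
    }
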
neo4j_store_adapter.py
Add Neo4j Community Store Adapter implementation                 

packages/gptdb-ext/src/gptdb_ext/storage/knowledge_graph/community/neo4j_store_adapter.py

  • New file implementing Neo4j Community Store Adapter for graph database
    operations
  • Provides methods for community discovery, graph traversal, and CRUD
    operations on entities/relationships
  • Includes Cypher query execution, schema introspection, and graph
    exploration with depth/direction control
  • Implements conversion utilities between Neo4j nodes/relationships and
    internal Vertex/Edge representations
+779/-0 
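
The node/relationship conversion mentioned above could look roughly like the sketch below. The neo4j driver calls (node.items(), node.labels, node.element_id) follow the 5.x driver API; the internal Vertex type is replaced here by a hypothetical stand-in.

from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleVertex:
    """Hypothetical stand-in for gptdb's internal Vertex type."""
    vid: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


def neo4j_node_to_vertex(node) -> SimpleVertex:
    """Convert a neo4j.graph.Node into an internal vertex representation."""
    props = dict(node.items())                 # copy all stored properties
    vid = str(props.pop("id", None) or node.element_id)
    label = next(iter(node.labels), "entity")  # first label, if any
    return SimpleVertex(vid=vid, label=label, props=props)
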
falcon_file_parse.py
Add Falcon benchmark file parser service                                 

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/falcon_file_parse.py

  • New file implementing Falcon benchmark file parser for dev/test
    environments
  • Provides dataclass models (BenchmarkDataItem, ColumnItem,
    TableDDLItem, TableDataItem) for benchmark data structures
  • Implements concurrent async file loading with fallback to sync
    execution in existing event loops
  • Includes prompt template building and standard benchmark set parsing
    with SQL schema formatting
+648/-0 
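
The "concurrent async file loading with fallback to sync execution in existing event loops" pattern can be tricky; below is a minimal, self-contained sketch of one safe variant (the function name and arguments are hypothetical, not the service's actual API).

import asyncio
import concurrent.futures


def run_async_loader(coro_factory):
    """Run an async loader from sync code, whether or not an event loop is running.

    `coro_factory` is a zero-argument callable returning a coroutine, e.g.
    lambda: service.async_load_data(data_file, ddl_file).
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: run the coroutine directly.
        return asyncio.run(coro_factory())
    # A loop is already running (e.g. inside an async web handler): run the
    # coroutine on a fresh loop in a worker thread so we don't re-enter it.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro_factory()).result()
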
controller.py
Refactor agent chat routing and memory class names             

packages/gptdb-serve/src/gptdb_serve/agent/agents/controller.py

  • Renamed imports from db_gpts_memory to gpt_dbs_memory with updated
    class names
  • Commented out AWEL_LAYOUT flow chat code path and refactored
    agent_chat_v2() method
  • Added is_flow_chat() method to check if app uses flow-based chat
  • Extracted flow chat logic into separate app_agent_flow_chat() method
  • Updated app_agent_chat() to conditionally route between flow and
    agent-based chat
+271/-179
file_parse_service.py
Refactor file parse service with Excel output                       

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/file_parse_service.py

  • Made parse_input_sets() abstract method require evaluation_env
    parameter
  • Removed @abstractmethod decorator from write_data_compare_result() and
    write_multi_round_benchmark_result()
  • Implemented write_data_compare_result() to write benchmark results to
    Excel with auto-sizing columns
  • Updated parse_standard_benchmark_sets() signature to include
    evaluation_env parameter
  • Removed dependency on BenchmarkResultDao and ExcelUtils
+100/-304
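
For the Excel output with auto-sized columns, a sketch using openpyxl might look like the following; the real write_data_compare_result() signature and column layout are not shown in this summary, so the names here are assumptions.

from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def write_rows_to_excel(header, rows, path):
    """Write result rows to an .xlsx file and auto-size each column."""
    wb = Workbook()
    ws = wb.active
    ws.append(list(header))
    for row in rows:
        ws.append(list(row))
    # Auto-size: set each column width from its longest cell value.
    for idx in range(1, len(header) + 1):
        col = get_column_letter(idx)
        longest = max(len(str(cell.value or "")) for cell in ws[col])
        ws.column_dimensions[col].width = min(longest + 2, 80)
    wb.save(path)
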
conn_neo4j.py
Add Neo4j datasource connector implementation                       

packages/gptdb-ext/src/gptdb_ext/datasource/conn_neo4j.py

  • New file implementing Neo4j connector with Bolt protocol support
  • Provides Neo4jParameters dataclass for connection configuration (host,
    port, user, password, database)
  • Implements Neo4jConnector class with methods for graph operations,
    schema introspection, and Cypher query execution
  • Includes database management, index retrieval, and streaming query
    support
+344/-0 
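
A minimal sketch of a Bolt-based connector like the one described above, using the official neo4j Python driver; the class and field names differ from the PR's Neo4jParameters/Neo4jConnector and are only illustrative.

from dataclasses import dataclass

from neo4j import GraphDatabase  # official Bolt driver


@dataclass
class Neo4jParams:
    host: str = "localhost"
    port: int = 7687
    user: str = "neo4j"
    password: str = ""
    database: str = "neo4j"


class MiniNeo4jConnector:
    """Thin wrapper around the Bolt driver for Cypher execution."""

    def __init__(self, params: Neo4jParams):
        uri = f"bolt://{params.host}:{params.port}"
        self._driver = GraphDatabase.driver(uri, auth=(params.user, params.password))
        self._database = params.database

    def run(self, query: str, **kwargs):
        """Execute a Cypher query and return the materialized records."""
        with self._driver.session(database=self._database) as session:
            return list(session.run(query, **kwargs))

    def close(self):
        self._driver.close()
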
connector_manager.py
Add Neo4j connector and environment variable resolution   

packages/gptdb-serve/src/gptdb_serve/datasource/manages/connector_manager.py

  • Added import for Neo4jConnector in on_init() method
  • Added environment variable resolution for database passwords using
    _resolve_env_vars()
  • Improved error handling for JSON parsing with fallback to empty dict
  • Enhanced test_connection() to resolve environment variables in
    password field
+14/-3   
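
Resolving database passwords from environment variables could be as simple as the sketch below; the ${VAR} placeholder syntax is an assumption, since the summary does not say which format _resolve_env_vars() accepts.

import os
import re

_ENV_PATTERN = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_vars(value: str) -> str:
    """Replace ${VAR} placeholders with values from the process environment.

    Unset variables are left untouched so a missing secret fails loudly later
    instead of silently becoming an empty password.
    """
    return _ENV_PATTERN.sub(
        lambda m: os.environ.get(m.group("name"), m.group(0)), value
    )


# Example: with DB_PASSWORD set, resolve_env_vars("${DB_PASSWORD}") returns its value.
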
benchmark_service.py
Add agent-based benchmarking support with GitHub Falcon integration

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/benchmark_service.py

  • Added support for agent-based benchmarking alongside LLM benchmarking
    with new imports for BenchmarkAgentTask and StorageUtil
  • Introduced new parameters for agent API configuration (api_url,
    http_method, headers, parse_strategy, response_mapping,
    evaluation_env, benchmark_type)
  • Changed STANDARD_BENCHMARK_FILE_PATH from local Excel file to GitHub
    Falcon repository URL
  • Added methods to parse HTTP methods, response strategies, and
    evaluation environments with fallback defaults
  • Implemented conditional logic to execute either agent or LLM tasks
    based on benchmark_type configuration
  • Added prompt template formatting with database dialect support
+198/-36
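
The "parse ... with fallback defaults" behavior mentioned above generally reduces to mapping a raw config string onto an enum member; a generic sketch follows (enum members and defaults are assumptions).

from enum import Enum
from typing import Optional


class HttpMethod(Enum):
    GET = "GET"
    POST = "POST"


def parse_enum_with_fallback(raw: Optional[str], enum_cls, default):
    """Map a raw config string onto an enum member, falling back to a default."""
    if not raw:
        return default
    try:
        return enum_cls(raw.strip().upper())
    except ValueError:
        return default


# parse_enum_with_fallback("post", HttpMethod, HttpMethod.POST)  -> HttpMethod.POST
# parse_enum_with_fallback("PATCH", HttpMethod, HttpMethod.POST) -> HttpMethod.POST (fallback)
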
benchmark_agent_task.py
New benchmark agent task module for remote API evaluation

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/task/benchmark_agent_task.py

  • New file implementing BenchmarkAgentTask class for evaluating remote
    agent APIs
  • Includes ResponseParser class with JSON path and direct parsing
    strategies
  • Supports configurable HTTP methods, headers, authentication, and
    response extraction
  • Implements retry logic with configurable delays and timeout handling
  • Provides async request execution with SSL verification options
+355/-0 
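
A JSON-path style response parser like the one described can be sketched as a dotted-path walk over the decoded payload; the path syntax ("a.b.0.c") is an assumption about how response_mapping is expressed.

import json
from typing import Any


def extract_by_path(payload: Any, path: str) -> Any:
    """Walk a dotted path (e.g. "data.choices.0.text") through a JSON payload."""
    if isinstance(payload, (str, bytes)):
        payload = json.loads(payload)
    current = payload
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]   # numeric segment indexes a list
        elif isinstance(current, dict):
            current = current.get(part)
        else:
            return None                    # path ran past a scalar value
        if current is None:
            return None
    return current


# extract_by_path('{"data": {"sql": "SELECT 1"}}', "data.sql") -> "SELECT 1"
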
elasticsearch.py
Enhance Elasticsearch full text search with metadata filtering

packages/gptdb-ext/src/gptdb_ext/storage/full_text/elasticsearch.py

  • Changed metadata field type from keyword to object with dynamic: True
    for better queryability
  • Added is_support_full_text_search() and full_text_search() methods to
    support full text search
  • Implemented _build_query() and _build_metadata_filter() methods for
    metadata filtering with Elasticsearch bool queries
  • Added _normalize_metadata() utility to ensure consistent metadata
    handling across dict, string, and JSON formats
  • Updated load_document() and search methods to use normalized metadata
    instead of JSON serialization
+111/-11
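
The metadata filtering likely composes an Elasticsearch bool query from term/terms clauses, roughly as below; the "content" and "metadata.*" field names are assumptions based on the mapping change described above.

from typing import Dict, List, Optional


def build_metadata_filter(metadata: Dict) -> List[Dict]:
    """Build term/terms filter clauses for a metadata dict."""
    filters: List[Dict] = []
    for key, value in metadata.items():
        field = f"metadata.{key}"
        if isinstance(value, list):
            filters.append({"terms": {field: value}})
        else:
            filters.append({"term": {field: value}})
    return filters


def build_query(text: str, metadata: Optional[Dict] = None) -> Dict:
    """Combine a full-text match with optional metadata filters."""
    query: Dict = {"bool": {"must": [{"match": {"content": text}}]}}
    if metadata:
        query["bool"]["filter"] = build_metadata_filter(metadata)
    return {"query": query}
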
models.py
Add agent configuration models and evaluation environment enums

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/models.py

  • Added new enums: EvaluationEnv (DEV, TEST), BenchmarkInvokeType (LLM,
    AGENT), HttpMethod, ResponseParseStrategy
  • Added GITHUB file parse type to FileParseTypeEnum
  • Introduced AgentApiConfig dataclass with API endpoint, HTTP method,
    headers, response parsing configuration, and retry settings
  • Added AgentCompletionRequest dataclass for agent API requests
  • Extended BenchmarkExecuteConfig with evaluation_env, invoke_type, and
    agent_config fields
+127/-0 
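
A plausible shape for the agent API configuration described above; the field names and defaults are approximations from this summary, and the real AgentApiConfig may differ.

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ResponseParseStrategy(Enum):
    JSON_PATH = "json_path"
    DIRECT = "direct"


@dataclass
class AgentApiConfig:
    api_url: str
    http_method: str = "POST"
    headers: Dict[str, str] = field(default_factory=dict)
    parse_strategy: ResponseParseStrategy = ResponseParseStrategy.JSON_PATH
    response_mapping: Optional[str] = None  # e.g. a dotted JSON path into the response
    timeout: float = 60.0
    max_retries: int = 3
    retry_delay: float = 1.0
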
neo4j_store.py
Implement Neo4j graph store with resource registration     

packages/gptdb-ext/src/gptdb_ext/storage/graph_store/neo4j_store.py

  • Added @register_resource decorator with configuration parameters for
    Neo4j host, port, username, password, and database
  • Implemented Neo4jStoreConfig dataclass with connection details and
    graph element type configurations
  • Added Neo4jStore class implementation with connection initialization
    using Neo4jConnector
  • Implemented get_config() and is_exist() methods for configuration
    retrieval and database existence checking
+172/-2 
excel_file_parse.py
New Excel file parsing service for benchmark data extraction

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/excel_file_parse.py

  • New file implementing ExcelFileParseService for parsing Excel
    benchmark files
  • Includes parse_input_sets() method to extract benchmark data from
    Excel worksheets
  • Implements parse_standard_benchmark_sets() for parsing standard answer
    sets with comparison strategies
  • Added _parse_multi_standard_result() utility for handling multiple
    standard results from JSON strings
+180/-0 
user_input_execute_service.py
Refactor file service to support multiple file type parsers

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/user_input_execute_service.py

  • Refactored constructor to accept file_type parameter instead of
    file_service instance
  • Added file_service() factory method to instantiate appropriate parser
    based on file type (Excel or GitHub Falcon)
  • Updated read_input_file() to accept evaluation_env parameter and pass
    it to parser
  • Updated post_dispatch() to pass evaluation_env to standard benchmark
    parsing
  • Moved BenchmarkResultDao initialization to class level
+26/-15 
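
The file_service() factory presumably maps the configured file type to a concrete parser, along these lines; the placeholder classes and string keys below are hypothetical.

class ExcelFileParseService:   # placeholder for the real parser class
    pass


class FalconFileParseService:  # placeholder for the real parser class
    pass


def make_file_parse_service(file_type: str):
    """Dispatch to a parser implementation based on the benchmark file type."""
    parsers = {
        "EXCEL": ExcelFileParseService,
        "GITHUB": FalconFileParseService,
    }
    try:
        return parsers[file_type.upper()]()
    except KeyError:
        raise ValueError(f"Unsupported benchmark file type: {file_type}") from None
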
base.py
Add full text search interface to base class                         

packages/gptdb-core/src/gptdb/storage/full_text/base.py

  • Added is_support_full_text_search() method returning True for full
    text stores
  • Added full_text_search() method as default implementation delegating
    to similar_search_with_scores()
  • Updated aload_document() signature to include optional file_id
    parameter for document-level tracking
+34/-1   
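
The default full text search described above likely just delegates to the existing score-based search, roughly as sketched here; only the method names come from this PR, the signatures are assumptions.

from typing import Any, List, Optional


class FullTextStoreBase:
    """Sketch of the base-class defaults for full text search."""

    def similar_search_with_scores(
        self, text: str, topk: int, score_threshold: float, filters: Optional[Any] = None
    ) -> List[Any]:
        raise NotImplementedError

    def is_support_full_text_search(self) -> bool:
        # Full text stores support it by default; other stores can override.
        return True

    def full_text_search(
        self, text: str, topk: int = 10, filters: Optional[Any] = None
    ) -> List[Any]:
        # Default implementation simply delegates to the score-based search.
        return self.similar_search_with_scores(text, topk, 0.0, filters)
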
schemas.py
Extend benchmark request schema with agent configuration fields

packages/gptdb-serve/src/gptdb_serve/evaluate/api/schemas.py

  • Changed datasets field type from List[dict] to Union[str, List[dict]]
    to support string values
  • Added new fields to BenchmarkServeRequest: evaluation_env,
    benchmark_type, api_url, http_method, headers, parse_strategy,
    response_mapping
+17/-2   
endpoints.py
Update benchmark endpoint to support agent configuration 

packages/gptdb-serve/src/gptdb_serve/evaluate/api/endpoints.py

  • Updated _run_benchmark_task_sync() to pass all parameters as keyword
    arguments
  • Added new parameters to run_dataset_benchmark() call: benchmark_type,
    api_url, http_method, headers, parse_strategy, response_mapping,
    evaluation_env
+15/-9   
factory.py
Add Neo4j store adapter factory support                                   

packages/gptdb-ext/src/gptdb_ext/storage/knowledge_graph/community/factory.py

  • Added import for Neo4jStore and Neo4jStoreAdapter
  • Implemented factory method to create Neo4jStoreAdapter for Neo4jStore
    instances
+7/-0     
api_v1.py
Add app_code parameter to chat completion requests             

packages/gptdb-app/src/gptdb_app/openapi/api_v1/api_v1.py

  • Added app_code parameter to dialogue context in chat completion
    endpoint
  • Passed app_code to both dialogue.select_param and streaming response
    calls
+2/-1     
http_trigger.py
Add app_code field to HTTP request body                                   

packages/gptdb-core/src/gptdb/core/awel/trigger/http_trigger.py

  • Reorganized imports to place third-party imports before local imports
  • Added app_code field to CommonLLMHttpRequestBody for app
    identification
+2/-2     
StorageUtil.py
Add GitHub Falcon file type detection                                       

packages/gptdb-core/src/gptdb/util/benchmarks/StorageUtil.py

  • Added GITHUB_FALCON_PREFIX constant for GitHub Falcon repository URL
  • Added logic to detect and return FileParseTypeEnum.GITHUB for Falcon
    repository URLs
+4/-0     
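
Detecting the Falcon repository is presumably a prefix check on the configured path, for example as below; the prefix value is hypothetical, the real constant lives in StorageUtil.py.

from enum import Enum

# Hypothetical value for illustration only.
GITHUB_FALCON_PREFIX = "https://github.com/falcon-benchmark"


class FileParseTypeEnum(Enum):
    EXCEL = "excel"
    GITHUB = "github"


def detect_file_parse_type(path_or_url: str) -> FileParseTypeEnum:
    """Route Falcon repository URLs to the GitHub parser; default to Excel."""
    if path_or_url.startswith(GITHUB_FALCON_PREFIX):
        return FileParseTypeEnum.GITHUB
    return FileParseTypeEnum.EXCEL
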
opensearch.py
Add file_id parameter to async document loading                   

packages/gptdb-ext/src/gptdb_ext/storage/full_text/opensearch.py

  • Added optional file_id parameter to aload_document method signature
  • Updated method to accept file identifier for async document loading
+3/-1     
Formatting
52 files
gptdbs.py
Rename DbGpts classes to GptDbs for consistency                   

packages/gptdb-serve/src/gptdb_serve/agent/agents/gptdbs.py

  • Renamed DbGptsMessage to GptDbsMessage for naming consistency
  • Renamed DbGptsTaskStep to GptDbsTaskStep
  • Renamed DbGptsCompletion to GptDbsCompletion
  • Updated all references and factory methods to use new naming
    convention
+13/-13 
pack.py
Rename tool identifier constant for consistency                   

packages/gptdb-core/src/gptdb/agent/resource/tool/pack.py

  • Renamed DB_GPT_TOOL_IDENTIFIER to GPT_DB_TOOL_IDENTIFIER for
    consistency
  • Updated all references to use new identifier name in
    _is_function_tool() function
+4/-5     
cli_scripts.py
Reorganize gptdbs CLI imports for clarity                               

packages/gptdb-core/src/gptdb/cli/cli_scripts.py

  • Reorganized imports for gptdbs CLI commands with better grouping
  • Separated import statements for add_repo, install, list_all_apps, and
    other commands
+7/-3     
serve_initialization.py
Update naming and simplify imports in serve initialization

packages/gptdb-app/src/gptdb_app/initialization/serve_initialization.py

  • Updated comment from DbGpts Register to GptDbs Register for
    consistency
  • Simplified import statement for PromptServe
+3/-5     
simple_nl_schema_sql_chart_example.py
Fix import order and update package name in example           

examples/awel/simple_nl_schema_sql_chart_example.py

  • Reorganized imports to place third-party imports after local imports
  • Updated pip install command from gpt-db[openai] to db-gpt[openai]
+2/-3     
app_agent_manage.py
Update memory class names for consistency                               

packages/gptdb-serve/src/gptdb_serve/agent/agents/app_agent_manage.py

  • Renamed import from db_gpts_memory to gpt_dbs_memory
  • Updated class references from MetaDbGptsPlansMemory to
    MetaGptDbsPlansMemory
  • Updated class references from MetaDbGptsMessageMemory to
    MetaGptDbsMessageMemory
+3/-3     
react_agent_example.py
Fix import order in React agent example                                   

examples/agents/react_agent_example.py

  • Reorganized imports to place typing_extensions after local imports
+1/-2     
simple_intent_translator.py
Normalize entity naming in intent translator example         

packages/gptdb-core/src/gptdb/rag/transformer/simple_intent_translator.py

  • Changed entity name from GPTDB to gptdb in example prompt for
    consistency
+1/-1     
gpts_app.py
Fix import order in gpts_app module                                           

packages/gptdb-serve/src/gptdb_serve/agent/db/gpts_app.py

  • Reorganized imports to place sqlalchemy imports after local imports
+10/-11 
vllm_llm.py
Fix import order and update environment variable naming   

packages/gptdb-core/src/gptdb/model/llm/llm_out/vllm_llm.py

  • Reorganized imports to place gptdb imports before third-party imports
  • Changed environment variable from DB_GPT_MODEL_BENCHMARK to
    GPT_DB_MODEL_BENCHMARK
+2/-3     
sandbox_code_agent_example.py
Fix import order in sandbox code agent example                     

examples/agents/sandbox_code_agent_example.py

  • Reorganized imports to place gptdb imports before third-party imports
+1/-2     
api_v2.py
Fix import order in API v2 module                                               

packages/gptdb-app/src/gptdb_app/openapi/api_v2.py

  • Reorganized imports to place starlette imports after local imports
+1/-2     
oceanbase_store.py
Fix import order in OceanBase store module                             

packages/gptdb-ext/src/gptdb_ext/storage/vector_store/oceanbase_store.py

  • Reorganized imports to place sqlalchemy imports after local imports
+2/-3     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/flow/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
i18n_utils.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/util/i18n_utils.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.configs.model_config import before try-except block
+2/-1     
recommend_question.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/app/recommend_question/recommend_question.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
storage_adapter.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/chat_history/storage_adapter.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy.orm.Session import after internal package imports
+1/-2     
media_utils.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/model/utils/media_utils.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved third-party imports after internal package imports
+2/-3     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/file/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
conv_links.py
Import statement reorganization for code style                     

packages/gptdb-app/src/gptdb_app/openapi/api_v1/links/conv_links.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
__init__.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/model/__init__.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved AutoLLMClient import before try-except block
+2/-1     
tugraph_store_adapter.py
Import statement reorganization for code style                     

packages/gptdb-ext/src/gptdb_ext/storage/knowledge_graph/community/tugraph_store_adapter.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved packaging.version.Version import after internal package imports
+1/-2     
my_plugin_db.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/hub/db/my_plugin_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
custom_tool_agent_example.py
Import statement reorganization for code style                     

examples/agents/custom_tool_agent_example.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved typing_extensions imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/file/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/gptdbs/my/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/model/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/utils/_template_files/default_serve_template/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/feedback/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
base.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/knowledge_graph/base.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved pydantic.Field import after internal package imports
+1/-2     
agent_memory.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/agent/core/memory/agent_memory.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved typing_extensions.TypedDict import after internal package
    imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/prompt/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
base.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/util/tracer/base.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved starlette.requests.Request import after internal package imports
+1/-2     
disk_storage.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/cache/storage/disk/disk_storage.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved rocksdict imports after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/utils/_template_files/default_serve_template/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
settings.py
Import statement reorganization for code style                     

packages/gptdb-app/src/gptdb_app/openapi/api_v1/links/settings.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
gpts_plans_db.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/db/gpts_plans_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
benchmark_db.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/evaluate/db/benchmark_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/gptdbs/hub/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
ext_http_trigger.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/core/awel/trigger/ext_http_trigger.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved starlette.requests.Request import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/gptdbs/my/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/chat/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/agent/chat/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/datasource/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/evaluate/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/feedback/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
models.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/libro/models/models.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy imports after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/libro/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
serve.py
Import statement reorganization for code style                     

packages/gptdb-serve/src/gptdb_serve/rag/serve.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved sqlalchemy import after internal package imports
+1/-2     
hf_chat_llm.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/model/llm/llm_out/hf_chat_llm.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved transformers imports after internal package imports
+1/-2     
chat_history_db.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/storage/chat_history/chat_history_db.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved gptdb.storage.metadata import before third-party imports
+1/-2     
server.py
Import statement reorganization for code style                     

packages/gptdb-core/src/gptdb/util/code/server.py

  • Reorganized import statements to follow PEP 8 style guidelines
  • Moved lyric imports after internal package imports
+1/-2     
Configuration changes
2 files
repo.py
Update default gptdbs repository to khulnasoft organization

packages/gptdb-core/src/gptdb/util/gptdbs/repo.py

  • Changed default repository from eosphoros/gptdbs to khulnasoft/gptdbs
  • Updated repository URL from eosphoros to khulnasoft-bot in GitHub
    links
  • Updated error messages and documentation examples to reflect new
    naming
+3/-3     
model.py
Update plugin hub URL and fix import order                             

packages/gptdb-serve/src/gptdb_serve/agent/model.py

  • Reorganized imports to place pydantic imports after local imports
  • Updated plugin hub URL from khulnasoft/GPT-DB-Plugins.git to
    khulnasoft-bot/GPT-DB-Plugins.git
+2/-3     
Miscellaneous
2 files
demo_selenium.py
Add Selenium WebDriver demo script                                             

demo_selenium.py

  • New file demonstrating Selenium WebDriver usage with Firefox browser
  • Includes headless mode configuration, webpage navigation, and
    screenshot capture
  • Contains Bengali language comments for documentation
+25/-0   
model_adapter.py
Rename environment variable for consistency                           

packages/gptdb-core/src/gptdb/model/adapter/model_adapter.py

  • Changed environment variable name from DB_GPT_MODEL_BENCHMARK to
    GPT_DB_MODEL_BENCHMARK
+1/-1     
Error handling
1 file
service.py
Improve error logging with traceback information                 

packages/gptdb-serve/src/gptdb_serve/rag/service/service.py

  • Added traceback import and error logging in exception handler
  • Enhanced error logging to include full traceback information for
    debugging
+4/-1     
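
The enhanced error logging amounts to including traceback.format_exc() alongside the message; a small sketch follows (the wrapper function is illustrative only).

import logging
import traceback

logger = logging.getLogger(__name__)


def run_with_traceback_logging(fn, *args, **kwargs):
    """Run an operation and log the full stack trace if it fails."""
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        # Capture where it failed, not just the exception message.
        logger.error(f"operation failed: {e}\n{traceback.format_exc()}")
        raise
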
Bug fix
1 file
data_compare_service.py
Add null and boolean value normalization in data comparison

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/data_compare_service.py

  • Added null string handling to convert null to empty string
  • Added boolean value normalization to convert True/False to 1/0 for
    database consistency
+18/-1   
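
The normalization can be captured in a single helper applied to every cell before comparison; the helper name is an assumption, the rules match the bullets above.

def normalize_cell(value):
    """Normalize a result cell before row comparison.

    "null"/None become empty strings and booleans become 1/0 so that results
    from different database dialects compare as equal.
    """
    if value is None:
        return ""
    if isinstance(value, bool):
        return 1 if value else 0
    if isinstance(value, str) and value.strip().lower() == "null":
        return ""
    return value


# normalize_cell(True) -> 1, normalize_cell("null") -> "", normalize_cell(None) -> ""
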
Documentation
1 file
simple_rag_summary_example.py
Update documentation URL reference                                             

examples/awel/simple_rag_summary_example.py

  • Updated documentation URL from docs.gptdb.cn to gpt.khulnasoft.com
+1/-1     
Additional files
101 files
bug-report.yml +4/-4     
python-publish.yml +3/-3     
.pre-commit-config.yaml +41/-41 
Dockerfile +34/-0   
LICENSE +1/-1     
Makefile +106/-113
README.md +239/-303
build_image.sh +8/-8     
ha-model-cluster.toml +64/-0
ha-webserver.toml +28/-0
test_case_sqlite_data.py +1/-1
my.cnf +44/-0
case_1_student_manager_mysql.sql +63/-0
case_1_student_manager_sqlite.sql +59/-0
case_1_student_manager_vertica.sql +78/-0
case_2_ecom_mysql.sql +63/-0
case_2_ecom_sqlite.sql +59/-0
case_2_ecom_vertica.sql +78/-0
case_3_order_wide_table_sqlite_wide.sql +317/-0
test_case.md +87/-0
test_case_info_mysql.sql +19/-0
test_case_info_sqlite.sql +17/-0
test_case_info_vertica.sql +29/-0
user_config_mysql.sql +3/-0
2024-07-24-gpt-db-llama-3.1-support.md +1/-1     
2024-09-04-gptdb-v0.6.0-Defining new standards for AI-native data applications.md +2/-2     
2025-03-24-gptdb-v0.7.0-release.md +10/-10 
2025-04-29-gpt-db-qwen3-support.md +1/-1     
benchmark.md +1/-1     
cli.md +1/-1     
text_to_sql.md +2/-2     
Released_V0.5.0.md +5/-5     
Released_V0.6.0.md +2/-2     
data_analysis_app_develop.md +15/-15 
kbqa.md +1/-1     
introduction.md +3/-3     
docker.md +1/-1     
docker_compose.md +3/-3     
aimlapi_llm_install.md +2/-2     
cluster.md +1/-1     
benchmark.md +2/-2     
docusaurus.config.js +6/-6     
auto_search_agent.py +0/-1     
data_manus.py +0/-1     
db_create.py +0/-1     
graph_rag_example.py +0/-1     
chat_data_with_awel.py +0/-1     
graphrag-mini.md +3/-3     
graphrag-test.md +46/-46 
README.md +2/-1     
gptdb_agent.po +65/-0   
gptdb_app__cli.po +22/-0   
gptdb_app_config.po +266/-0 
gptdb_app_gptdb_server.po +23/-0   
gptdb_app_knowledge.po +30/-0   
gptdb_app_operators.po +556/-0 
gptdb_app_scene.po +71/-0   
gptdb_client__cli.po +73/-0   
gptdb_core.po +1043/-0
gptdb_datasource.po +65/-0   
gptdb_ext_datasource.po +278/-0 
gptdb_ext_rag.po +478/-0 
gptdb_ext_storage.po +573/-0 
gptdb_model.po +1008/-0
gptdb_rag.po +454/-0 
gptdb_serve_agent.po +52/-0   
gptdb_serve_conversation.po +56/-0   
gptdb_serve_core.po +68/-0   
gptdb_serve_datasource.po +26/-0   
gptdb_serve_evaluate.po +34/-0   
gptdb_serve_feedback.po +26/-0   
gptdb_serve_file.po +64/-0   
gptdb_serve_flow.po +114/-0 
gptdb_serve_gptdbs.po +34/-0   
gptdb_serve_libro.po +26/-0   
gptdb_serve_model.po +34/-0   
gptdb_serve_prompt.po +34/-0   
gptdb_serve_rag.po +155/-0 
gptdb_storage.po +92/-0   
gptdb_util.po +155/-0 
gptdb_app_config.mo [link]   
gptdb_datasource.mo [link]   
gptdb_ext_datasource.mo [link]   
gptdb_model.mo [link]   
gptdb_serve_flow.mo [link]   
gptdb_serve_gptdbs.mo [link]   
gptdb_util.mo [link]   
gptdb_app_config.mo [link]   
gptdb_datasource.mo [link]   
gptdb_ext_datasource.mo [link]   
gptdb_model.mo [link]   
gptdb_serve_flow.mo [link]   
gptdb_serve_gptdbs.mo [link]   
gptdb_util.mo [link]   
gptdb_app_config.mo [link]   
gptdb_datasource.mo [link]   
gptdb_ext_datasource.mo [link]   
gptdb_model.mo [link]   
gptdb_serve_flow.mo [link]   
gptdb_serve_gptdbs.mo [link]   
Additional files not shown

Summary by CodeRabbit

Release Notes

  • Chores

    • Renamed environment variables for consistency
    • Updated copyright year to 2024
    • Enhanced .gitignore to exclude build artifacts and temporary files
    • Overhauled build system with modernized Make targets
  • New Features

    • Added Docker support for headless graphical environment with VNC/noVNC access
    • Introduced example SQL schemas and test data across multiple database systems
    • Added new demo and test automation scripts
  • Documentation

    • Completely restructured README with improved organization and navigation
    • Updated documentation links and references throughout
    • Added release notes and changelog updates

✏️ Tip: You can customize this high-level summary in your review settings.

FortiShield and others added 2 commits January 23, 2026 02:56
Signed-off-by: fortishield <161459699+FortiShield@users.noreply.github.com>
Contributor

@sourcery-ai sourcery-ai bot left a comment


Sorry, we are unable to review this pull request

The GitHub API does not allow us to fetch diffs exceeding 300 files, and this pull request has 1333 files.

@coderabbitai

coderabbitai bot commented Jan 23, 2026

Walkthrough

This PR comprehensively updates the project infrastructure, including environment variable renaming across workflows and build scripts, a complete Makefile overhaul with new build targets and package management, addition of a Dockerfile for VNC/graphical environment, license update, extensive documentation rewrites and domain/link updates, and numerous SQL test scripts for multi-database scenarios.

Changes

Cohort / File(s) Summary
Workflow & Environment Variable Renaming
.github/workflows/python-publish.yml, docker/base/build_image.sh
Environment variable renames from DB_GPT_VERSION to GPT_DB_VERSION and DB_GPT_INSTALL_MODE to GPT_DB_INSTALL_MODE. Affects version export and build mode selection in CI/CD pipeline.
Build Infrastructure Overhaul
Makefile
Major rewrite introducing explicit Python version (3.11), VENV_BIN OS-aware paths, package list (PACKAGES), and extensive new targets: setup, testenv, test, test-%, fmt, fmt-check, build, build-%, docker-%, publish, publish-%, clean, clean-dist, help. Shifts from inline venv management to uv-based dependency and package management.
Container & Environment Configuration
Dockerfile, docker/compose_examples/conf/ha-model-cluster.toml, docker/compose_examples/conf/ha-webserver.toml, docker/examples/my.cnf
New Dockerfile (python:3.11-slim with VNC/noVNC headless graphical setup). Two new TOML configuration files with environment-driven settings for multi-service model cluster and web server. New MySQL configuration template with UTF-8 defaults and tuning comments.
Test Data & SQL Scripts
docker/examples/sqls/case_*_*.sql, docker/examples/sqls/test_case*.sql, docker/examples/sqls/user_config_*.sql
Multiple new SQL scripts (11 files) for MySQL, SQLite, and Vertica establishing schema and seed data: student management (case_1), e-commerce (case_2), wide order table (case_3), and test case metadata tables. All database-agnostic DDL/DML additions.
Project Metadata & Configuration
LICENSE, .gitignore, .pre-commit-config.yaml, demo_selenium.py
License copyright update (2023 → 2024, magic.chen → KhulnaSoft Ltd). Extended .gitignore with frontend/build artifacts and lock files. Re-instated pre-commit config header comment. New demo_selenium.py script for Firefox automation with Selenium (headless=False, navigates to python.org, captures screenshot).
Documentation: README & Blog
README.md, docs/blog/2024-09-04-*.md, docs/blog/2025-03-24-*.md
README restructured from marketing-focused to developer-oriented with modular sections (What is GPT-DB, Core Capabilities, Deployment Options, etc.). Blog post link/domain updates: acknowledgment handle replacements (@khulnasoft → @csunny) and yuque/docs URL domain changes (eosphoros → khulnasoft, docs.gptdb.cn → gpt.khulnasoft.com).
Documentation: Technical Guides & Changelog
docs/docs/application/*/, docs/docs/cookbook/*/, docs/docs/changelog/*/, docs/docs/gptdbs/, docs/docs/installation/, docs/docs/modules/, docs/docusaurus.config.js
Widespread documentation updates: repository URL references (eosphoros/khulnasoft to khulnasoft-bot repos, khulnasoft/gptdbs to khulnasoft-bot/gptdbs), domain migrations (docs.gptdb.cn → gpt.khulnasoft.com, eosphoros → khulnasoft in Yuque links), UTM parameter updates (db_gpt → gpt_db), and Docusaurus config resource URL updates. No functional logic changes; link and contributor acknowledgments only.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • init commit #90: Modifies environment variable naming in workflow and build scripts (DB_GPT_VERSION → GPT_DB_VERSION), directly related to versioning infrastructure changes.

Suggested labels

review-effort-4, infrastructure, documentation, configuration

Poem

🐰 A grand refactoring hops our way,
With Makefile targets holding sway,
From eosphoros to Khulnasoft's light,
And VNC containers spinning bright!
SQL schemas dance in harmony,
Building the future, wild and free! 🚀

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

  • Description check (⚠️ Warning): The PR description is largely incomplete; required template sections (How Has This Been Tested, Snapshots, Checklist items) remain unchecked or unfilled, and no test details or validation evidence is provided. Resolution: complete the testing section with specific test steps and results, add snapshots if applicable, and check off completed checklist items to demonstrate thorough validation.
  • Title check (❓ Inconclusive): The title 'Deploy/dev' is vague and generic, providing no meaningful information about the changeset's primary purpose or scope. Resolution: use a descriptive title that summarizes the main change, such as 'Add agent benchmarking, Neo4j integration, and refactor file parsing', or a more specific focus area.

✅ Passed checks (1 passed)

  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Contributor

qodo-code-review bot commented Jan 23, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🔴
Path traversal read

Description: load_file_from_github() joins user-controlled file_name to repo_dir without preventing ..
path segments (e.g., ../../etc/passwd), enabling path traversal that can read files
outside the extracted repository directory if this API is exposed to untrusted input.
benchmark_data_manager.py [391-476]

Referred Code
async def load_file_from_github(
    self, file_name: Optional[str] = None
) -> Optional[FileLoadResult]:
    """Download and read a specified file from a GitHub repository.

    Supported file types: .json / .jsonl
    `file_name` can be a relative path within the repository or a plain filename

    Unified return structure (FileLoadResult):
      - type: "json" | "jsonl"
      - file_path, file_name, encoding
      - rows: List[{line_no:int, data:Any}] where data is parsed JSON object
      - count: total number of rows
      - failed_count: number of failed lines (non-zero for jsonl or malformed json)
      - failures: details for failed lines

    For JSON files:
      - If the file contains a JSON array, each element becomes a Row
      - If the file contains a single JSON object, it becomes one Row
      - The structure is flexible and doesn't depend on specific keys
    """



 ... (clipped 65 lines)
SSRF via download

Description: _download_repo_contents() allows downloading arbitrary .zip URLs when repo_url ends with
.zip, which can be used for SSRF/internal network access and to fetch attacker-controlled
archives for further exploitation.
benchmark_data_manager.py [760-814]

Referred Code
async def _download_repo_contents(self, repo_url: str) -> str:
    """Download repository with caching, supporting branch URLs"""
    cache_path = self._get_cache_path(repo_url)

    # Use cache if valid



 ... (clipped 34 lines)
Untrusted SQL execution

Description: _merge_sqlite_databases() executes create_sql pulled from untrusted downloaded SQLite
files (sqlite_master.sql) via cursor.execute(create_sql), which may allow execution of
unexpected/malicious SQL constructs embedded in the source databases.
benchmark_data_manager.py [868-955]

Referred Code
async def _merge_sqlite_databases(self, sqlite_files: List[str]) -> Dict:
    """Merge multiple SQLite files into the main database"""
    await self.init_connector()
    assert self._connector is not None

    def _worker():
        results = {
            "total_files": len(sqlite_files),
            "successful": 0,
            "failed": 0,
            "tables_merged": [],
        }

        with self._connector.session_scope() as session:
            # Get the underlying sqlite3 connection object
            connection_proxy = session.connection()
            # Handle different SQLAlchemy versions when getting the underlying connection
            try:
                # SQLAlchemy 1.4+ / 2.0
                raw_conn = connection_proxy.connection.dbapi_connection
            except AttributeError:



 ... (clipped 67 lines)
Cypher injection

Description: Cypher queries interpolate labels/relationship types using f-strings (e.g., MERGE
(src)-[r:{edge_type}]->(dst)), which can enable Cypher injection if edge_type/types are
ever influenced by untrusted input rather than fixed enums.
neo4j_store_adapter.py [183-202]

Referred Code
def upsert_edge(
    self, edges: Iterator[Edge], edge_type: str, src_type: str, dst_type: str
) -> None:
    """Upsert edges."""
    for edge in edges:
        props = {
            "id": edge.name or f"{edge.sid}_{edge.tid}",
            "name": edge.name or edge_type,
            "description": edge.get_prop("description") or "",
            "_chunk_id": edge.get_prop("_chunk_id") or "",
        }

        query = f"""
        MATCH (src:{src_type} {{id: $sid}})
        MATCH (dst:{dst_type} {{id: $tid}})
        MERGE (src)-[r:{edge_type}]->(dst)
        SET r += $props
        """
        self.graph_store.conn.run(query, sid=edge.sid, tid=edge.tid, props=props)
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit context: Critical destructive operations (e.g., truncation/deletion) are logged without user
identity/context and without structured outcome details, preventing reliable
reconstruction of actions.

Referred Code
def truncate(self):
    """Truncate Graph."""
    logger.warning("Truncating all data from Neo4j database")

    # Delete all relationships first
    self.graph_store.conn.run("MATCH ()-[r]->() DELETE r")

    # Then delete all nodes
    self.graph_store.conn.run("MATCH (n) DELETE n")

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path traversal risk: file_name is joined to repo_dir without blocking .. path segments, enabling path traversal
to read files outside the extracted repository directory.

Referred Code
# Handle both relative path and plain filename cases
normalized = str(file_name).strip().lstrip("/").replace("\\", os.sep)
candidate_paths: List[str] = []

# Prefer direct path resolution using the relative path
direct_path = os.path.join(repo_dir, normalized)
if os.path.isfile(direct_path):
    ext = os.path.splitext(direct_path.lower())[1]
    if not requested_ext:
        if ext in allowed_exts:
            candidate_paths.append(direct_path)
    elif ext == requested_ext:
        candidate_paths.append(direct_path)

# If not found, recursively search by filename match
if not candidate_paths:
    target_name = os.path.basename(normalized)
    for root, _, files in os.walk(repo_dir):
        for f in files:
            if f == target_name:
                full = os.path.join(root, f)



 ... (clipped 9 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Generic exception wrapping: Broad exception handling rethrows as RuntimeError while also sometimes continuing after
failures (e.g., silent rollback/detach failures), which may mask partial-merge states and
complicate recovery without clearer error classification and handling strategy.

Referred Code
async def load_file_from_github(
    self, file_name: Optional[str] = None
) -> Optional[FileLoadResult]:
    """Download and read a specified file from a GitHub repository.

    Supported file types: .json / .jsonl
    `file_name` can be a relative path within the repository or a plain filename

    Unified return structure (FileLoadResult):
      - type: "json" | "jsonl"
      - file_path, file_name, encoding
      - rows: List[{line_no:int, data:Any}] where data is parsed JSON object
      - count: total number of rows
      - failed_count: number of failed lines (non-zero for jsonl or malformed json)
      - failures: details for failed lines

    For JSON files:
      - If the file contains a JSON array, each element becomes a Row
      - If the file contains a single JSON object, it becomes one Row
      - The structure is flexible and doesn't depend on specific keys
    """



 ... (clipped 541 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Upstream error body: The raised error includes raw GitHub API response text (text_resp) which could expose
internal details to callers if surfaced beyond internal logs.

Referred Code
headers = {"Accept": "application/vnd.github.v3+json"}
async with self._http_session.get(zip_url, headers=headers) as response:
    if response.status != 200:
        text_resp = await response.text()
        raise RuntimeError(
            f"GitHub API Error {response.status}: {text_resp}"
        )

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Logs full query: Error logs include the full Cypher query string which may embed sensitive
identifiers/content depending on how upstream code constructs queries.

Referred Code
def query(self, query: str, **kwargs) -> MemoryGraph:
    """Execute a Cypher query and return results as a MemoryGraph."""
    graph = MemoryGraph()

    try:
        results = self.graph_store.conn.run(query, **kwargs)

        for record in results:
            # Process each value in the record
            for key, value in record.items():
                if value is None:
                    continue
                if hasattr(value, "labels"):  # It's a node
                    vertex = self._neo4j_node_to_vertex(value)
                    graph.upsert_vertex(vertex)
                elif hasattr(value, "type"):  # It's a relationship
                    edge = self._neo4j_relationship_to_edge(value)
                    graph.append_edge(edge)

    except Exception as e:
        logger.error(f"Query execution failed: {e}\nQuery: {query}")



 ... (clipped 1 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Contributor

qodo-code-review bot commented Jan 23, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Fix missing conversation history saving

Fix a bug in app_agent_chat where conversation history for flow-based chats is
not saved by uncommenting the finally block and initializing agent_task to
prevent an UnboundLocalError.

packages/gptdb-serve/src/gptdb_serve/agent/agents/controller.py [460-568]

 async def app_agent_chat(
     self,
     conv_uid: str,
     gpts_name: str,
     user_query: str,
     user_code: str = None,
     sys_code: str = None,
     enable_verbose: bool = True,
     stream: Optional[bool] = True,
     **ext_info,
 ):
     # logger.info(f"app_agent_chat:{gpts_name},{user_query},{conv_uid}")
+    conv_serve = ConversationServe.get_instance(CFG.SYSTEM_APP)
+    current_message: StorageConversation = _build_conversation(
+        conv_id=conv_uid,
+        select_param=gpts_name,
+        summary=user_query,
+        model_name="",
+        app_code=gpts_name,
+        conv_serve=conv_serve,
+        user_name=user_code,
+    )
+    current_message.save_to_storage()
+    current_message.start_new_round()
+    current_message.add_user_message(user_query)
+    agent_conv_id = None
+    agent_task = None
+    default_final_message = None
 
     if self.is_flow_chat(gpts_name=gpts_name):
         try:
             async for (
                 task,
                 chunk,
                 agent_conv_id,
             ) in multi_agents.app_agent_flow_chat(
                 conv_uid,
                 gpts_name,
                 user_query,
                 user_code,
                 sys_code,
                 enable_verbose=enable_verbose,
                 stream=stream,
                 **ext_info,
             ):
                 agent_task = task
                 default_final_message = chunk
                 yield chunk
 
         except asyncio.CancelledError:
             # Client disconnects
             print("Client disconnected")
             if agent_task:
                 logger.info(f"Chat to App {gpts_name}:{agent_conv_id} Cancel!")
                 agent_task.cancel()
         except Exception as e:
             logger.exception(f"Chat to App {gpts_name} Failed!" + str(e))
             raise
-        # finally:
-        #     logger.info(f"save agent chat info!{conv_uid}")
-        #     if agent_task:
-        #         final_message = await self.stable_message(agent_conv_id)
-        #         if final_message:
-        #             current_message.add_view_message(final_message)
-        #     else:
-        #         default_final_message = default_final_message.replace("data:", "")
-        #         current_message.add_view_message(default_final_message)
+        finally:
+            logger.info(f"save agent chat info!{conv_uid}")
+            if agent_task:
+                final_message = await self.stable_message(agent_conv_id)
+                if final_message:
+                    current_message.add_view_message(final_message)
+            elif default_final_message:
+                default_final_message = default_final_message.replace("data:", "")
+                current_message.add_view_message(default_final_message)
 
-        # current_message.end_current_round()
-        # current_message.save_to_storage()
+            current_message.end_current_round()
+            current_message.save_to_storage()
     else:
         # Temporary compatible scenario messages
-        conv_serve = ConversationServe.get_instance(CFG.SYSTEM_APP)
-        current_message: StorageConversation = _build_conversation(
-            conv_id=conv_uid,
-            select_param=gpts_name,
-            summary=user_query,
-            model_name="",
-            app_code=gpts_name,
-            conv_serve=conv_serve,
-            user_name=user_code,
-        )
-        current_message.save_to_storage()
-        current_message.start_new_round()
-        current_message.add_user_message(user_query)
-        agent_conv_id = None
-        agent_task = None
-        default_final_message = None
         try:
             async for task, chunk, agent_conv_id in multi_agents.agent_chat_v2(
                 conv_uid,
                 current_message.chat_order,
                 gpts_name,
                 user_query,
                 user_code,
                 sys_code,
                 enable_verbose=enable_verbose,
                 stream=stream,
                 **ext_info,
             ):
                 agent_task = task
                 default_final_message = chunk
                 yield chunk
 
         except asyncio.CancelledError:
             # Client disconnects
             print("Client disconnected")
             if agent_task:
                 logger.info(f"Chat to App {gpts_name}:{agent_conv_id} Cancel!")
                 agent_task.cancel()
         except Exception as e:
             logger.exception(f"Chat to App {gpts_name} Failed!" + str(e))
             raise
         finally:
             logger.info(f"save agent chat info!{conv_uid}")
             if agent_task:
                 final_message = await self.stable_message(agent_conv_id)
                 if final_message:
                     current_message.add_view_message(final_message)
-            else:
+            elif default_final_message:
                 default_final_message = default_final_message.replace("data:", "")
                 current_message.add_view_message(default_final_message)
 
             current_message.end_current_round()
             current_message.save_to_storage()

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical bug where conversation history is not saved for flow-based chats due to a commented-out finally block, which would lead to data loss.

High
Fix incorrect async execution from sync

Fix a RuntimeError by changing how the async _async_load_data function is called
from the synchronous _load_data_sync context when an event loop is already
running.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/falcon_file_parse.py [542-555]

 try:
-    asyncio.get_running_loop()
+    loop = asyncio.get_running_loop()
+    # The loop is running, so we can't use loop.run_until_complete.
+    # We must run the async code in a separate thread to not block the
+    # existing event loop, and then wait for its result.
     import concurrent.futures
-
-    with concurrent.futures.ThreadPoolExecutor() as executor:
-        future = executor.submit(
-            asyncio.run, self._async_load_data(data_file, table_ddl_file)
-        )
-        return future.result()
+    with concurrent.futures.ThreadPoolExecutor() as pool:
+        return loop.run_in_executor(
+            pool,
+            self._async_load_data,
+            data_file,
+            table_ddl_file
+        ).result()
 except RuntimeError:
+    # No event loop is running, we can create one.
     loop = get_or_create_event_loop()
     return loop.run_until_complete(
         self._async_load_data(data_file, table_ddl_file)
     )

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a RuntimeError that occurs when calling asyncio.run() from a thread that already has a running event loop, which is a critical bug in the async-from-sync execution logic.

Medium
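
For context, a minimal self-contained sketch of one safe pattern for driving a coroutine from synchronous code. Note that `loop.run_in_executor` expects a plain callable, so handing it a coroutine function will not actually await it; when a loop is already running in the current thread, the usual workaround is to hand `asyncio.run` the coroutine in a worker thread. The names `_async_load_data`, `data_file`, and `table_ddl_file` mirror the snippet above and are placeholders here.

```python
import asyncio
import concurrent.futures


async def _async_load_data(data_file: str, table_ddl_file: str) -> dict:
    """Stand-in for the real coroutine; the signature is assumed."""
    await asyncio.sleep(0)
    return {"data": data_file, "ddl": table_ddl_file}


def load_data_sync(data_file: str, table_ddl_file: str) -> dict:
    """Run the coroutine whether or not an event loop already exists here."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run() is safe to call directly.
        return asyncio.run(_async_load_data(data_file, table_ddl_file))
    # A loop is already running in this thread, so asyncio.run() would fail here.
    # Run it in a worker thread instead, which gets its own fresh event loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(
            asyncio.run, _async_load_data(data_file, table_ddl_file)
        )
        return future.result()
```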
Correctly retrieve raw database connection

Correct the logic for retrieving the raw DBAPI connection from a SQLAlchemy
session to ensure compatibility with modern versions (1.4+) by accessing
connection_proxy.dbapi_connection directly.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py [885-894]

 try:
-    # SQLAlchemy 1.4+ / 2.0
-    raw_conn = connection_proxy.connection.dbapi_connection
+    # SQLAlchemy 1.4+
+    raw_conn = connection_proxy.dbapi_connection
 except AttributeError:
     try:
-        # 旧版本或某些驱动
+        # Older SQLAlchemy versions
         raw_conn = connection_proxy.connection
     except AttributeError:
-        # 最后的尝试
+        # Fallback for other cases
         raw_conn = session.get_bind().raw_connection()

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies an improper attribute access pattern for retrieving the raw DBAPI connection and proposes a more robust, version-compatible approach that improves code clarity and correctness.

Medium
Make method async

Add the async keyword to the aload_document method to match the base class's
abstract method signature.

packages/gptdb-ext/src/gptdb_ext/storage/full_text/opensearch.py [23-25]

-def aload_document(
+async def aload_document(
     self, chunks: List[Chunk], file_id: Optional[str] = None
 ) -> List[str]:

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the method signature should be async to match the base class, which is important for correctness in an asynchronous framework.

Medium
Reassign replaceAll result
Suggestion Impact: The catch block was updated to assign the result of replaceAll back to message, ensuring the newline replacement takes effect.

code diff:

             } catch {
-              message.replaceAll('\\n', '\n');
+              message = message.replaceAll('\\n', '\n');
             }

Fix a bug in the catch block by reassigning the result of message.replaceAll to
correctly apply newline replacements.

web/hooks/use-chat.ts [95-97]

 } catch {
-  message.replaceAll('\\n', '\n');
+  message = message.replaceAll('\\n', '\n');
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: This suggestion correctly identifies a bug where string immutability is overlooked, causing newline characters not to be replaced, which affects message rendering.

Low
Security
Fix Cypher injection vulnerability

Fix a Cypher injection vulnerability by replacing f-string formatting with
parameterized queries when building the match clauses for nodes and
relationships.

packages/gptdb-ext/src/gptdb_ext/datasource/conn_neo4j.py [208-240]

 # Check if label/relationship exists first to avoid warnings
 if table_type == "node":
     # Remove _node suffix if present
     label_name = table_name.replace("_node", "")
     check_result = session.run("CALL db.labels()")
     existing_labels = [record["label"] for record in check_result]
     if label_name not in existing_labels:
         return []
 
     # Get properties for node label
-    query = f"""
-    MATCH (n:`{label_name}`)
+    query = """
+    MATCH (n)
+    WHERE $label_name IN labels(n)
     WITH n LIMIT 100
     UNWIND keys(n) AS key
     RETURN DISTINCT key AS property
     """
+    params = {"label_name": label_name}
 else:
     # Remove _relationship suffix if present
     rel_name = table_name.replace("_relationship", "")
     check_result = session.run("CALL db.relationshipTypes()")
     existing_rels = [record["relationshipType"] for record in check_result]
     if rel_name not in existing_rels:
         return []
 
     # Get properties for relationship type
-    query = f"""
-    MATCH ()-[r:`{rel_name}`]->()
+    query = """
+    MATCH ()-[r]->()
+    WHERE type(r) = $rel_name
     WITH r LIMIT 100
     UNWIND keys(r) AS key
     RETURN DISTINCT key AS property
     """
+    params = {"rel_name": rel_name}
 
-result = session.run(query)
+result = session.run(query, **params)

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a Cypher injection vulnerability and provides a robust, parameterized query that mitigates the risk, which is a critical security improvement.

High
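
For reference, a minimal sketch of the parameterized pattern with the official `neo4j` Python driver; the connection details and helper name are illustrative, not taken from the PR. Labels and relationship types cannot be bound as query parameters inside a `MATCH` pattern, which is why the filter moves into a `WHERE` clause.

```python
from neo4j import GraphDatabase

# Placeholder connection details; adjust for the actual deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))


def node_properties(label_name: str) -> list:
    """Return distinct property keys for nodes carrying the given label."""
    query = """
    MATCH (n)
    WHERE $label_name IN labels(n)
    WITH n LIMIT 100
    UNWIND keys(n) AS key
    RETURN DISTINCT key AS property
    """
    with driver.session() as session:
        # The label value travels as a bound parameter, never via f-strings.
        result = session.run(query, label_name=label_name)
        return [record["property"] for record in result]
```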
Correct potentially insecure repository URL

Correct the default repository URL from khulnasoft-bot to the official
khulnasoft organization to prevent potential security risks.

packages/gptdb-core/src/gptdb/util/gptdbs/base.py [16-18]

 DEFAULT_REPO_MAP = {
-    "khulnasoft/gptdbs": "https://github.com/khulnasoft-bot/gptdbs.git",
+    "khulnasoft/gptdbs": "https://github.com/khulnasoft/gptdbs.git",
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential security risk where a typo in a repository URL could lead to fetching code from an untrusted source, which is a critical issue.

Medium
Correct potentially insecure plugin URL

Correct the default plugin repository URL from khulnasoft-bot to the official
khulnasoft organization to mitigate security risks from untrusted plugins.

packages/gptdb-serve/src/gptdb_serve/agent/hub/model/model.py [30-37]

 class PluginHubParam(BaseModel):
     channel: Optional[str] = Field("git", description="Plugin storage channel")
     url: Optional[str] = Field(
-        "https://github.com/khulnasoft-bot/GPT-DB-Plugins.git",
+        "https://github.com/khulnasoft/GPT-DB-Plugins.git",
         description="Plugin storage url",
     )
     branch: Optional[str] = Field(

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential security risk where a typo in a plugin repository URL could lead to fetching and executing untrusted code, which is a critical issue.

Medium
High-level
Refactor the data loading logic

The suggestion proposes abstracting the new benchmark data loading logic, which
is currently tightly coupled to a specific GitHub repository and file formats
(SQLite, JSON/JSONL), behind a more generic, format-agnostic interface. This
would improve modularity and make future extensions easier.

Examples:

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py [351-661]
packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/ext/falcon_file_parse.py [183-338]

Solution Walkthrough:

Before:

class BenchmarkDataManager:
    async def load_from_github(self, repo_url: str, data_dirs: List[str]):
        # ... discovers and merges .sqlite files specifically
        repo_dir = await self._download_repo_contents(repo_url)
        sqlite_files = self._discover_sqlite_files(repo_dir, data_dirs)
        await self._merge_sqlite_databases(sqlite_files)

    async def load_file_from_github(self, file_name: str):
        # ... logic to find and parse .json or .jsonl files
        if chosen_ext == ".json":
            return await self._parse_json_file(...)
        elif chosen_ext == ".jsonl":
            return await self._parse_jsonl_file(...)

class FalconFileParseService:
    def __init__(self):
        self.benchmark_manager = get_benchmark_manager()

After:

# New abstraction
class IDataSource(ABC):
    @abstractmethod
    async def load_data(self, config): ...

class GitHubFalconSource(IDataSource):
    async def load_data(self, config):
        # Contains logic for downloading repo, finding files,
        # and merging/parsing them.
        ...

class BenchmarkDataManager:
    def __init__(self):
        self._sources = {"github_falcon": GitHubFalconSource()}
    
    async def load_data(self, source_name, config):
        # Delegates to the appropriate source loader
        source = self._sources.get(source_name)
        await source.load_data(config)
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the new data loading logic is complex and tightly coupled to specific sources and formats, and proposing an abstraction layer is a valid architectural improvement.

Medium
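
A compact, runnable sketch of the proposed abstraction; class and method names mirror the walkthrough above and are illustrative only.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class IDataSource(ABC):
    """Format-agnostic benchmark data loader."""

    @abstractmethod
    async def load_data(self, config: Dict[str, Any]) -> None: ...


class GitHubFalconSource(IDataSource):
    async def load_data(self, config: Dict[str, Any]) -> None:
        # Repo download, file discovery, and merge/parse logic would live here.
        print(f"loading benchmark data from {config.get('repo_url')}")


class BenchmarkDataManager:
    def __init__(self) -> None:
        self._sources: Dict[str, IDataSource] = {"github_falcon": GitHubFalconSource()}

    async def load_data(self, source_name: str, config: Dict[str, Any]) -> None:
        # Delegate to the registered source; new formats only need a new class.
        source = self._sources.get(source_name)
        if source is None:
            raise ValueError(f"Unknown data source: {source_name}")
        await source.load_data(config)


if __name__ == "__main__":
    manager = BenchmarkDataManager()
    asyncio.run(
        manager.load_data("github_falcon", {"repo_url": "https://github.com/example/repo.git"})
    )
```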
General
Use a robust JSONPath library
Suggestion Impact: The commit removed the manual JSONPath parsing (string splitting and dict/list traversal) and replaced it with jsonpath-ng parsing and evaluation (parse/find), returning the first match value and logging warnings on errors. It also added an import of jsonpath-ng.

code diff:

@@ -7,6 +7,7 @@
 from typing import Any, Dict, Optional, Union
 
 import aiohttp
+from jsonpath_ng import parse as jsonpath_parse
 from gptdb_serve.evaluate.service.benchmark.models import (
     AgentApiConfig,
     AgentCompletionRequest,
@@ -38,32 +39,17 @@
         if not json_path:
             return response_data
 
-        # Remove leading $. if present
-        path = json_path.lstrip("$.")
-
-        # Split path by dots and brackets
-        parts = path.replace("[", ".").replace("]", "").split(".")
-
-        current = response_data
-        for part in parts:
-            if not part:
-                continue
-
-            try:
-                if isinstance(current, dict):
-                    current = current.get(part)
-                elif isinstance(current, list):
-                    index = int(part)
-                    current = current[index]
-                else:
-                    return None
-
-                if current is None:
-                    return None
-            except (KeyError, IndexError, ValueError, TypeError):
+        try:
+            jsonpath_expr = jsonpath_parse(json_path)
+            matches = jsonpath_expr.find(response_data)
+            if matches:
+                # Return the value of the first match
+                return matches[0].value
+            else:
                 return None
-
-        return current
+        except Exception as e:
+            logger.warning(f"Error parsing JSONPath '{json_path}': {e}")
+            return None

Replace the fragile, custom JSONPath parsing implementation with the jsonpath-ng
library to ensure robust and correct handling of JSONPath expressions.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/benchmark/task/benchmark_agent_task.py [27-67]

 @staticmethod
 def parse_json_path(response_data: Any, json_path: str) -> Any:
     """Parse response using JSON path expression.
 
     Args:
         response_data: The response data (dict or list)
         json_path: JSON path expression (e.g., "$.data.content")
 
     Returns:
         Extracted value or None if path not found
     """
     if not json_path:
         return response_data
+    
+    try:
+        from jsonpath_ng import parse
+    except ImportError:
+        raise ImportError(
+            "jsonpath-ng is not installed. Please install it with "
+            "`pip install jsonpath-ng`"
+        )
 
-    # Remove leading $. if present
-    path = json_path.lstrip("$.")
+    try:
+        jsonpath_expression = parse(json_path)
+        matches = jsonpath_expression.find(response_data)
+        if matches:
+            return matches[0].value
+        return None
+    except Exception:
+        logger.warning(f"Failed to parse JSON path: {json_path}")
+        return None
 
-    # Split path by dots and brackets
-    parts = path.replace("[", ".").replace("]", "").split(".")
-
-    current = response_data
-    for part in parts:
-        if not part:
-            continue
-
-        try:
-            if isinstance(current, dict):
-                current = current.get(part)
-            elif isinstance(current, list):
-                index = int(part)
-                current = current[index]
-            else:
-                return None
-
-            if current is None:
-                return None
-        except (KeyError, IndexError, ValueError, TypeError):
-            return None
-
-    return current
-

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the manual JSONPath parsing is brittle and replaces it with a robust, standard library, significantly improving the code's correctness and reliability.

Medium
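
For reference, a minimal usage sketch of `jsonpath-ng`; the sample payload and path are illustrative.

```python
from jsonpath_ng import parse  # pip install jsonpath-ng

response_data = {"data": {"content": "hello", "items": [{"id": 1}, {"id": 2}]}}

# Dotted keys and list indexing are both handled by the library.
matches = parse("$.data.items[1].id").find(response_data)
value = matches[0].value if matches else None
print(value)  # -> 2
```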
Warn on ambiguous file match

Add a warning log when multiple candidate files are found during a file search
to inform the user which file is being used.

packages/gptdb-serve/src/gptdb_serve/evaluate/service/fetchdata/benchmark_data_manager.py [428-459]

 candidate_paths: List[str] = []
 # Prefer direct path resolution using the relative path
 direct_path = os.path.join(repo_dir, normalized)
 if os.path.isfile(direct_path):
     ext = os.path.splitext(direct_path.lower())[1]
     if not requested_ext:
         if ext in allowed_exts:
             candidate_paths.append(direct_path)
     elif ext == requested_ext:
         candidate_paths.append(direct_path)
 # If not found, recursively search by filename match
 if not candidate_paths:
     target_name = os.path.basename(normalized)
     for root, _, files in os.walk(repo_dir):
         for f in files:
             if f == target_name:
                 full = os.path.join(root, f)
                 ext = os.path.splitext(f.lower())[1]
                 if not requested_ext:
                     if ext in allowed_exts:
                         candidate_paths.append(full)
                 elif ext == requested_ext:
                     candidate_paths.append(full)
 if not candidate_paths:
     raise FileNotFoundError(f"File not found: {file_name}")
+if len(candidate_paths) > 1:
+    logger.warning(f"Multiple files match '{file_name}', using '{candidate_paths[0]}'")
 chosen = sorted(candidate_paths, key=lambda p: (len(p), p))[0]

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5

__

Why: The suggestion improves user experience by adding a warning when multiple files match a search, preventing silent and potentially incorrect file selection, which enhances debuggability.

Low


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docker/examples/dashboard/test_case_sqlite_data.py (1)

118-120: Fix mismatched and non-numeric user_id assignment.

users entries don’t include a numeric id, so user_id = random.choice(users)[0] assigns the name string, and user_name is picked from a different random user, breaking referential integrity. Pick a single user tuple and map fields consistently (e.g., name/email), or store IDs from inserts.

💡 Proposed fix (consistent user selection)
-        user_id = random.choice(users)[0]
-        user_name = random.choice(users)[1]
+        user = random.choice(users)
+        user_id = user[0]   # If you intend to store name as ID, rename column or adjust schema
+        user_name = user[1]

If user_id is meant to reference the autoincrement user.id, capture those IDs at insert time (e.g., cursor.lastrowid) and store them in a separate list instead of using the name tuple.
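
A minimal sqlite3 sketch of that second option, capturing the autoincrement IDs at insert time; the table layout here is illustrative rather than the actual test schema.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, email TEXT)"
)

seed_users = [("John", "john@example.com"), ("Mary", "mary@example.com")]
user_ids = []
for name, email in seed_users:
    cursor.execute("INSERT INTO user (name, email) VALUES (?, ?)", (name, email))
    user_ids.append(cursor.lastrowid)  # capture the generated primary key

# Later rows reference one consistent user record instead of mixing tuples.
user_id = random.choice(user_ids)
user_name = cursor.execute(
    "SELECT name FROM user WHERE id = ?", (user_id,)
).fetchone()[0]
```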

🤖 Fix all issues with AI agents
In @.github/ISSUE_TEMPLATE/bug-report.yml:
- Around line 87-93: Update the four installation links in
.github/ISSUE_TEMPLATE/bug-report.yml so they point to valid, non-404
documentation pages; specifically replace the URLs used for "Installation From
Source", "Docker Installation", "Docker Compose Installation" and "Cluster
Installation" with the correct db-gpt.readthedocs.io paths (ensure Docker and
Docker Compose are different URLs), verify each URL opens successfully, and
update the corresponding label entries in the same file so the visible link text
and destination match the corrected docs.

In `@demo_selenium.py`:
- Around line 1-4: demo_selenium.py imports selenium (from selenium import
webdriver) but selenium is only declared in
packages/gptdb-sandbox/pyproject.toml; fix by either moving demo_selenium.py
into the packages/gptdb-sandbox/ folder so it lives with the declared
dependency, or add "selenium>=4.0.0" to the root pyproject.toml dependencies so
imports like webdriver and Options resolve when the root package is installed.
- Around line 6-9: Remove the deprecated options.headless assignment (Options
object defaults to non-headless) and, if you ever need headless in future, use
options.add_argument("-headless") to enable it; also move the driver
instantiation and any navigation/screenshot logic that currently runs at module
import into a main guard by wrapping the code that creates Options(), calls
webdriver.Firefox(...), and performs actions inside if __name__ == "__main__":
so imports no longer start the browser automatically.

In `@docker/compose_examples/conf/ha-model-cluster.toml`.:
- Line 6: Replace the hard-coded encrypt_key default ("your_secret_key") with a
required value sourced from an environment variable (e.g. ENCRYPT_KEY) and
update the config loading logic to fail fast if that env var is missing: remove
the predictable fallback for the encrypt_key key, read the key from process/env
in your config loader (or equivalent config parsing function), and throw/exit
with a clear error when ENCRYPT_KEY is not provided so deployments cannot
silently use a weak default.

In `@docker/compose_examples/conf/ha-webserver.toml`.:
- Line 6: The config currently hardcodes encrypt_key = "your_secret_key"; change
this to read the encryption key from an environment variable (e.g. ENCRYPT_KEY)
instead of a literal string, update any loader/initialization code that reads
encrypt_key to validate the env var and fail fast if it's missing, and ensure
the default placeholder is removed from the repo and documented as an env
requirement; reference the encrypt_key setting in the config and the code path
that loads it when making the change.

In `@docker/examples/my.cnf`.:
- Around line 21-37: The config sets
default-authentication-plugin=mysql_native_password which forces the weaker
legacy auth; change this to be opt-in by commenting out or removing the
default-authentication-plugin=mysql_native_password line so the server uses the
stronger default (caching_sha2_password) unless explicitly enabled for legacy
clients; update the configuration that contains the
default-authentication-plugin setting (and any duplicate occurrences) to be
commented out and add a short comment explaining it's opt-in for legacy clients.

In `@docker/examples/sqls/case_1_student_manager_sqlite.sql`.:
- Around line 15-22: The current PRIMARY KEY on the scores table (PRIMARY KEY
(student_id, course_id)) prevents storing multiple semesters for the same
student/course; update the scores table definition (table name: scores, columns:
student_id, course_id, semester) to include semester in the composite primary
key (e.g., PRIMARY KEY (student_id, course_id, semester)) or alternatively add a
surrogate id column and make that the PK while keeping student_id, course_id,
semester as a UNIQUE constraint to allow multiple semester rows without
conflicts.

In `@docker/examples/sqls/case_2_ecom_sqlite.sql`.:
- Around line 25-35: The seed data in the INSERT INTO users statement contains
real-looking emails; update the user_email values to use reserved example
domains (e.g., user@example.com or user@example.org) to avoid PII. Edit the
INSERT INTO users (... user_email ...) VALUES clause and replace each gmail
address (e.g., 'john@gmail.com', 'mary@gmail.com', etc.) with non-real addresses
(e.g., 'user1@example.com', 'user2@example.com') while keeping the rest of the
columns (user_id, user_name, registration_date, user_country) unchanged.

In `@docker/examples/sqls/test_case_info_sqlite.sql`.:
- Line 17: The SQL in the VALUES tuple uses MySQL's YEAR(orders.order_date)
which SQLite doesn't support; update the SELECT in that tuple to use SQLite's
strftime('%Y', orders.order_date) and compare to the string '2023' (e.g.
strftime('%Y', orders.order_date) = '2023') so the expected_sql in the tuple
(the one selecting products.product_name with JOIN orders ...) is
SQLite-compatible.

In `@docker/examples/sqls/user_config_mysql.sql`.:
- Around line 1-2: The SQL snippet updates mysql.user to set Host='%' for
User='root', which enables remote root access and is insecure; replace this
pattern by keeping root bound to localhost (do not change Host='%' for
User='root'), remove direct UPDATE mysql.user calls, and instead create a
dedicated application user via CREATE USER and GRANT with only the necessary
privileges for remote access, then FLUSH PRIVILEGES; locate the statements
UPDATE user SET Host='%' WHERE User='root' and replace them with the CREATE USER
/ GRANT flow and avoid modifying the root account.

In `@Dockerfile`:
- Around line 24-26: The Dockerfile currently hardcodes the VNC password using
the RUN line that calls x11vnc -storepasswd 123456 /root/.vnc/passwd; remove
that hardcoded invocation and instead create a runtime setup (entrypoint) that
reads a VNC password from an environment variable (e.g., VNC_PASSWORD) or a
secret file (e.g., /run/secrets/vnc_password), ensures /root/.vnc exists, stores
the password with x11vnc -storepasswd using the provided value, and tightens
file permissions on /root/.vnc/passwd; update the Dockerfile to copy the
entrypoint script and set it as ENTRYPOINT/CMD and keep a safe fallback behavior
(error or random password) when no password is provided.
🟡 Minor comments (13)
docker/examples/sqls/case_1_student_manager_sqlite.sql.-1-7 (1)

1-7: Enable FK enforcement in SQLite.

SQLite does not enforce foreign keys unless PRAGMA foreign_keys=ON is set. Add it at the top of the script to avoid silent integrity violations.

🔧 Proposed fix
+PRAGMA foreign_keys = ON;
+
 CREATE TABLE students (
     student_id INTEGER PRIMARY KEY,
     student_name VARCHAR(100),
     major VARCHAR(100),
     year_of_enrollment INTEGER,
     student_age INTEGER
 );
docker/examples/sqls/test_case.md.-79-87 (1)

79-87: Database compatibility issue: YEAR() function is MySQL-specific.

The YEAR(orders.order_date) function works in MySQL but not in SQLite or Vertica. Since this PR includes test data for multiple database engines (MySQL, SQLite, Vertica), this query won't work consistently across all targets.

  • SQLite: Use strftime('%Y', orders.order_date) = '2023'
  • Vertica: Use EXTRACT(YEAR FROM orders.order_date) = 2023

Consider either documenting this as MySQL-only or providing engine-specific variants.

docker/examples/sqls/case_3_order_wide_table_sqlite_wide.sql.-1-5 (1)

1-5: Missing PRIMARY KEY constraint.

The order_wide_table lacks a PRIMARY KEY. For an order table, order_id should typically be the primary key to ensure uniqueness and enable efficient lookups.

Suggested fix
 CREATE TABLE order_wide_table (
 
     -- order_base
-    order_id TEXT, -- 订单ID
+    order_id TEXT PRIMARY KEY, -- 订单ID
     order_no TEXT, -- 订单编号
docker/examples/sqls/case_2_ecom_sqlite.sql.-1-7 (1)

1-7: Enable SQLite foreign key enforcement.
SQLite ignores FK constraints unless PRAGMA foreign_keys = ON; is set, which can silently allow invalid orders rows when this dataset is extended.

✅ Proposed fix
+PRAGMA foreign_keys = ON;
+
 CREATE TABLE users (
     user_id INTEGER PRIMARY KEY,
     user_name VARCHAR(100),
     user_email VARCHAR(100),

Also applies to: 15-23

docker/compose_examples/conf/ha-model-cluster.toml.-4-4 (1)

4-4: Clarify the log level configuration.
system.log_level = "INFO" conflicts with [log].level = "DEBUG". Align them or document precedence to avoid unexpected logging behavior.

🛠️ Example alignment
-[log]
-level = "DEBUG"
+[log]
+level = "INFO"

Also applies to: 64-64

docker/compose_examples/conf/ha-model-cluster.toml.-13-21 (1)

13-21: Use consistent environment variable defaults in the TOML file.

This file inconsistently handles environment variables: GPTDB_LANG (line 3) and WORKER_TYPE (line 28) have defaults, but CONTROLLER_ADDR and the MYSQL_* database fields lack defaults. For consistency with other configuration files (e.g., configs/gptdb-proxy-siliconflow-mysql.toml), add safe defaults like:

controller_addr = "${env:CONTROLLER_ADDR:-http://localhost:8000}"
host = "${env:MYSQL_HOST:-localhost}"
port = "${env:MYSQL_PORT:-3306}"
database = "${env:MYSQL_DATABASE:-gptdb}"
user = "${env:MYSQL_USER:-root}"
password = "${env:MYSQL_PASSWORD:-aa123456}"

While the accompanying ha-cluster-docker-compose.yml does provide these values, the TOML files should be self-contained for portability and clarity.

Also applies to: lines 55-61

.pre-commit-config.yaml-1-1 (1)

1-1: Fix line endings to satisfy YAMLlint.

YAMLlint reports CRLF line endings here; converting this file to LF will clear the lint error and avoid CI/pre-commit failures.

docs/docs/changelog/Released_V0.5.0.md-261-265 (1)

261-265: Fix markdownlint MD034: bare URLs in Appendix.

Convert bare URLs to Markdown links to satisfy lint and improve readability.

✅ Suggested update
-- GPT-DB framework: https://github.com/khulnasoft 
-- Text2SQL fine tuning: https://github.com/khulnasoft-bot/GPT-DB-Hub 
-- GPT-DB-Web : https://github.com/khulnasoft-bot/GPT-DB-Web 
-- official English documentation: http://docs.gptdb.site/docs/overview 
-- official Chinese documentation: https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx
+- GPT-DB framework: [khulnasoft](https://github.com/khulnasoft)
+- Text2SQL fine tuning: [GPT-DB-Hub](https://github.com/khulnasoft-bot/GPT-DB-Hub)
+- GPT-DB-Web : [GPT-DB-Web](https://github.com/khulnasoft-bot/GPT-DB-Web)
+- official English documentation: [Overview](http://docs.gptdb.site/docs/overview)
+- official Chinese documentation: [概览](https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx)
docs/blog/2025-03-24-gptdb-v0.7.0-release.md-432-432 (1)

432-432: Address markdownlint: bare email in contributors list.

The email triggers MD034; format it as a mailto link or remove it.

✅ Suggested update
-**283569391@qq.com, `@15089677014`, `@Aries-ckt`, `@FOkvj`, `@Jant1L`, `@SonglinLyu`, `@TenYearOldJAVA`, `@Weaxs`, `@cinjoseph`, `@csunny`, `@damonqin`, `@dusx1981`, `@fangyinc`, `@geebytes`, `@haawha`, `@utopia2077`, `@vnicers`, `@xuxl2024`, `@yhjun1026`, `@yunfeng1993`, `@yyhhyyyyyy` and tam**
+**[283569391@qq.com](mailto:283569391@qq.com), `@15089677014`, `@Aries-ckt`, `@FOkvj`, `@Jant1L`, `@SonglinLyu`, `@TenYearOldJAVA`, `@Weaxs`, `@cinjoseph`, `@csunny`, `@damonqin`, `@dusx1981`, `@fangyinc`, `@geebytes`, `@haawha`, `@utopia2077`, `@vnicers`, `@xuxl2024`, `@yhjun1026`, `@yunfeng1993`, `@yyhhyyyyyy` and tam**
docs/docs/gptdbs/introduction.md-3-3 (1)

3-3: Update repository links: khulnasoft-bot/gptdbs is not publicly accessible (404).

The repository at https://github.com/khulnasoft-bot/gptdbs returns 404 Not Found. This affects the links on lines 3, 26-27, and 38. If this is an internal or renamed repository, verify the correct public path. Public references to awel-flow-simple-streaming-chat point to eosphoros-ai/dbgpts or eosphoros-ai/DB-GPT instead.

README.md-225-227 (1)

225-227: Fix the disclaimer link filename.

DISCKAIMER.md looks misspelled and will likely 404.

📝 Suggested fix
-📖 **[Disclaimer](./DISCKAIMER.md)** - Please read our disclaimer for important information about the project.
+📖 **[Disclaimer](./DISCLAIMER.md)** - Please read our disclaimer for important information about the project.
README.md-150-160 (1)

150-160: Replace the replacement‑character headings.

The glyphs will render incorrectly; swap to real emoji or plain text.

📝 Suggested fix
-### � Privacy and Security
+### Privacy and Security
@@
-### � Supported Data Sources
+### Supported Data Sources
README.md-74-76 (1)

74-76: Update "Latest Release" section to reflect the current version.

The README claims "Latest Release: V0.7.0", but as of January 2026, the latest published DB-GPT release is v0.7.4 (October 24, 2025). Update the version number and release notes link accordingly.

🧹 Nitpick comments (16)
docker/examples/sqls/case_2_ecom_mysql.sql. (2)

1-2: Prefer utf8mb4 and make DB creation idempotent.

Using MySQL utf8 limits characters, and re-running the script can fail if the DB already exists. Consider utf8mb4 and IF NOT EXISTS for a more robust example.

♻️ Proposed change
-create database case_2_ecom character set utf8;
+create database if not exists case_2_ecom
+  character set utf8mb4
+  collate utf8mb4_0900_ai_ci;
 use case_2_ecom;

12-16: Use DECIMAL for prices to avoid rounding artifacts.

FLOAT can introduce precision errors in money values. DECIMAL(10,2) is the typical choice for currency.

♻️ Proposed change
 CREATE TABLE products (
     product_id INT PRIMARY KEY,
     product_name VARCHAR(100) COMMENT '商品名称',
-    product_price FLOAT COMMENT '商品价格'
+    product_price DECIMAL(10,2) COMMENT '商品价格'
 ) COMMENT '商品信息表';
docker/examples/sqls/test_case_info_sqlite.sql. (1)

14-14: Non-standard GROUP BY usage may cause issues in strict SQL environments.

The expected_sql groups by students.student_id but selects students.student_name, which is not in the GROUP BY clause. While SQLite is permissive and allows this, it's non-standard SQL and could produce unexpected results if a student_id maps to multiple names (though unlikely given it's a primary key).

This is acceptable for test data but worth noting for consistency with stricter SQL modes.

docker/compose_examples/conf/ha-webserver.toml. (3)

13-13: Consider adding a default value or documenting that CONTROLLER_ADDR is required.

Unlike other environment variables in this file (e.g., GPTDB_LANG, EMBEDDING_MODEL_NAME), controller_addr has no fallback default. If CONTROLLER_ADDR is unset, this may cause unclear startup failures.

💡 Option: Add a placeholder default or comment
-controller_addr = "${env:CONTROLLER_ADDR}"
+# Required: Set CONTROLLER_ADDR environment variable
+controller_addr = "${env:CONTROLLER_ADDR}"

Or, if a sensible default exists:

-controller_addr = "${env:CONTROLLER_ADDR}"
+controller_addr = "${env:CONTROLLER_ADDR:-http://controller:8000}"

21-21: Minor formatting inconsistency.

Missing space after = on this line, while other lines use the key = value format consistently.

✏️ Suggested fix
-password ="${env:MYSQL_PASSWORD}"
+password = "${env:MYSQL_PASSWORD}"

27-28: Redundant log level configuration.

log_level is already set to "INFO" in the [system] section (line 4). Having it in both places could cause confusion about which takes precedence or lead to inconsistencies if only one is updated.

docker/examples/sqls/test_case.md. (2)

33-45: Minor inconsistency: Missing "SQL:" label.

Other queries have an explicit "SQL:" label before the code block (see lines 13, 25, 55), but Q3 is missing this label for consistency.

Suggested fix
 查询哪些学生在 "2023年春季" 学期的课程学分总和超过2学分

+SQL:
 ```sql
 SELECT students.student_name

57-62: Potential strict SQL mode issue: GROUP BY column mismatch.

The query selects users.user_name but groups by users.user_id. While this works in MySQL with default settings, it may fail in strict SQL mode (ONLY_FULL_GROUP_BY) or in other databases where all non-aggregated SELECT columns must appear in GROUP BY.

The same pattern appears in Q2 (lines 68-73) with products.product_name vs products.product_id.

Suggested fix for stricter SQL compliance
-SELECT users.user_name, COUNT(orders.order_id) as order_count
-FROM users
-JOIN orders ON users.user_id = orders.user_id
-GROUP BY users.user_id;
+SELECT users.user_name, COUNT(orders.order_id) as order_count
+FROM users
+JOIN orders ON users.user_id = orders.user_id
+GROUP BY users.user_id, users.user_name;
demo_selenium.py (2)

15-15: Prefer explicit waits over time.sleep().

time.sleep(5) is fragile—it either waits too long or not long enough. Use WebDriverWait with expected conditions for reliable synchronization.

♻️ Suggested improvement
+from selenium.webdriver.support.ui import WebDriverWait
+from selenium.webdriver.support import expected_conditions as EC
+from selenium.webdriver.common.by import By
...
-    time.sleep(5)  # পৃষ্ঠা লোড হওয়ার জন্য অপেক্ষা করুন
+    # Wait until the page title contains "Python"
+    WebDriverWait(driver, 10).until(EC.title_contains("Python"))

20-22: Hardcoded path assumes root user on Linux.

/root/screenshot.png is not portable and requires root privileges. Consider using a relative path or a configurable location.

♻️ Suggested improvement
-    screenshot_path = "/root/screenshot.png"
+    screenshot_path = "screenshot.png"  # Or use tempfile/pathlib for cross-platform support
docker/examples/sqls/case_1_student_manager_mysql.sql. (1)

1-1: Use utf8mb4 and IF NOT EXISTS for broader Unicode support and repeatable setup.

MySQL's utf8 is a deprecated alias for utf8mb3, which supports only BMP characters (U+0000..U+FFFF) and cannot store emoji or supplementary Unicode characters. Use utf8mb4 (full UTF-8) instead. Additionally, IF NOT EXISTS makes the script reusable without manual deletion on rerun.

♻️ Recommended change
-create database case_1_student_manager character set utf8;
+create database if not exists case_1_student_manager character set utf8mb4;
docs/docs/modules/benchmark.md (1)

81-82: Prefer HTTPS for docs links if supported.
Switching to HTTPS avoids mixed-content warnings and improves security for readers. Please verify the endpoint supports HTTPS and update accordingly.

🔧 Suggested update (if HTTPS is supported)
-  - Method 1: Configure via configuration file. Reference: [ProxyModel Configuration](http://gpt.khulnasoft.com/docs/next/installation/advanced_usage/More_proxyllms)
-  - Method 2: Configure via product page. Reference: [Models](http://gpt.khulnasoft.com/docs/next/application/llms)
+  - Method 1: Configure via configuration file. Reference: [ProxyModel Configuration](https://gpt.khulnasoft.com/docs/next/installation/advanced_usage/More_proxyllms)
+  - Method 2: Configure via product page. Reference: [Models](https://gpt.khulnasoft.com/docs/next/application/llms)
docs/docs/faq/kbqa.md (1)

56-56: Optional: tighten repetitive phrasing.
Consider rewording to avoid repeated “If you want to …” in adjacent sentences for readability.

docs/docs/installation/model_service/cluster.md (1)

21-21: Optional: remove leading $ in code blocks for copy-paste.
This avoids MD014 lint noise and makes commands directly runnable.

Dockerfile (1)

21-23: Avoid masking install failures and clean apt lists.

|| true hides real errors and leaves apt cache in the layer. Consider gating the install behind a build arg and cleaning up.

♻️ Suggested refactor
-# 安装 NVIDIA GPU 工具(如 nvidia-smi)
-RUN apt-get update && apt-get install -y nvidia-utils-525 || true
+ARG INSTALL_NVIDIA_UTILS=false
+RUN if [ "$INSTALL_NVIDIA_UTILS" = "true" ]; then \
+      apt-get update && \
+      apt-get install -y --no-install-recommends nvidia-utils-525 && \
+      rm -rf /var/lib/apt/lists/*; \
+    fi
Makefile (1)

1-33: Declare phony targets to avoid filename collisions.

This prevents a file named like a target (e.g., setup) from blocking the rule.

🧹 Suggested fix
 .DEFAULT_GOAL := help
+.PHONY: setup testenv fmt fmt-check test test-% all-test mypy \
+	build build-% all-build docker-% all-docker publish publish-test publish-% \
+	clean clean-dist help

Comment on lines 87 to 93
[Installation From Source](https://db-gpt.readthedocs.io/en/latest/getting_started/install/deploy/deploy.html)
- label: >
[Docker Installation](https://gpt-db.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
[Docker Installation](https://db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
- label: >
[Docker Compose Installation](https://gpt-db.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
[Docker Compose Installation](https://db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html)
- label: >
[Cluster Installation](https://gpt-db.readthedocs.io/en/latest/getting_started/install/llm/cluster/model_cluster.html)
[Cluster Installation](https://db-gpt.readthedocs.io/en/latest/getting_started/install/llm/cluster/model_cluster.html)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's examine the file to confirm the lines and content
echo "=== Checking file content at lines 87-93 ==="
sed -n '85,95p' .github/ISSUE_TEMPLATE/bug-report.yml
echo ""

# Check the URLs for accessibility
echo "=== Checking URL accessibility ==="
echo "1. Installation From Source URL..."
curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" "https://db-gpt.readthedocs.io/en/latest/getting_started/install/deploy/deploy.html"

echo "2. Docker Installation URL..."
curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" "https://db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html"

echo "3. Cluster Installation URL..."
curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" "https://db-gpt.readthedocs.io/en/latest/getting_started/install/llm/cluster/model_cluster.html"

Repository: khulnasoft/GPT-DB

Length of output: 930


All installation documentation links return 404 errors and must be corrected before merging.

The four URLs in the installation options are inaccessible:

  • Installation From Source: 404
  • Docker Installation: 404
  • Docker Compose Installation: 404
  • Cluster Installation: 404

Additionally, Docker Installation and Docker Compose Installation both point to the same URL (db-gpt.readthedocs.io/en/latest/getting_started/install/docker/docker.html), which needs to be corrected. Verify the correct documentation URLs for the db-gpt.readthedocs.io domain before this change can be merged.

🤖 Prompt for AI Agents
In @.github/ISSUE_TEMPLATE/bug-report.yml around lines 87 - 93, Update the four
installation links in .github/ISSUE_TEMPLATE/bug-report.yml so they point to
valid, non-404 documentation pages; specifically replace the URLs used for
"Installation From Source", "Docker Installation", "Docker Compose Installation"
and "Cluster Installation" with the correct db-gpt.readthedocs.io paths (ensure
Docker and Docker Compose are different URLs), verify each URL opens
successfully, and update the corresponding label entries in the same file so the
visible link text and destination match the corrected docs.

Comment on lines +1 to +4
import time

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if pyproject.toml exists and search for selenium
echo "=== Checking for pyproject.toml ==="
if [ -f "pyproject.toml" ]; then
  echo "pyproject.toml found. Searching for selenium..."
  grep -i "selenium" pyproject.toml || echo "selenium not found in pyproject.toml"
else
  echo "pyproject.toml not found in repo root"
fi

echo -e "\n=== Checking for requirements files ==="
find . -maxdepth 2 -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "setup.cfg" \) 2>/dev/null | while read file; do
  echo "Found: $file"
  grep -i "selenium" "$file" || echo "  - selenium not found in $file"
done

echo -e "\n=== Checking if demo_selenium.py exists ==="
if [ -f "demo_selenium.py" ]; then
  echo "demo_selenium.py found. First 10 lines:"
  head -10 demo_selenium.py
else
  echo "demo_selenium.py not found in repo root"
fi

Repository: khulnasoft/GPT-DB

Length of output: 572


🏁 Script executed:

#!/bin/bash
# Read the pyproject.toml to see all dependency declarations
echo "=== Full pyproject.toml content ==="
cat pyproject.toml

Repository: khulnasoft/GPT-DB

Length of output: 2235


🏁 Script executed:

#!/bin/bash
# Find demo_selenium.py location
echo "=== Location of demo_selenium.py ==="
find . -name "demo_selenium.py" -type f

echo -e "\n=== Checking workspace members for pyproject.toml files with selenium ==="
for dir in packages/gptdb-app packages/gptdb-client packages/gptdb-core packages/gptdb-ext packages/gptdb-serve packages/gptdb-sandbox; do
  if [ -d "$dir" ]; then
    if [ -f "$dir/pyproject.toml" ]; then
      echo "Checking $dir/pyproject.toml for selenium..."
      grep -i "selenium" "$dir/pyproject.toml" && echo "  Found in $dir" || echo "  Not found in $dir"
    fi
  fi
done

Repository: khulnasoft/GPT-DB

Length of output: 788


🏁 Script executed:

#!/bin/bash
# Check gptdb-sandbox dependencies
echo "=== gptdb-sandbox/pyproject.toml dependencies section ==="
grep -A 20 "dependencies" packages/gptdb-sandbox/pyproject.toml | head -30

echo -e "\n=== Check if demo_selenium.py imports gptdb packages ==="
grep -E "^from gptdb|^import gptdb" ./demo_selenium.py || echo "No gptdb imports found"

echo -e "\n=== Full demo_selenium.py content ==="
cat ./demo_selenium.py

Repository: khulnasoft/GPT-DB

Length of output: 1594


demo_selenium.py at the repository root depends on selenium, which is only declared in the gptdb-sandbox workspace member.

The selenium>=4.0.0 dependency is declared in packages/gptdb-sandbox/pyproject.toml, not in the root pyproject.toml. Since demo_selenium.py is located at the repository root and imports selenium directly, the dependency will not be available if the root package is installed standalone. Either move this script into packages/gptdb-sandbox/ where the dependency is declared, or add selenium to the root project dependencies.

🤖 Prompt for AI Agents
In `@demo_selenium.py` around lines 1 - 4, demo_selenium.py imports selenium (from
selenium import webdriver) but selenium is only declared in
packages/gptdb-sandbox/pyproject.toml; fix by either moving demo_selenium.py
into the packages/gptdb-sandbox/ folder so it lives with the declared
dependency, or add "selenium>=4.0.0" to the root pyproject.toml dependencies so
imports like webdriver and Options resolve when the root package is installed.

Comment on lines +6 to +9
options = Options()
options.headless = False # অবশ্যই False হতে হবে, যাতে VNC ডেস্কটপে ব্রাউজারের কার্যকলাপ দেখা যায়

driver = webdriver.Firefox(options=options)

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd demo_selenium.py

Repository: khulnasoft/GPT-DB

Length of output: 77


🏁 Script executed:

cat -n demo_selenium.py

Repository: khulnasoft/GPT-DB

Length of output: 909


🌐 Web query:

Selenium 4 Firefox headless mode options.headless deprecated

💡 Result:

In Selenium 4.8.0, the convenience API (options.headless = True, options.setHeadless(true), etc.) was deprecated (and then removed in later Selenium versions) in favor of passing the headless mode via browser arguments. (selenium.dev)

Firefox (GeckoDriver): what to use now

Python

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

opts = Options()
opts.add_argument("-headless")   # or "--headless"
driver = webdriver.Firefox(options=opts)

(Using options.headless = True is what triggers the deprecation warning.) (selenium.dev)

Java

FirefoxOptions options = new FirefoxOptions();
options.addArguments("-headless");   // or "--headless"
WebDriver driver = new FirefoxDriver(options);

(stackoverflow.com)

Important note: --headless=new is Chromium-only

--headless=new is for Chromium’s “new headless” mode; Firefox doesn’t use --headless=new. (selenium.dev)



Deprecated headless attribute and module-level execution.

  1. options.headless is deprecated in Selenium 4.8.0+. Since the code intends non-headless mode (visible browser), simply remove this line—non-headless is the default. If you explicitly need to set it in future versions, use options.add_argument("-headless") only to enable headless mode.

  2. Module-level execution means the driver instantiation, navigation, and screenshot all run on import. Wrap the execution logic in if __name__ == "__main__":.
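
Putting both points together, a minimal sketch with an explicit wait in place of the sleep; the URL, wait condition, and screenshot path are illustrative.

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def main() -> None:
    options = Options()  # non-headless is the default in Selenium 4
    driver = webdriver.Firefox(options=options)
    try:
        driver.get("https://www.python.org")
        # Explicit wait instead of time.sleep(): block until the title loads.
        WebDriverWait(driver, 10).until(EC.title_contains("Python"))
        driver.save_screenshot("screenshot.png")
    finally:
        driver.quit()


if __name__ == "__main__":
    main()  # nothing runs on import
```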

🤖 Prompt for AI Agents
In `@demo_selenium.py` around lines 6 - 9, Remove the deprecated options.headless
assignment (Options object defaults to non-headless) and, if you ever need
headless in future, use options.add_argument("-headless") to enable it; also
move the driver instantiation and any navigation/screenshot logic that currently
runs at module import into a main guard by wrapping the code that creates
Options(), calls webdriver.Firefox(...), and performs actions inside if __name__
== "__main__": so imports no longer start the browser automatically.

language = "${env:GPTDB_LANG:-en}"
log_level = "INFO"
api_keys = []
encrypt_key = "your_secret_key"

⚠️ Potential issue | 🟠 Major

Avoid hard‑coded encryption key default.
"your_secret_key" is a predictable fallback and could lead to weak encryption in real deployments. Make this required via env (or fail fast).

🔒 Proposed fix
-encrypt_key = "your_secret_key"
+encrypt_key = "${env:ENCRYPT_KEY}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
encrypt_key = "your_secret_key"
encrypt_key = "${env:ENCRYPT_KEY}"
🤖 Prompt for AI Agents
In `@docker/compose_examples/conf/ha-model-cluster.toml`. at line 6, Replace the
hard-coded encrypt_key default ("your_secret_key") with a required value sourced
from an environment variable (e.g. ENCRYPT_KEY) and update the config loading
logic to fail fast if that env var is missing: remove the predictable fallback
for the encrypt_key key, read the key from process/env in your config loader (or
equivalent config parsing function), and throw/exit with a clear error when
ENCRYPT_KEY is not provided so deployments cannot silently use a weak default.
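
A minimal sketch of the fail-fast lookup on the loader side; the ENCRYPT_KEY variable name and the helper are illustrative, since the actual config-loading code path may differ.

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or refuse to start."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Required environment variable {name} is not set; "
            "refusing to start with a default encryption key."
        )
    return value


encrypt_key = require_env("ENCRYPT_KEY")
```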

language = "${env:GPTDB_LANG:-en}"
log_level = "INFO"
api_keys = []
encrypt_key = "your_secret_key"

⚠️ Potential issue | 🟠 Major

Hardcoded encryption key is a security risk.

The placeholder "your_secret_key" should be replaced with an environment variable reference to avoid accidental deployment with a weak, publicly-known key. Secrets should never be committed to version control.

🔒 Suggested fix
-encrypt_key = "your_secret_key"
+encrypt_key = "${env:GPTDB_ENCRYPT_KEY}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
encrypt_key = "your_secret_key"
encrypt_key = "${env:GPTDB_ENCRYPT_KEY}"
🤖 Prompt for AI Agents
In `@docker/compose_examples/conf/ha-webserver.toml`. at line 6, The config
currently hardcodes encrypt_key = "your_secret_key"; change this to read the
encryption key from an environment variable (e.g. ENCRYPT_KEY) instead of a
literal string, update any loader/initialization code that reads encrypt_key to
validate the env var and fail fast if it's missing, and ensure the default
placeholder is removed from the repo and documented as an env requirement;
reference the encrypt_key setting in the config and the code path that loads it
when making the change.

Comment on lines +15 to +22
CREATE TABLE scores (
student_id INTEGER,
course_id INTEGER,
score INTEGER,
semester VARCHAR(50),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)

⚠️ Potential issue | 🟠 Major

Primary key blocks multi-semester scores for the same student/course.

With semester present, (student_id, course_id) as the PK prevents storing multiple semesters for a repeated course. Consider including semester in the PK (or adding a surrogate key) to avoid insert conflicts later.

🐛 Proposed fix (include semester in the PK)
 CREATE TABLE scores (
     student_id INTEGER,
     course_id INTEGER,
     score INTEGER,
     semester VARCHAR(50),
-    PRIMARY KEY (student_id, course_id),
+    PRIMARY KEY (student_id, course_id, semester),
     FOREIGN KEY (student_id) REFERENCES students(student_id),
     FOREIGN KEY (course_id) REFERENCES courses(course_id)
 );
🤖 Prompt for AI Agents
In `@docker/examples/sqls/case_1_student_manager_sqlite.sql`. around lines 15 -
22, The current PRIMARY KEY on the scores table (PRIMARY KEY (student_id,
course_id)) prevents storing multiple semesters for the same student/course;
update the scores table definition (table name: scores, columns: student_id,
course_id, semester) to include semester in the composite primary key (e.g.,
PRIMARY KEY (student_id, course_id, semester)) or alternatively add a surrogate
id column and make that the PK while keeping student_id, course_id, semester as
a UNIQUE constraint to allow multiple semester rows without conflicts.

Comment on lines +25 to +35
INSERT INTO users (user_id, user_name, user_email, registration_date, user_country) VALUES
(1, 'John', 'john@gmail.com', '2020-01-01', 'USA'),
(2, 'Mary', 'mary@gmail.com', '2021-01-01', 'UK'),
(3, 'Bob', 'bob@gmail.com', '2020-01-01', 'USA'),
(4, 'Alice', 'alice@gmail.com', '2021-01-01', 'UK'),
(5, 'Charlie', 'charlie@gmail.com', '2020-01-01', 'USA'),
(6, 'David', 'david@gmail.com', '2021-01-01', 'UK'),
(7, 'Eve', 'eve@gmail.com', '2020-01-01', 'USA'),
(8, 'Frank', 'frank@gmail.com', '2021-01-01', 'UK'),
(9, 'Grace', 'grace@gmail.com', '2020-01-01', 'USA'),
(10, 'Helen', 'helen@gmail.com', '2021-01-01', 'UK');

⚠️ Potential issue | 🟠 Major

Avoid real-looking PII in seed emails.
These Gmail-style addresses can map to real people; prefer reserved example domains to eliminate privacy/compliance risk.

✅ Proposed fix
-INSERT INTO users (user_id, user_name, user_email, registration_date, user_country) VALUES
-(1, 'John', 'john@gmail.com', '2020-01-01', 'USA'),
-(2, 'Mary', 'mary@gmail.com', '2021-01-01', 'UK'),
-(3, 'Bob', 'bob@gmail.com', '2020-01-01', 'USA'),
-(4, 'Alice', 'alice@gmail.com', '2021-01-01', 'UK'),
-(5, 'Charlie', 'charlie@gmail.com', '2020-01-01', 'USA'),
-(6, 'David', 'david@gmail.com', '2021-01-01', 'UK'),
-(7, 'Eve', 'eve@gmail.com', '2020-01-01', 'USA'),
-(8, 'Frank', 'frank@gmail.com', '2021-01-01', 'UK'),
-(9, 'Grace', 'grace@gmail.com', '2020-01-01', 'USA'),
-(10, 'Helen', 'helen@gmail.com', '2021-01-01', 'UK');
+INSERT INTO users (user_id, user_name, user_email, registration_date, user_country) VALUES
+(1, 'John', 'john@example.com', '2020-01-01', 'USA'),
+(2, 'Mary', 'mary@example.com', '2021-01-01', 'UK'),
+(3, 'Bob', 'bob@example.com', '2020-01-01', 'USA'),
+(4, 'Alice', 'alice@example.com', '2021-01-01', 'UK'),
+(5, 'Charlie', 'charlie@example.com', '2020-01-01', 'USA'),
+(6, 'David', 'david@example.com', '2021-01-01', 'UK'),
+(7, 'Eve', 'eve@example.com', '2020-01-01', 'USA'),
+(8, 'Frank', 'frank@example.com', '2021-01-01', 'UK'),
+(9, 'Grace', 'grace@example.com', '2020-01-01', 'USA'),
+(10, 'Helen', 'helen@example.com', '2021-01-01', 'UK');
🤖 Prompt for AI Agents
In `@docker/examples/sqls/case_2_ecom_sqlite.sql`. around lines 25 - 35, The seed
data in the INSERT INTO users statement contains real-looking emails; update the
user_email values to use reserved example domains (e.g., user@example.com or
user@example.org) to avoid PII. Edit the INSERT INTO users (... user_email ...)
VALUES clause and replace each gmail address (e.g., 'john@gmail.com',
'mary@gmail.com', etc.) with non-real addresses (e.g., 'user1@example.com',
'user2@example.com') while keeping the rest of the columns (user_id, user_name,
registration_date, user_country) unchanged.

('学校管理系统', '测试SQL助手的联合查询,条件查询和排序功能', '查询哪些学生在2023年秋季学期的课程学分总和超过15', 'SELECT students.student_name FROM students JOIN scores ON students.student_id = scores.student_id JOIN courses ON scores.course_id = courses.course_id WHERE scores.semester = ''2023年秋季'' GROUP BY students.student_id HAVING SUM(courses.credit) > 15;', '返回在2023年秋季学期的课程学分总和超过15的学生的姓名'),
('电商系统', '测试SQL助手的数据聚合和分组功能', '查询每个用户的总订单数量', 'SELECT users.user_name, COUNT(orders.order_id) as order_count FROM users JOIN orders ON users.user_id = orders.user_id GROUP BY users.user_id;', '返回每个用户的总订单数量'),
('电商系统', '测试SQL助手的数据聚合和分组功能', '查询每种商品的总销售额', 'SELECT products.product_name, SUM(products.product_price * orders.quantity) as total_sales FROM products JOIN orders ON products.product_id = orders.product_id GROUP BY products.product_id;', '返回每种商品的总销售额'),
('电商系统', '测试SQL助手的数据聚合和分组功能', '查询2023年最受欢迎的商品(订单数量最多的商品)', 'SELECT products.product_name FROM products JOIN orders ON products.product_id = orders.product_id WHERE YEAR(orders.order_date) = 2023 GROUP BY products.product_id ORDER BY COUNT(orders.order_id) DESC LIMIT 1;', '返回2023年最受欢迎的商品(订单数量最多的商品)的名称');

⚠️ Potential issue | 🟠 Major

SQLite incompatibility: YEAR() function is not supported.

The expected_sql on this line uses YEAR(orders.order_date), which is a MySQL function and not valid in SQLite. SQLite requires strftime('%Y', orders.order_date) for extracting the year from a date.

Since this file is specifically for SQLite (test_case_info_sqlite.sql), the expected SQL should use SQLite-compatible syntax.

Proposed fix
-('电商系统', '测试SQL助手的数据聚合和分组功能', '查询2023年最受欢迎的商品(订单数量最多的商品)', 'SELECT products.product_name FROM products JOIN orders ON products.product_id = orders.product_id WHERE YEAR(orders.order_date) = 2023 GROUP BY products.product_id ORDER BY COUNT(orders.order_id) DESC LIMIT 1;', '返回2023年最受欢迎的商品(订单数量最多的商品)的名称');
+('电商系统', '测试SQL助手的数据聚合和分组功能', '查询2023年最受欢迎的商品(订单数量最多的商品)', 'SELECT products.product_name FROM products JOIN orders ON products.product_id = orders.product_id WHERE strftime(''%Y'', orders.order_date) = ''2023'' GROUP BY products.product_id ORDER BY COUNT(orders.order_id) DESC LIMIT 1;', '返回2023年最受欢迎的商品(订单数量最多的商品)的名称');
🤖 Prompt for AI Agents
In `@docker/examples/sqls/test_case_info_sqlite.sql`, at line 17: the SQL in the
VALUES tuple uses MySQL's YEAR(orders.order_date) which SQLite doesn't support;
update the SELECT in that tuple to use SQLite's strftime('%Y',
orders.order_date) and compare to the string '2023' (e.g. strftime('%Y',
orders.order_date) = '2023') so the expected_sql in the tuple (the one selecting
products.product_name with JOIN orders ...) is SQLite-compatible.
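
A quick local check illustrates the difference (a sketch, assuming the sqlite3 CLI is available; not part of this PR): strftime extracts the year, while YEAR() fails because SQLite does not define that function.

# Prints 2023: strftime is the SQLite way to extract a year from a date string.
sqlite3 :memory: "SELECT strftime('%Y', '2023-06-15');"

# Fails with "no such function: YEAR" because YEAR() is MySQL-specific.
sqlite3 :memory: "SELECT YEAR('2023-06-15');" || echo "YEAR() is not available in SQLite"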

Comment on lines +1 to +2
USE mysql;
UPDATE user SET Host='%' WHERE User='root';

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and view its full content
find . -type f -name "user_config_mysql.sql" -o -path "*/docker/examples/sqls/user_config_mysql.sql"

Repository: khulnasoft/GPT-DB

Length of output: 105


🏁 Script executed:

# Get a more direct check
cat docker/examples/sqls/user_config_mysql.sql 2>/dev/null || echo "File not found or error reading"

Repository: khulnasoft/GPT-DB

Length of output: 133


🌐 Web query:

MySQL 8.0 best practices for changing user host, ALTER USER vs UPDATE mysql.user deprecated

💡 Result:

Best practice in MySQL 8.0: change the host with RENAME USER, not ALTER USER, and don’t UPDATE mysql.user

Key point: In MySQL, an “account” is the tuple 'user'@'host'. Changing the host means changing the account name, which you do with RENAME USER (or by creating a new account), not ALTER USER. RENAME USER can rename both the user and host parts, e.g. 'jeffrey'@'localhost' → 'jeffrey'@'127.0.0.1'. [2]

Recommended options

  1. Rename the account (best when you truly want to “move” it):
RENAME USER 'app'@'oldhost' TO 'app'@'newhost';

This keeps the account’s privileges under the new name. Be aware of definer/orphan-object considerations noted in the manual. [2]

  2. Create a second account (best when you need both old and new host to work during a transition):
CREATE USER 'app'@'newhost' IDENTIFIED BY 'new_password';
-- then re-apply privileges:
SHOW GRANTS FOR 'app'@'oldhost';
-- run the shown GRANT statements but targeting 'app'@'newhost'

(There isn’t a single “clone user” statement; you reapply grants.)

Why not UPDATE mysql.user?

Directly modifying grant tables with INSERT/UPDATE/DELETE is not recommended; changes won’t affect privilege checking until you reload privileges (for example via FLUSH PRIVILEGES), and it’s generally considered risky compared to account-management statements. [1] Also, some managed platforms explicitly disallow modifying mysql schema grant tables directly and require CREATE USER/GRANT/REVOKE etc. [3]

Summary:

  • Change host: use RENAME USER (or create a new 'user'@'host'). [2]
  • Do not change host with ALTER USER (it’s for account properties, not renaming).
  • Avoid UPDATE mysql.user; use account-management statements instead. [1][3]

Sources: [1] [2] [3]


🌐 Web query:

MySQL Host='%' security implications remote root access

💡 Result:

In MySQL, the Host part of an account controls where that username may authenticate from. Setting root to Host='%' (that is, root@'%') means “root can log in from any host” (the % wildcard matches any host) and it is the least-specific host match. [1][2]

Security implications of root@'%'

  • Network-reachable superuser: If MySQL is reachable over the network (intentionally or by mistake), you’ve exposed the most powerful account to the internet/LAN, making it a prime target for password spraying/brute force and credential reuse attacks. [1][3]
  • Full compromise blast radius: Compromising root typically implies full control of data (read/modify/delete), privilege changes (create users, grant rights), and potentially host-level impact depending on enabled features/privileges and server hardening.
  • Misconfiguration amplification: Even if you “intend” MySQL to be private, real-world failures happen (cloud security group mistake, port-forward, VPN misroute). root@'%' turns those into immediate high-severity incidents. Guidance and tooling commonly flag “root can login remotely” as a security finding. [4]

Recommended safer patterns

  • Do not allow remote root: Keep root restricted to local connections (for example root@localhost) and administer remotely via SSH/VPN, or a bastion. [3][4]
  • Use a separate admin account, restricted by source IP/CIDR (instead of %), and grant only what’s needed. MySQL supports IP/CIDR (or netmask) host specifications for accounts. [1][2]
  • Add network controls: Bind MySQL to a private interface, and firewall/security-group allowlist only trusted admin/application IPs. [3]

Quick check (find remote-capable root accounts):

SELECT user, host FROM mysql.user WHERE user='root';

If you see root@'%' (or any non-local host), treat it as high risk and remove/lock it unless you have a very specific, well-defended need. [1][3]

Sources: MySQL account name/wildcard rules and meaning of % [1], MySQL connection matching and % being “any host”/least-specific [2], remote access hardening guidance incl. disabling remote root [3], MySQL security advisor item “Root Account Can Login Remotely” [4].


Avoid enabling remote root access by default.

Host='%' allows root login from any host. This is a significant security risk and is flagged by security scanners. Even in examples, this pattern should not be normalized. Additionally, direct UPDATE mysql.user is not recommended in modern MySQL; use CREATE USER and GRANT instead.

Keep root restricted to localhost and create a dedicated application user for remote access:

Safer alternative
 USE mysql;
-UPDATE user SET Host='%' WHERE User='root';
+-- Keep root local; create a dedicated user for remote access
+CREATE USER IF NOT EXISTS 'app_user'@'%' IDENTIFIED BY 'change_me';
+GRANT ALL PRIVILEGES ON *.* TO 'app_user'@'%' WITH GRANT OPTION;
 FLUSH PRIVILEGES;
🤖 Prompt for AI Agents
In `@docker/examples/sqls/user_config_mysql.sql`, around lines 1-2: the SQL
snippet updates mysql.user to set Host='%' for User='root', which enables remote
root access and is insecure; replace this pattern by keeping root bound to
localhost (do not change Host='%' for User='root'), remove direct UPDATE
mysql.user calls, and instead create a dedicated application user via CREATE
USER and GRANT with only the necessary privileges for remote access, then FLUSH
PRIVILEGES; locate the statements UPDATE user SET Host='%' WHERE User='root' and
replace them with the CREATE USER / GRANT flow and avoid modifying the root
account.

Comment on lines +24 to +26
# 设置 VNC 密码
RUN mkdir -p /root/.vnc && \
x11vnc -storepasswd 123456 /root/.vnc/passwd

⚠️ Potential issue | 🔴 Critical

Remove the hardcoded VNC password (security risk).

A fixed password in the image makes every container trivially accessible. Make the password runtime‑configurable (env or secret) and avoid baking it into the image.

🔒 Suggested direction (example)
-# 设置 VNC 密码
-RUN mkdir -p /root/.vnc && \
-    x11vnc -storepasswd 123456 /root/.vnc/passwd
+ARG VNC_PASSWORD
+RUN test -n "$VNC_PASSWORD"
+RUN mkdir -p /root/.vnc && \
+    x11vnc -storepasswd "$VNC_PASSWORD" /root/.vnc/passwd
🤖 Prompt for AI Agents
In `@Dockerfile`, around lines 24-26: the Dockerfile currently hardcodes the VNC
password using the RUN line that calls x11vnc -storepasswd 123456
/root/.vnc/passwd; remove that hardcoded invocation and instead create a runtime
setup (entrypoint) that reads a VNC password from an environment variable (e.g.,
VNC_PASSWORD) or a secret file (e.g., /run/secrets/vnc_password), ensures
/root/.vnc exists, stores the password with x11vnc -storepasswd using the
provided value, and tightens file permissions on /root/.vnc/passwd; update the
Dockerfile to copy the entrypoint script and set it as ENTRYPOINT/CMD and keep a
safe fallback behavior (error or random password) when no password is provided.
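
One possible shape for that entrypoint is sketched below; the VNC_PASSWORD variable, the /run/secrets/vnc_password path, and the fail-closed behavior are assumptions drawn from the comment above, not code in this PR.

#!/bin/sh
# Hypothetical entrypoint: resolve the VNC password at runtime instead of baking it in.
set -eu

# Prefer a mounted secret, then fall back to an environment variable.
if [ -f /run/secrets/vnc_password ]; then
    VNC_PASSWORD="$(cat /run/secrets/vnc_password)"
fi

if [ -z "${VNC_PASSWORD:-}" ]; then
    echo "VNC_PASSWORD is not set; refusing to start with a default password" >&2
    exit 1
fi

mkdir -p /root/.vnc
x11vnc -storepasswd "$VNC_PASSWORD" /root/.vnc/passwd
chmod 600 /root/.vnc/passwd

exec "$@"   # hand off to the image's original command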

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@docs/blog/2025-03-24-gptdb-v0.7.0-release.md`:
- Line 68: Update the broken documentation links in the release blog post
(docs/blog/2025-03-24-gptdb-v0.7.0-release.md) to point to publicly accessible
URLs: either replace the unreachable http://gpt.khulnasoft.com/**...** links
(e.g. /docs/next/quickstart,
/docs/next/config-reference/app/config_chatdashboardconfig_2480d0,
/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9,
/docs/next/installation/docker-build-guide, /docs/next/installation/docker) with
their live equivalents on the currently deployed docs site or temporarily revert
them to the previous working URLs; verify each updated link returns 200 OK
before merging and update all occurrences listed in the comment.
- Line 426: The link labeled "概览" currently points to
https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx which appears
inaccessible/404; verify whether the correct public URL for the Chinese
documentation exists and is published, and then update the link target in
docs/blog/2025-03-24-gptdb-v0.7.0-release.md to the working URL (or replace with
an archived URL or mirror); if the doc requires authentication, either remove
the broken link or add a note beside "概览" indicating that authentication is
required and provide instructions or an alternative public resource.

In `@README.md`:
- Line 150: Replace the malformed character in the header string "### � Privacy
and Security" with the intended emoji or plain text (for example "### 🔒 Privacy
and Security" or "### Privacy and Security"), update the README.md header
accordingly, and ensure the file is saved in UTF-8 encoding so the emoji renders
correctly across systems.
- Around line 225-228: The README has a broken link due to a filename typo:
change the link target "DISCKAIMER.md" to the correct "DISCLAIMER.md" in the
Markdown link (the line containing the "Disclaimer" link) so the reference
points to the correct file name.
- Line 158: The header "### � Supported Data Sources" contains a malformed
Unicode character; update that header (the string "### � Supported Data
Sources") to use the intended character (e.g., replace "�" with a proper emoji
like "🔌" or "🔍", or remove it entirely so it reads "### Supported Data
Sources") and save the README.md with UTF-8 encoding to prevent future encoding
issues.

**And all Chat scenarios in original GPT-DB now have deep thinking capabilities.**

For quick usage reference: [http://docs.gptdb.cn/docs/next/quickstart](http://docs.gptdb.cn/docs/next/quickstart)
For quick usage reference: [http://gpt.khulnasoft.com/docs/next/quickstart](http://gpt.khulnasoft.com/docs/next/quickstart)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's read the file to see all the URL changes
cat -n docs/blog/2025-03-24-gptdb-v0.7.0-release.md | sed -n '65,75p;160,170p;225,235p;325,335p;438,445p'

Repository: khulnasoft/GPT-DB

Length of output: 2809


🏁 Script executed:

# Test the accessibility of the documented URLs
urls=(
  "http://gpt.khulnasoft.com/docs/next/quickstart"
  "http://gpt.khulnasoft.com/docs/next/config-reference/app/config_chatdashboardconfig_2480d0"
  "http://gpt.khulnasoft.com/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9"
  "http://gpt.khulnasoft.com/docs/next/installation/docker-build-guide"
  "http://gpt.khulnasoft.com/docs/next/installation/docker"
)

for url in "${urls[@]}"; do
  http_code=$(curl -s -o /dev/null -w "%{http_code}" -L "$url" 2>&1)
  echo "$url: HTTP $http_code"
done

Repository: khulnasoft/GPT-DB

Length of output: 461


All new documentation URLs are currently inaccessible—HTTP connection failures on all tested links.

The migrated documentation URLs across lines 68, 165, 167, 231, 328, 441–442 all fail to connect (HTTP 000). These include:

  • http://gpt.khulnasoft.com/docs/next/quickstart
  • http://gpt.khulnasoft.com/docs/next/config-reference/app/config_chatdashboardconfig_2480d0
  • http://gpt.khulnasoft.com/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9
  • http://gpt.khulnasoft.com/docs/next/installation/docker-build-guide
  • http://gpt.khulnasoft.com/docs/next/installation/docker

Ensure the new documentation domain is fully deployed and publicly accessible before merging, or temporarily revert to working URLs if the new domain is not yet ready.

🤖 Prompt for AI Agents
In `@docs/blog/2025-03-24-gptdb-v0.7.0-release.md`, at line 68: update the broken
documentation links in the release blog post
(docs/blog/2025-03-24-gptdb-v0.7.0-release.md) to point to publicly accessible
URLs: either replace the unreachable http://gpt.khulnasoft.com/**...** links
(e.g. /docs/next/quickstart,
/docs/next/config-reference/app/config_chatdashboardconfig_2480d0,
/docs/next/config-reference/utils/config_s3storageconfig_f0cdc9,
/docs/next/installation/docker-build-guide, /docs/next/installation/docker) with
their live equivalents on the currently deployed docs site or temporarily revert
them to the previous working URLs; verify each updated link returns 200 OK
before merging and update all occurrences listed in the comment.

**Chinese**

[概览](https://www.yuque.com/eosphoros/gptdb-docs/bex30nsv60ru0fmx)
[概览](https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Chinese documentation URL accessibility

url="https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx"
echo "Testing URL: $url"

# Use curl with timeout and follow redirects
response=$(curl -s -w "\n%{http_code}" -L --connect-timeout 5 --max-time 10 "$url" 2>&1)
http_code=$(echo "$response" | tail -n1)
body=$(echo "$response" | head -n-1)

echo "HTTP Status Code: $http_code"

# Check if response is valid
if [ "$http_code" -eq 200 ]; then
  echo "Result: URL is accessible"
elif [ "$http_code" -eq 301 ] || [ "$http_code" -eq 302 ] || [ "$http_code" -eq 303 ] || [ "$http_code" -eq 307 ] || [ "$http_code" -eq 308 ]; then
  echo "Result: URL redirects (Status: $http_code)"
elif [ "$http_code" -eq 404 ]; then
  echo "Result: URL not found (404)"
elif [ "$http_code" -eq 403 ]; then
  echo "Result: Access forbidden (403)"
else
  echo "Result: Unexpected status code ($http_code)"
fi

Repository: khulnasoft/GPT-DB

Length of output: 178


🌐 Web query:

Yuque documentation URL https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx accessibility

💡 Result:

  • Direct access from here failed: the specific doc URL timed out when fetched, so its public accessibility can’t be confirmed programmatically. [1]
  • The parent repo URL https://www.yuque.com/khulnasoft/gptdb-docs returned 404 Not Found, which commonly indicates the knowledge base/repo slug isn’t publicly reachable (deleted/renamed) or is access-restricted. [2]

What to do to verify/access it

  1. Open the URL in a normal browser session. If prompted, sign in to Yuque and request permission from the repo owner.
  2. If you have access but need non-interactive retrieval, use Yuque’s API with a Personal Access Token (common requirement for SDK/API access). [3][4]

Sources: [1] [2] [3] [4]


The Chinese documentation URL appears to be inaccessible or broken.

The parent documentation repository at https://www.yuque.com/khulnasoft/gptdb-docs returns a 404 error, and the specific URL at line 426 times out when accessed. This indicates the documentation may be deleted, renamed, or access-restricted. Verify that the correct URL has been published and is publicly accessible, or confirm whether authentication is required.

🤖 Prompt for AI Agents
In `@docs/blog/2025-03-24-gptdb-v0.7.0-release.md`, at line 426: the link labeled
"概览" currently points to
https://www.yuque.com/khulnasoft/gptdb-docs/bex30nsv60ru0fmx which appears
inaccessible/404; verify whether the correct public URL for the Chinese
documentation exists and is published, and then update the link target in
docs/blog/2025-03-24-gptdb-v0.7.0-release.md to the working URL (or replace with
an archived URL or mirror); if the doc requires authentication, either remove
the broken link or add a note beside "概览" indicating that authentication is
required and provide instructions or an alternative public resource.

- **API Models:** Wenxin, Tongyi, Zhipu, and other commercial models

- **Fine-tuning Framework**: Model fine-tuning is an indispensable capability for any enterprise to implement in vertical and niche domains. GPT-DB provides a complete fine-tuning framework that integrates seamlessly with the GPT-DB project. In recent fine-tuning efforts, an accuracy rate based on the Spider dataset has been achieved at 82.5%.
### � Privacy and Security

⚠️ Potential issue | 🔴 Critical

Fix malformed character (encoding issue).

Line 150 contains a malformed character "�" that appears to be a Unicode/emoji rendering issue. This should be replaced with the intended emoji or icon.

🔧 Suggested fix
-### � Privacy and Security
+### 🔒 Privacy and Security
🤖 Prompt for AI Agents
In `@README.md`, at line 150: replace the malformed character in the header string
"### � Privacy and Security" with the intended emoji or plain text (for example
"### 🔒 Privacy and Security" or "### Privacy and Security"), update the
README.md header accordingly, and ensure the file is saved in UTF-8 encoding so
the emoji renders correctly across systems.

- **Secure Architecture** - End-to-end security framework

- **Data Sources**: Integrating various data sources to seamlessly connect production business data to the core capabilities of GPT-DB.
### � Supported Data Sources

⚠️ Potential issue | 🔴 Critical

Fix malformed character (encoding issue).

Line 158 contains a malformed character "�" that appears to be a Unicode/emoji rendering issue. This should be replaced with the intended emoji or icon.

🔧 Suggested fix
-### � Supported Data Sources
+### 📊 Supported Data Sources
🤖 Prompt for AI Agents
In `@README.md`, at line 158: the header "### � Supported Data Sources" contains a
malformed Unicode character; update that header (the string "### � Supported
Data Sources") to use the intended character (e.g., replace "�" with a proper
emoji like "🔌" or "🔍", or remove it entirely so it reads "### Supported Data
Sources") and save the README.md with UTF-8 encoding to prevent future encoding
issues.
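
To confirm no U+FFFD replacement characters remain after the headers are fixed (a sketch; the --include filter and the message text are assumptions, not part of this PR), a grep over the Markdown files works:

# U+FFFD is encoded as EF BF BD in UTF-8; list any Markdown lines that still contain it.
grep -rn $'\xef\xbf\xbd' --include='*.md' . || echo "No replacement characters found"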

Comment on lines +225 to +228
## ⚠️ Disclaimer

📖 **[Disclaimer](./DISCKAIMER.md)** - Please read our disclaimer for important information about the project.


⚠️ Potential issue | 🔴 Critical

Fix typo in disclaimer filename.

Line 227 contains a typo: "DISCKAIMER.md" should be "DISCLAIMER.md". This will result in a broken link.

🔧 Proposed fix
-📖 **[Disclaimer](./DISCKAIMER.md)** - Please read our disclaimer for important information about the project.
+📖 **[Disclaimer](./DISCLAIMER.md)** - Please read our disclaimer for important information about the project.
🤖 Prompt for AI Agents
In `@README.md`, around lines 225-228: the README has a broken link due to a
filename typo: change the link target "DISCKAIMER.md" to the correct
"DISCLAIMER.md" in the Markdown link (the line containing the "Disclaimer" link)
so the reference points to the correct file name.
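
A small link check can guard against this class of typo (a sketch; it only covers relative ./*.md links in README.md, and the file names are taken from the comment above):

# Extract relative .md link targets from README.md and verify that each file exists.
grep -o '](\./[^)]*\.md)' README.md | sed 's/](\(.*\))/\1/' | while read -r target; do
  if [ ! -f "$target" ]; then
    echo "Broken link target in README.md: $target" >&2
  fi
done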

- Fix is_exist method to return False on exception instead of True to avoid false positives
- Fix string replace assignment in use-chat.ts to properly apply newline replacements
- Remove inefficient database cleanup logic in benchmark_data_manager.py
- Replace custom JSONPath implementation with jsonpath-ng library for robustness
- Pass system_app context to get_benchmark_manager() calls
- Refactor Neo4j delete_graph method to use single DETACH DELETE query for better performance
- Update HTTP-Referer header URL to use correct khulnasoft-bot repository