Simplify File Handling, Tune Agentic #419
- Consolidated Azure storage container handling to use a single container defined in environment variables, removing support for multiple containers.
- Updated related code and tests to reflect the new single-container approach, ensuring consistency across the application.
- Enhanced documentation in INTERFACE.md to clarify storage architecture and operations, including retention management and URL generation.
- Added tests for retention setting and ensured short-lived URLs are consistently included in responses.

…dling
- Updated Redis client initialization to use `ioredis` for better error management and connection handling.
- Implemented a retry strategy with exponential backoff for connection attempts.
- Added event listeners for connection status updates and error logging to prevent process crashes.
- Maintained compatibility with existing mock client functionality.

- Updated the README to clarify model selection mechanisms, detailing static model selection and dynamic runtime overrides.
- Improved file handling functions to support optional container parameters for file deletion and uploads, allowing for better organization in cloud storage.
- Introduced a new tool for step-by-step planning, enhancing the system's ability to manage complex tasks.
- Added functionality for short-lived URLs in file handling, ensuring efficient access to files while maintaining security.
- Removed deprecated reasoning tool and streamlined image viewing capabilities, improving overall system performance and usability.
- Enhanced tests for file handling and short-lived URL functionality, ensuring robust error handling and accurate responses.

…container management
- Updated INTERFACE.md to include optional `contextId` for per-user/per-context file scoping, enhancing file isolation in multi-tenant applications.
- Modified file handling scripts to support context-scoped keys, ensuring secure access and management of files based on user context.
- Simplified container management by enforcing a single container approach, removing legacy support for multiple containers.
- Enhanced Redis key management to facilitate migration from legacy keys to the new context-scoped format, ensuring backward compatibility.
- Improved tests to validate context scoping functionality and ensure robust handling of legacy keys during migration.
- Updated environment variable validation to reflect the new single container requirement, improving clarity in configuration settings.

… retention management
- Introduced `fetchFileFromUrl` to streamline file retrieval from URLs, supporting context IDs for scoped file storage.
- Updated `buildFileHandlerUrl` to handle context IDs and improve query parameter management.
- Enhanced existing functions like `getMediaChunks`, `markCompletedForCleanUp`, and `deleteFileByHash` to accept context IDs, ensuring better file management in multi-tenant environments.
- Improved retention management by adding `setRetentionForHash` to allow setting file retention policies.
- Updated various tools and plugins to utilize new context-aware file handling functions, ensuring consistent behavior across the application.
- Enhanced tests to validate new functionality and ensure robust error handling.

…port
- Updated `uploadBlob` and `uploadFile` functions to accept and extract `contextId` from form fields, improving context-aware file handling.
- Modified `setRetention` to include `contextId` for scoped file storage, ensuring accurate retention management.
- Implemented logic to default uploads to temporary status, aligning with file collection practices.
- Enhanced Redis cleanup logic to skip permanent files, ensuring they are not removed during age-based cleanup.
- Added tests to validate context-aware retention and cleanup behavior, ensuring robust functionality across scenarios.

…e features
- Introduced `contextId` support across various file handling functions, improving scoped file management.
- Updated GraphQL resolvers to differentiate between queries and mutations, enhancing the structure and clarity of API interactions.
- Implemented new pathways for reading and updating file metadata, ensuring backward compatibility while streamlining operations.
- Enhanced file collection management by integrating `displayFilename` persistence, improving user experience during file retrieval.
- Refactored existing tests to validate new context-aware functionalities and ensure robust error handling across scenarios.
Pull request overview
This pull request modernizes the Cortex File Handler's storage architecture by consolidating to a single Azure Blob Storage container with blob index tags for lifecycle management, introducing context-scoped file isolation via Redis hash maps, and adding support for short-lived URLs. The changes also include new pathways for file metadata management, a new Gemini 3 reasoning plugin, removal of legacy OpenAI pathway stubs, and updates to entity constants for improved agentic behavior.
Key changes:
- Simplified Azure storage from multiple containers to a single container with retention tags (`temporary`/`permanent`)
- Introduced `contextId` parameter for per-user/tenant file isolation using Redis keys in format `<hash>:ctx:<contextId>`
- Added short-lived URL support (5-minute expiration) for secure file access across all file handler responses
- New GraphQL pathways: `sys_read_file_collection`, `sys_update_file_metadata` with mutation support
- Added `Gemini3ReasoningVisionPlugin` for reasoning-enabled vision models
- Removed legacy OpenAI pathway stubs (o3, gpt-5, gpt-4.1, grok-4, ollama, gemini-2.5)
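
The `<hash>:ctx:<contextId>` key format from the list above can be illustrated with a small helper. This is a minimal sketch, not the code in `lib/fileUtils.js`: the names `scopedKey` and `lookupFile` are hypothetical, the fall-back to an unscoped legacy key is only one plausible reading of the migration behavior, and a plain `Map` stands in for Redis.

```javascript
// Hypothetical helper: build the context-scoped key "<hash>:ctx:<contextId>".
// Without a contextId the plain hash is used (the legacy, unscoped form).
function scopedKey(hash, contextId) {
  return contextId ? `${hash}:ctx:${contextId}` : hash;
}

// Hypothetical lookup: prefer the scoped entry, then fall back to the legacy key.
// `store` is a Map standing in for the Redis hash map (an assumption).
function lookupFile(store, hash, contextId) {
  return store.get(scopedKey(hash, contextId)) ?? store.get(hash) ?? null;
}

const store = new Map([
  ['abc123:ctx:user-1', { url: 'https://example.invalid/scoped' }],
  ['def456', { url: 'https://example.invalid/legacy' }],
]);

console.log(scopedKey('abc123', 'user-1'));
console.log(lookupFile(store, 'abc123', 'user-2')); // no entry for this context
console.log(lookupFile(store, 'def456', 'user-1')); // served from the legacy key
```

Keeping the context in the key rather than in the value means a lookup for one context can never return another context's entry by accident.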
Reviewed changes
Copilot reviewed 86 out of 86 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/unit/core/*.test.js | Updated unit tests for file collection and short-lived URL functionality |
| tests/integration/features/tools/*.test.js | Updated integration tests to use direct Redis access instead of memory system |
| pathways/system/entity/tools/*.js | Updated file collection tools to use Redis hash maps and context scoping |
| pathways/system/entity/files/*.js | New pathways for file collection and metadata management |
| pathways/system/entity/memory/*.js | Removed memoryFiles from memory system (now separate in Redis) |
| pathways/system/rest_streaming/*.js | Removed legacy pathway stubs for unreleased models |
| server/typeDef.js | Added mutation support with isMutation flag |
| server/graphql.js | Updated resolvers to support mutations |
| server/plugins/*.js | New Gemini3ReasoningVisionPlugin and updates for contextId support |
| lib/fileUtils.js | Major refactor for Redis hash maps, context scoping, and short-lived URLs |
| lib/entityConstants.js | Updated entity prompts for improved agentic behavior |
| helper-apps/cortex-file-handler/tests/*.test.js | Comprehensive test updates for single-container architecture |
| helper-apps/cortex-file-handler/src/*.js | Storage service updates for retention tags and context scoping |
- Introduced a new `FILE_SYSTEM_DOCUMENTATION.md` detailing the architecture, file handler service, utilities layer, and file collection system.
- Included sections on key concepts, API endpoints, and error handling to enhance understanding of file operations.
- Updated `README.md` to reference the new documentation and provide an overview of file system capabilities.
- Enhanced tests to validate file collection and metadata management functionalities.

…ved URL resolution
- Modified the `generateFileMessageContent` function to pass `contextId` when ensuring short-lived URLs, improving file retrieval accuracy within the correct context scope.

- Updated file handling functions to incorporate `inCollection` metadata, allowing for flexible filtering of files based on chat IDs.
- Modified `loadFileCollection` to cache raw file data, enabling efficient retrieval and filtering by `inCollection` status.
- Enhanced `updateFileMetadata` to support updates to `inCollection`, including normalization of values for backward compatibility.
- Improved tests to validate the functionality of `inCollection` updates and ensure accurate file retrieval based on context.

- Revised the GraphQL pathway for reading file collections to clarify that file collections are now stored in Redis hash maps.
- Updated memory normalization logic to filter out deprecated `memoryFiles`, ensuring only valid sections are processed and stored in the upgraded memory format.

…ndler
- Enhanced the `getFileStoreMap` function to clarify the logic for checking file existence in primary and GCS backup storage, ensuring accurate cleanup decisions.
- Updated the `removeFromFileStoreMap` function to handle scoped hash formats, allowing for more flexible removal of entries from both unscoped and context-scoped maps.
- Improved logging messages to provide clearer context regarding file operations and potential issues during cleanup.

- Bumped the version of @aj-archipelago/cortex-file-handler from 2.6.4 to 2.7.0 in both package.json and package-lock.json to reflect the latest changes and improvements.

- Removed the setup for Azure test environment from the GitHub Actions workflow, consolidating GCS and Azure tests into a single step.
- Updated the Azure container setup script to always create default test containers and deduplicate container names from the environment variable, improving clarity and efficiency in container management.

- Updated the `getContainerName` function to improve handling of environment variable values, ensuring a default value of "cortextempfiles" is returned if the variable is not set, empty, or contains the string "undefined".
- Added fallback logic to handle legacy comma-separated values and ensure robustness in container name retrieval.
- Increased timeout settings in tests for image content processing to accommodate longer processing times.
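
The container-name fallback rules in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the file handler's actual `getContainerName`: the function here takes the raw value as a parameter so it can be exercised directly, and choosing the first entry of a legacy comma-separated value is an assumption.

```javascript
const DEFAULT_CONTAINER = 'cortextempfiles';

// Sketch of the fallback rules: return the default when the value is not set,
// empty, or contains the string "undefined". Taking the first non-empty entry
// of a legacy comma-separated value is an assumption.
function getContainerName(rawValue) {
  if (typeof rawValue !== 'string') return DEFAULT_CONTAINER;
  const value = rawValue.trim();
  if (value === '' || value.includes('undefined')) return DEFAULT_CONTAINER;
  const first = value
    .split(',')
    .map((part) => part.trim())
    .find((part) => part !== '');
  return first ?? DEFAULT_CONTAINER;
}

console.log(getContainerName(process.env.AZURE_STORAGE_CONTAINER_NAME));
```

Checking for the literal string "undefined" guards against configuration layers that stringify a missing value instead of leaving it unset.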

- Added checks to handle scenarios where Azurite (local emulator) may not support blob tags, ensuring that tag updates are skipped gracefully without throwing errors.
- Enhanced logging to provide clear warnings when tag updates fail in test environments, allowing operations to continue smoothly.
- Updated the `updateBlobTags` method to account for potential failures in tag operations, improving robustness in both local and production environments.
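
A tag update that degrades gracefully, as the commit above describes, might look like the sketch below. `updateBlobTagsSafely` is a hypothetical name, not the `updateBlobTags` method in the PR. The only thing it assumes about `blobClient` is an async `setTags(tags)` method, which is the shape `BlobClient` has in `@azure/storage-blob`; the example uses stand-in objects so it runs without any Azure connection.

```javascript
// Hypothetical wrapper: try to set blob index tags, warn and continue on failure.
// `blobClient` must expose an async setTags(tags) method; `logger` defaults to console.
async function updateBlobTagsSafely(blobClient, tags, logger = console) {
  try {
    await blobClient.setTags(tags);
    return true;
  } catch (error) {
    // Emulators may reject tag operations; skip the update instead of failing the request.
    logger.warn(`Skipping blob tag update: ${error.message}`);
    return false;
  }
}

// Stand-in clients for illustration only.
const taggingClient = { setTags: async () => {} };
const emulatorClient = {
  setTags: async () => {
    throw new Error('tags not supported');
  },
};

updateBlobTagsSafely(taggingClient, { retention: 'permanent' }).then(console.log);
updateBlobTagsSafely(emulatorClient, { retention: 'permanent' }).then(console.log);
```

Returning a boolean lets the caller decide whether a skipped tag update matters for the operation in progress.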
Pull request overview
Copilot reviewed 94 out of 95 changed files in this pull request and generated 2 comments.
Files not reviewed (1)
- helper-apps/cortex-file-handler/package-lock.json: Language not supported
- Added support for checking if the provider can handle blob tag operations, enhancing compatibility with local emulators like Azurite.
- Streamlined the extraction of blob names and generation of short-lived URLs, ensuring operations continue smoothly even when tag updates fail.
- Improved logging for better visibility into tag update failures and fallback mechanisms for providers that do not support blob tags.

- Added functions to write and read file data with encryption for sensitive fields (tags and notes) using a context key.
- Updated existing file handling functions to support optional context key for encryption and decryption during metadata updates.
- Enhanced tests to validate encryption functionality, ensuring sensitive data is securely stored and can be decrypted correctly.
- Ensured core fields remain unencrypted for accessibility while maintaining data integrity.

- Updated the encryption function to use AES-256-GCM with a 12-byte IV and added support for generating an authentication tag.
- Modified the decryption function to handle both the new GCM format (iv:tag:encrypted) and the legacy CBC format (iv:encrypted) for backward compatibility.
- Improved error handling and logging for decryption failures, ensuring better visibility into issues with invalid message formats.
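
The two message formats in the commit above can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not the PR's implementation: the function names, the hex encoding of each part, and the use of a raw 32-byte key are all assumptions.

```javascript
const crypto = require('node:crypto');

// Encrypt with AES-256-GCM and a 12-byte IV. Output: "iv:tag:encrypted",
// each part hex-encoded (the encoding is an assumption).
function encrypt(plaintext, key) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, encrypted].map((part) => part.toString('hex')).join(':');
}

// Decrypt either the GCM format (three parts) or the legacy CBC format (two parts).
function decrypt(message, key) {
  const parts = message.split(':');
  if (parts.length === 3) {
    const [iv, tag, data] = parts.map((part) => Buffer.from(part, 'hex'));
    const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
    decipher.setAuthTag(tag); // final() throws if the data or tag was tampered with
    return Buffer.concat([decipher.update(data), decipher.final()]).toString('utf8');
  }
  if (parts.length === 2) {
    const [iv, data] = parts.map((part) => Buffer.from(part, 'hex'));
    const decipher = crypto.createDecipheriv('aes-256-cbc', key, iv);
    return Buffer.concat([decipher.update(data), decipher.final()]).toString('utf8');
  }
  throw new Error('Invalid encrypted message format');
}

const key = crypto.randomBytes(32); // illustrative key; real keys come from configuration
console.log(decrypt(encrypt('private note', key), key));
```

Counting the colon-separated parts is enough to tell the formats apart, so values written before the change stay readable without a migration step.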

- Introduced MIME type detection for uploaded files, storing the type for better file handling and context detection.
- Enhanced logging by redacting sensitive information such as SAS tokens and context IDs to improve security.
- Updated sanitization logic to ensure context IDs are specifically redacted during logging, maintaining data privacy.
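
Log redaction of the kind described in the commit above could be approximated with a single regular expression. This is a sketch only: `redactForLog` is a hypothetical name, and treating the `sig` query parameter (the SAS signature) and `contextId` as the sensitive fields is an assumption about what the PR redacts.

```javascript
// Query parameters treated as sensitive. `sig` carries the SAS signature;
// the exact parameter list is an assumption.
const SENSITIVE_PARAMS = ['sig', 'contextId'];
const SENSITIVE_PATTERN = new RegExp(
  `([?&](?:${SENSITIVE_PARAMS.join('|')})=)[^&#\\s]*`,
  'g'
);

// Replace the value of each sensitive parameter, keeping its name visible.
function redactForLog(text) {
  return String(text).replace(SENSITIVE_PATTERN, '$1[REDACTED]');
}

console.log(
  redactForLog('GET https://example.invalid/c/f.txt?sv=2022-11-02&sig=abc%2B123&contextId=user-42 200')
);
```

Keeping the parameter name in the output preserves enough of the URL shape to debug a request without exposing the secret value.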

- Wrapped YouTube URL check in a try-catch block to prevent errors from disrupting file lookup.
- Updated the file message generation logic to create a clean chat history, avoiding confusion from previous messages.
- Improved error handling for file analysis, ensuring clearer logging and recovery messages when issues arise.

- Introduced a new function to invalidate the file collection cache, ensuring that updates reflect immediately in subsequent operations.
- Implemented serialization for file edit operations to prevent concurrent modifications, enhancing data integrity during file updates.
- Updated file collection management to ensure cache is invalidated after file removals and edits, improving consistency in file operations.
- Enhanced tests to validate the new cache invalidation and serialization features, ensuring robust functionality across file operations.
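
One common way to serialize edits per file, as the commit above describes, is a promise chain keyed by file ID. The sketch below shows that general technique under stated assumptions; `serializeEdit` and `editQueues` are hypothetical names, and the PR may implement serialization differently.

```javascript
// Hypothetical per-file queue: each file ID maps to the tail of its edit chain.
const editQueues = new Map();

// Run `task` only after every earlier edit for the same file has settled.
function serializeEdit(fileId, task) {
  const previous = editQueues.get(fileId) ?? Promise.resolve();
  const run = previous.then(() => task());
  // The stored tail never rejects, so one failed edit does not block later ones.
  const tail = run.catch(() => {});
  editQueues.set(fileId, tail);
  // Drop the queue entry once this edit is the last one and has settled.
  tail.then(() => {
    if (editQueues.get(fileId) === tail) editQueues.delete(fileId);
  });
  return run;
}

// Usage: the second edit starts only after the first has finished.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
serializeEdit('report.md', async () => {
  await sleep(20);
  console.log('first edit done');
});
serializeEdit('report.md', async () => console.log('second edit done'));
```

Edits to different files use different chains, so they still run concurrently.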

…tions
- Introduced maximum file size limits (50MB) for editing and writing files to prevent memory issues during operations.
- Added local caching for file content during edits to optimize performance and reduce redundant downloads/uploads.
- Enhanced error handling to provide clear feedback when file sizes exceed the defined limits, improving user experience.
- Updated the serialization logic for file edits to utilize cached content, ensuring efficient processing of sequential edits.
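
The size guard from the commit above might be as simple as the following sketch. The constant and function names are hypothetical, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption.

```javascript
// 50MB limit; interpreting "MB" as 1024 * 1024 bytes is an assumption.
const MAX_EDITABLE_FILE_SIZE = 50 * 1024 * 1024;

// Throw a descriptive error before loading an oversized file into memory.
function assertEditableSize(sizeInBytes, filename) {
  if (sizeInBytes > MAX_EDITABLE_FILE_SIZE) {
    const sizeMb = (sizeInBytes / (1024 * 1024)).toFixed(1);
    throw new Error(`File "${filename}" is ${sizeMb}MB, which exceeds the 50MB limit for editing.`);
  }
}

assertEditableSize(12 * 1024, 'notes.txt'); // small file: passes silently
```

Running the check before downloading the content is what keeps memory use bounded.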

…nagement
- Added `syncAndStripFilesFromChatHistory` function to sync files from chat history to the collection while replacing file content with placeholders.
- Introduced `stripAllFilesFromChatHistory` to remove file and image content from messages, enhancing chat history readability.
- Updated `addFileToCollection` to ensure correct ID usage for existing files, preventing ID mismatches in Redis.
- Enhanced file collection tool to support metadata updates, including renaming, tagging, and notes management, with atomic operations for improved data integrity.
- Added comprehensive tests for the new file metadata update functionality and sync operations, ensuring robust handling of file collections.

- Introduced a new tool for creating slides, infographics, and presentations using Gemini 3 Pro image generation.
- Implemented input parameters for detailed instructions, filename prefix, and tagging for generated content.
- Added functionality to resolve input images, upload generated visuals to cloud storage, and manage file collections.
- Enhanced error handling and logging for improved user feedback during image generation and upload processes.

An extensive overhaul of the file handling system provides a more scalable, secure, and performant architecture for managing files in Cortex. This update also introduces dynamic generation of REST endpoints directly from model configurations and adds several new agent tools.
🚀 File System Architecture Overhaul
The file system has been completely refactored to improve scalability, security, and concurrency. This introduces a single-container architecture, atomic file collection management, and context-scoping for multi-tenancy.
- The `AZURE_STORAGE_CONTAINER_NAME` environment variable now accepts only one container name. Files are distinguished using blob index tags (`retention=temporary` or `retention=permanent`) instead of separate containers.
- File collections are stored in Redis hash maps (`FileStoreMap:ctx:<contextId>`), and all modifications are performed using atomic Redis operations (e.g., `HSET`, `HDEL`). This eliminates race conditions and the need for version-based locking, significantly improving reliability and performance.
- Context scoping (`contextId`): All file operations now support an optional `contextId` to provide per-user or per-context file isolation, which is critical for multi-tenant applications.
- The `setRetention` operation allows files to be marked as `permanent`. By default, all uploaded files are `temporary` and are automatically deleted after 30 days. This is managed via blob index tags, requiring no file movement.
- File handler responses include a `shortLivedUrl` (with a default 5-minute expiration) for secure, time-limited access, which is the preferred method for providing files to LLMs.
- `sys_read_file_collection`: A new GraphQL query to read the file collection for a given context.
- `sys_update_file_metadata`: A new GraphQL mutation for atomically updating a file's metadata (e.g., filename, tags, notes).
- A new document, `FILE_SYSTEM_DOCUMENTATION.md`, has been added to cover the entire file system architecture, including data flows, API endpoints, and best practices.

✨ Dynamic REST Endpoint Generation
Cortex can now automatically generate OpenAI-compatible REST endpoints for any configured model, eliminating the need to create individual `sys_rest_streaming_*` pathway files.
- To enable an endpoint for a model, add `emulateOpenAIChatModel: "your-model-name"` or `emulateOpenAICompletionModel: "your-model-name"` to its definition in `config.js`.
- An optional `restStreaming` block can be added to the model configuration to customize the generated pathway's `timeout`, `inputParameters`, and other settings.

📚 Documentation & Tooling Improvements
- `README.md` has been updated with a comprehensive guide on using `model` for static model selection and `modelOverride` for dynamic, runtime model switching.
- `CreatePlan`: A new planning tool that uses a high-reasoning model to generate detailed, step-by-step plans for complex tasks.
- `ViewImages`: Allows the agent to view image files from its file collection, injecting them into the conversation for analysis.
- `sys_entity_agent` can now process image objects returned by the `ViewImages` tool in addition to base64-encoded screenshots.
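
For the Dynamic REST Endpoint Generation feature described earlier, a model entry might look like the sketch below. Only `emulateOpenAIChatModel`, `emulateOpenAICompletionModel`, `restStreaming`, `timeout`, and `inputParameters` come from the description; the model names, the values, the shape of `inputParameters`, and the `emulatedEndpoints` helper are hypothetical and are not Cortex's own code.

```javascript
// Hypothetical excerpt of model definitions as they might appear in config.js.
const models = {
  'my-chat-model': {
    // Expose this model through an OpenAI-compatible chat endpoint.
    emulateOpenAIChatModel: 'my-chat-model',
    // Optional overrides for the generated pathway (values are illustrative).
    restStreaming: {
      timeout: 300,
      inputParameters: { messages: [], stream: false },
    },
  },
  'my-completion-model': {
    emulateOpenAICompletionModel: 'my-completion-model',
  },
  'internal-only-model': {},
};

// Hypothetical helper: list which configured models would get an emulated endpoint.
function emulatedEndpoints(modelConfigs) {
  return Object.entries(modelConfigs)
    .filter(([, model]) => model.emulateOpenAIChatModel || model.emulateOpenAICompletionModel)
    .map(([name, model]) => ({
      model: name,
      exposedAs: model.emulateOpenAIChatModel ?? model.emulateOpenAICompletionModel,
      kind: model.emulateOpenAIChatModel ? 'chat' : 'completion',
    }));
}

console.log(emulatedEndpoints(models));
```

Models without either `emulateOpenAI*` key, such as `internal-only-model` here, would get no generated endpoint under this reading.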