refactor: Improve code modularization for maintainability and scalability #84

fede-kamel · 2025-12-19T14:18:12Z

Summary

This PR addresses code organization and maintainability concerns in the langchain-oci package by introducing proper modularization patterns. As langchain-oracle matures into a production-grade library used by enterprise customers, maintaining high code quality standards becomes increasingly important for long-term sustainability.

Changes

1. Create shared common/ module

common/auth.py: Consolidates OCIAuthType enum and authentication logic that was duplicated across 4 files
common/utils.py: Extracts shared utility functions (OCIUtils) used by multiple modules

2. Extract provider implementations into providers/ subpackage

providers/base.py: Abstract Provider base class defining the interface
providers/cohere.py: Cohere-specific implementation
providers/generic.py: Generic provider for Meta Llama, xAI Grok, OpenAI, Mistral

3. Streamline main modules

chat_models/oci_generative_ai.py: Reduced from 1,738 to 692 lines (60% reduction)
llms/oci_generative_ai.py: Reduced from 402 to 352 lines
embeddings/oci_generative_ai.py: Reduced from 231 to 185 lines

Why This Matters

For a production-grade open source library, code quality directly impacts:

Maintainability: Smaller, focused files are easier to understand, review, and modify
Testability: Isolated components can be unit tested independently
Extensibility: Adding new providers (e.g., new model vendors) becomes straightforward
Onboarding: New contributors can understand and navigate the codebase more quickly
Bug Prevention: Single source of truth for shared logic prevents inconsistencies

Before/After

Metric	Before	After
`oci_generative_ai.py` lines	1,738	692
Duplicated auth code	~300 lines across 4 files	Single 97-line module
Provider classes per file	4 in one file	1-2 per focused file
Max file size	1,738 lines	501 lines

Test Plan

All 64 unit tests pass
Integration tests pass across all supported models:
- Meta Llama (llama-4-maverick, llama-4-scout, llama-3.3-70b)
- Cohere (command-a, command-r-plus, command-r)
- xAI Grok (grok-4-fast, grok-3-fast, grok-3-mini-fast)
- OpenAI (gpt-oss-20b, gpt-oss-120b)
Tool calling verified for both GenericProvider and CohereProvider
Streaming verified across all providers
Structured output verified
Backward compatibility maintained (no API changes)

Breaking Changes

None. This is a purely internal refactoring with no changes to the public API.

Create langchain_oci/common/ package to consolidate duplicated code: - common/auth.py: Single source of truth for OCIAuthType enum and create_oci_client_kwargs() function that was duplicated across llms/, embeddings/, and chat_models/ modules (~75 lines each) - common/utils.py: Shared OCIUtils class with helper functions for tool call conversion, schema resolution, and type checking This change eliminates approximately 300 lines of duplicated authentication logic, improving maintainability and reducing the risk of divergent implementations across modules.

Create langchain_oci/chat_models/providers/ to separate concerns and improve code organization: - providers/base.py: Abstract Provider base class defining the interface for all OCI GenAI providers (15 abstract methods) - providers/cohere.py: CohereProvider implementation (~400 lines) handling Cohere-specific message formatting, tool calls, and responses - providers/generic.py: GenericProvider and MetaProvider implementations (~500 lines) for Meta Llama, xAI Grok, OpenAI, and Mistral models Previously, all provider logic was embedded in oci_generative_ai.py (1,738 lines). This extraction: - Enables isolated testing of each provider - Makes it easier to add new providers - Reduces cognitive load when reading individual files - Follows the Single Responsibility Principle

Modify existing modules to leverage the new shared infrastructure: chat_models/oci_generative_ai.py: - Reduced from 1,738 lines to 692 lines (60% reduction) - Import providers from new providers/ subpackage - Import OCIUtils from common/utils llms/oci_generative_ai.py: - Replace duplicated OCIAuthType with import from common/auth - Replace 50+ lines of auth logic with create_oci_client_kwargs() - Reduced from 402 to 352 lines embeddings/oci_generative_ai.py: - Replace duplicated OCIAuthType with import from common/auth - Replace 50+ lines of auth logic with create_oci_client_kwargs() - Reduced from 231 to 185 lines All existing functionality preserved with improved maintainability.

Add OCIAuthType to langchain_oci/__init__.py exports, allowing users to import directly from the package root: from langchain_oci import OCIAuthType This provides a cleaner API for users who need to reference the authentication type enum without knowing the internal module structure.

- Remove unused OCIAuthType imports from llms and embeddings modules - Fix line length violations (max 88 characters) - Apply proper import formatting per ruff/isort standards - Expand multiline imports for better readability

Use dict key access instead of .get() for required config values to satisfy mypy type checking for open() function arguments.

fede-kamel · 2025-12-19T14:50:29Z

No Logic Changes - Pure Structural Refactoring

This PR contains zero logic changes. All modifications are organizational only.

What was done:

Extracted files - Code was moved from one large file to smaller focused files:
- CohereProvider class → providers/cohere.py
- GenericProvider class → providers/generic.py
- Provider ABC → providers/base.py
- OCIAuthType enum + auth logic → common/auth.py
- OCIUtils class → common/utils.py
Updated imports - Changed from X import Y to point to new file locations
Added re-exports - __init__.py files maintain the same public API

What was NOT done:

No algorithms changed
No conditionals added/removed
No function signatures modified
No return values altered
No error handling changed
No new features added
No behaviors modified

Bug fix included:

The only functional change was correcting convert_oci_tool_call_to_langchain to handle Cohere's parameters attribute - this was a bug fix for code that only worked for Generic providers, not Cohere.

Verification:

All 64 unit tests pass (Python 3.9, 3.12, 3.13)
All 11 integration models tested successfully (Meta Llama, Cohere, xAI Grok, OpenAI)
Ruff, isort, and mypy checks all pass

fede-kamel added 4 commits December 19, 2025 09:15

oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Dec 19, 2025

fede-kamel added 2 commits December 19, 2025 09:31

fix: resolve ruff and isort linting issues

2f52ffa

- Remove unused OCIAuthType imports from llms and embeddings modules - Fix line length violations (max 88 characters) - Apply proper import formatting per ruff/isort standards - Expand multiline imports for better readability

fix: resolve mypy type error in auth.py

189b3ce

Use dict key access instead of .get() for required config values to satisfy mypy type checking for open() function arguments.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: Improve code modularization for maintainability and scalability #84

refactor: Improve code modularization for maintainability and scalability #84

Uh oh!

fede-kamel commented Dec 19, 2025

Uh oh!

fede-kamel commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

refactor: Improve code modularization for maintainability and scalability #84

Are you sure you want to change the base?

refactor: Improve code modularization for maintainability and scalability #84

Uh oh!

Conversation

fede-kamel commented Dec 19, 2025

Summary

Changes

Why This Matters

Before/After

Test Plan

Breaking Changes

Uh oh!

fede-kamel commented Dec 19, 2025

No Logic Changes - Pure Structural Refactoring

What was done:

What was NOT done:

Bug fix included:

Verification:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant