Skip to content

Conversation

@keshavmohta09
Copy link
Member

Summary

This PR addresses two issues with the Oracle connector regarding view definitions:

  1. Empty Schema Definitions (Oracle Connector: schemaDefinition empty for some views (LONG column issue in ALL.VIEWS.text) #23387): Replaced usage of the TEXT column (LONG type) with DBMS_METADATA.GET_DDL to correctly retrieve large or complex view definitions.
  2. Schema Filtering (Support for getting Oracle view definitions filtered by schema #14238): Added support for filtering view definitions by schema, preventing the connector from querying DBA_VIEWS for the entire database when only a specific schema is needed.

Changes

  • queries.py:
    • Updated ORACLE_ALL_VIEW_DEFINITIONS to use DBMS_METADATA.GET_DDL.
    • Added ORACLE_VIEW_DEFINITIONS_BY_SCHEMA.
  • utils.py:
    • Updated get_all_view_definitions to accept a query and schema parameter.
    • Updated get_view_definition to use the schema-filtered query when a schema is provided.
  • test_oracle_views.py: Added unit tests to verify the fix.

@gitar-bot
Copy link

gitar-bot bot commented Jan 9, 2026

Code Review ⚠️ Changes requested

Good fix for Oracle view definitions using DBMS_METADATA, but there's a critical case sensitivity bug that will cause schema-filtered queries to return no results.

⚠️ Bug: Schema-filtered query may overwrite cached data from other schemas

📄 ingestion/src/metadata/ingestion/source/database/oracle/utils.py:82-88

The get_view_definition function calls get_all_view_definitions when a key is not found in the cache. However, if views from different schemas are queried in sequence, subsequent schema-filtered queries will only add views from that specific schema to the cache.

The problem is that when a view is not found in the cache with a schema filter, the code fetches views only for that schema. But the cache reset logic only resets when current_db changes, not when switching between schemas. This means:

  1. First call with schema="schema1" populates cache with schema1's views
  2. Second call with schema="schema2" adds schema2's views to existing cache (correct behavior)
  3. Third call with schema=None will call ORACLE_ALL_VIEW_DEFINITIONS - which is fine

However, the real issue is that if get_view_definition is called with a schema that doesn't have the requested view, it will return an empty string without ever checking if the view exists in another schema. This may be intentional behavior but could lead to confusion.

Consider adding a comment to clarify this is expected behavior, or consider whether the function should fall back to checking other schemas if the view isn't found in the specified schema's cache.

⚠️ Bug: Inconsistent text() wrapping for SQL queries

📄 ingestion/src/metadata/ingestion/source/database/oracle/utils.py:91-111

In get_all_view_definitions, when schema is provided, the query is wrapped with sql.text() before execution:

result = connection.execute(sql.text(query), {"owner": schema})

But when no schema is provided, the query is executed directly without wrapping:

result = connection.execute(query)

This inconsistency could cause issues depending on SQLAlchemy version. Modern SQLAlchemy 2.0+ requires text() wrapping for raw SQL strings. If the code runs with SQLAlchemy 2.0+, the non-schema branch may fail or emit deprecation warnings.

Suggested fix: Consistently wrap both queries with sql.text():

if schema:
    result = connection.execute(sql.text(query), {"owner": schema.upper()})
else:
    result = connection.execute(sql.text(query))
⚠️ Bug: Schema case mismatch: query expects uppercase but code passes lowercase

📄 ingestion/src/metadata/ingestion/source/database/oracle/queries.py:46-61 📄 ingestion/src/metadata/ingestion/source/database/oracle/utils.py:68-69

There's a case sensitivity mismatch between the query and the code:

  1. In utils.py, the schema is converted to lowercase:

    schema = schema.lower() if schema else None
  2. In queries.py, the ORACLE_VIEW_DEFINITIONS_BY_SCHEMA query compares against owner directly:

    WHERE owner = :owner
  3. In Oracle's DBA_VIEWS and DBA_MVIEWS, the OWNER column typically stores values in UPPERCASE.

This means the query WHERE owner = :owner with a lowercase schema value will likely return no results, since Oracle stores owner names in uppercase by default.

Suggested fix: Either:

  1. Pass the schema in uppercase to the query:
    result = connection.execute(sql.text(query), {"owner": schema.upper()})
  2. Or modify the query to use UPPER():
    WHERE UPPER(owner) = UPPER(:owner)

The first option is more efficient as it allows Oracle to use indexes on the owner column.

What Works Well

The approach of using DBMS_METADATA.GET_DDL instead of the LONG column is the correct solution for Oracle view definitions. Adding schema filtering is a valuable optimization. Unit tests cover the key scenarios.

Recommendations

The most critical fix needed is the case sensitivity issue - Oracle stores owner names in uppercase, but the code passes lowercase values. Test this against a real Oracle database with schema filtering before merging. Also consider adding an integration test that verifies the actual DDL is retrieved correctly.

Options

Auto-apply is off Gitar will not commit updates to this branch.
Display: compact Hiding non-applicable rules.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | This comment will update automatically (Docs)

Comment on lines +82 to +89
if schema:
self.get_all_view_definitions(
connection, ORACLE_VIEW_DEFINITIONS_BY_SCHEMA, schema
)
else:
self.get_all_view_definitions(connection, ORACLE_ALL_VIEW_DEFINITIONS)

return self.all_view_definitions.get(key, "")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Schema-filtered query may overwrite cached data from other schemas

Details

The get_view_definition function calls get_all_view_definitions when a key is not found in the cache. However, if views from different schemas are queried in sequence, subsequent schema-filtered queries will only add views from that specific schema to the cache.

The problem is that when a view is not found in the cache with a schema filter, the code fetches views only for that schema. But the cache reset logic only resets when current_db changes, not when switching between schemas. This means:

  1. First call with schema="schema1" populates cache with schema1's views
  2. Second call with schema="schema2" adds schema2's views to existing cache (correct behavior)
  3. Third call with schema=None will call ORACLE_ALL_VIEW_DEFINITIONS - which is fine

However, the real issue is that if get_view_definition is called with a schema that doesn't have the requested view, it will return an empty string without ever checking if the view exists in another schema. This may be intentional behavior but could lead to confusion.

Consider adding a comment to clarify this is expected behavior, or consider whether the function should fall back to checking other schemas if the view isn't found in the specified schema's cache.

Comment on lines +104 to +107
if schema:
result = connection.execute(sql.text(query), {"owner": schema})
else:
result = connection.execute(query)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Inconsistent text() wrapping for SQL queries

Details

In get_all_view_definitions, when schema is provided, the query is wrapped with sql.text() before execution:

result = connection.execute(sql.text(query), {"owner": schema})

But when no schema is provided, the query is executed directly without wrapping:

result = connection.execute(query)

This inconsistency could cause issues depending on SQLAlchemy version. Modern SQLAlchemy 2.0+ requires text() wrapping for raw SQL strings. If the code runs with SQLAlchemy 2.0+, the non-schema branch may fail or emit deprecation warnings.

Suggested fix: Consistently wrap both queries with sql.text():

if schema:
    result = connection.execute(sql.text(query), {"owner": schema.upper()})
else:
    result = connection.execute(sql.text(query))

Comment on lines 46 to +61

ORACLE_VIEW_DEFINITIONS_BY_SCHEMA = textwrap.dedent(
"""
SELECT
LOWER(view_name) AS "view_name",
LOWER(owner) AS "schema",
DBMS_METADATA.GET_DDL('VIEW', view_name, owner) AS view_def
FROM DBA_VIEWS
WHERE owner = :owner
UNION ALL
SELECT
LOWER(mview_name) AS "view_name",
LOWER(owner) AS "schema",
DBMS_METADATA.GET_DDL('MATERIALIZED_VIEW', mview_name, owner) AS view_def
FROM DBA_MVIEWS
WHERE owner = :owner
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Schema case mismatch: query expects uppercase but code passes lowercase

Details

There's a case sensitivity mismatch between the query and the code:

  1. In utils.py, the schema is converted to lowercase:

    schema = schema.lower() if schema else None
  2. In queries.py, the ORACLE_VIEW_DEFINITIONS_BY_SCHEMA query compares against owner directly:

    WHERE owner = :owner
  3. In Oracle's DBA_VIEWS and DBA_MVIEWS, the OWNER column typically stores values in UPPERCASE.

This means the query WHERE owner = :owner with a lowercase schema value will likely return no results, since Oracle stores owner names in uppercase by default.

Suggested fix: Either:

  1. Pass the schema in uppercase to the query:
    result = connection.execute(sql.text(query), {"owner": schema.upper()})
  2. Or modify the query to use UPPER():
    WHERE UPPER(owner) = UPPER(:owner)

The first option is more efficient as it allows Oracle to use indexes on the owner column.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

2 participants