Skip to content

feat: Add column name mappings option#1649

Open
FelipeKalinoski wants to merge 3 commits intoGoogleCloudPlatform:developfrom
FelipeKalinoski:1617-column-name-mappings-row-partition
Open

feat: Add column name mappings option#1649
FelipeKalinoski wants to merge 3 commits intoGoogleCloudPlatform:developfrom
FelipeKalinoski:1617-column-name-mappings-row-partition

Conversation

@FelipeKalinoski
Copy link
Contributor

##PLEASE CAREFULLY REVIEW THE CHANGES BEFORE ACCEPTING IT##

Description of changes

This PR implements support for column name mapping (--column-name-map) across row validation and partition generation. Key changes include:

  • ConfigManager Updates: Modified build_config_comparison_fields, build_dependent_aliases, and build_column_configs to correctly resolve source column names to their target equivalents using the mapping provided.
  • Calculated Field Refactoring: Updated _get_calculated_config and _get_source/target_ibis_calculated_table to iteratively apply calculated fields, resolving issues where dependencies (like hash__all) were not found when mappings were active.
  • System Testing: Added system tests in tests/system/data_sources/test_oracle.py covering row validation with --concat=* and partition generation with mapped primary keys.
  • Unit Test Stability: Patched tests/unit/test__main.py to correctly mock GCS directory listing, ensuring local test runs pass without requiring live Google Cloud credentials.

Issues to be closed

Closes #1617

Checklist

  • I have followed the CONTRIBUTING Guide.
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated any relevant documentation to reflect my changes, if applicable
  • I have added unit and/or integration tests relevant to my change as needed
  • I have already checked locally that all unit tests and linting are passing (use the tests/local_check.sh script)
  • I have manually executed end-to-end testing (E2E) with the affected databases/engines (verified via mocked system tests and standalone demo script)

nj1973 and others added 2 commits January 14, 2026 14:24
…generation

- Added support for --column-name-map in ConfigManager for primary keys, comparison fields, and calculated fields.
- Fixed circular dependency and iterative calculated field resolution in __main__.py.
- Added comprehensive system tests in test_oracle.py for row validation and partition generation with mapping.
- Patched unit tests in test__main.py to correctly mock GCS calls in local environments.
- Verified all 365 unit tests and new system tests pass.
@google-cla
Copy link

google-cla bot commented Jan 20, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@nj1973 nj1973 changed the title 1617 column name mappings row partition feat: Add column name mappings option Jan 23, 2026
@nj1973 nj1973 self-requested a review January 23, 2026 14:14
@nj1973
Copy link
Collaborator

nj1973 commented Jan 27, 2026

I tested the changes and my previously passing test test_column_validation_column_name_map_to_bigquery now fails with a duplicate column error. I need to try and understand the changes you've made to figure out why my test now fails.

Copy link
Collaborator

@nj1973 nj1973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested a change to oracle integration tests.

I also need to try and debug the changes for column validation which no longer works.

}


class MockIbisClient:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you add this so you could test without BigQuery?

I tested test_row_validation_column_name_map_to_bigquery without this and it passed.

e.g.:

-@mock.patch("data_validation.clients.get_data_client", side_effect=mock_get_data_client)
+# @mock.patch("data_validation.clients.get_data_client", side_effect=mock_get_data_client)
 @mock.patch(
     "data_validation.state_manager.StateManager.get_connection_config",
     new=mock_get_connection_config,
 )
-def test_row_validation_column_name_map_to_bigquery(mock_get_client):
+# def test_row_validation_column_name_map_to_bigquery(mock_get_client):
+def test_row_validation_column_name_map_to_bigquery():

Ideally I'd rather these integration tests run with real tables and not mocked ones.

I appreciate this makes it harder for you to test but it matches other tests in this file.

Can we remove the mocked client please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did use mock tables for testing without BigQuery. We can delete it.

@FelipeKalinoski
Copy link
Contributor Author

Requested a change to oracle integration tests.

I also need to try and debug the changes for column validation which no longer works.

@nj1973 did you create a unit test for testing column validation or were you executing a "manual" test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cater for column name mismatches across systems

2 participants