Redshift Profiler PR3: CLI, configs and tests#2306

Open
ysmx-github wants to merge 4 commits into main from feature/redshift-pr3-profiler-cli

Conversation

@ysmx-github ysmx-github commented Feb 19, 2026

Changes

This PR adds Redshift as a supported profiler assessment platform. It wires Redshift into the CLI and the assessment config, and parameterizes the profiler and profiler-validator tests for both Synapse and Redshift using mocks (no live cluster).

What does this PR do?

  • Adds Redshift as a supported platform for the profiler assessment (alongside Synapse).
  • Wires Redshift into the CLI (source technology / transfer) and assessment constants (_constants.py: variants, config path template, PLATFORM_TO_SOURCE_TECHNOLOGY_CFG, PROFILER_SOURCE_SYSTEM).
  • Parameterizes existing profiler and profiler-validator tests for synapse and redshift; adds Redshift mock extract and test resources so all tests run for both platforms without a live Redshift cluster.
  • Full list of changed files:
    • Source
      • src/databricks/labs/lakebridge/assessments/__init__.py
      • src/databricks/labs/lakebridge/assessments/_constants.py
      • src/databricks/labs/lakebridge/assessments/configure_assessment.py
      • src/databricks/labs/lakebridge/assessments/profiler.py
      • src/databricks/labs/lakebridge/cli.py
      • src/databricks/labs/lakebridge/connections/credential_manager.py
    • Tests – integration
      • tests/integration/assessments/profiler_extract_utils.py
      • tests/integration/assessments/test_profiler.py
      • tests/integration/assessments/test_profiler_validator.py
    • Tests – resources
      • tests/resources/assessments/db_extract_redshift.py
      • tests/resources/assessments/pipeline_config_main_redshift.yml
      • tests/resources/assessments/redshift_schema_def.yml

Relevant implementation details

  • Tests: Session-scoped mock_redshift_profiler_extract and PLATFORM_VALIDATOR_CONFIG drive parameterized tests; db_extract_redshift.py and pipeline_config_main_redshift.yml provide a script-based Redshift-shaped DuckDB for execution tests with no real cluster.
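As a rough illustration of the pattern described above, the sketch below shows how a per-platform config dict can drive a single parameterized test. Only the name PLATFORM_VALIDATOR_CONFIG, the two platform ids, the three Redshift table names, and the "3 tables, 1 empty" shape come from this PR description. The dict keys, the Synapse table names, and the helper expected_non_empty_tables are assumptions made for the example and may not match the actual test module.

```python
import pytest

# Per-platform expectations. The dict name comes from the PR description; its
# keys and the Synapse table names are placeholders invented for this sketch.
PLATFORM_VALIDATOR_CONFIG = {
    "synapse": {
        "tables": ["synapse_table_a", "synapse_table_b", "synapse_table_c"],  # hypothetical names
        "empty_table_count": 1,
    },
    "redshift": {
        "tables": ["query_view", "rs_managed_storage_gb", "rs_nodes"],
        "empty_table_count": 1,
    },
}


def expected_non_empty_tables(platform: str) -> int:
    """Return how many tables in the mock extract should contain rows."""
    cfg = PLATFORM_VALIDATOR_CONFIG[platform]
    return len(cfg["tables"]) - cfg["empty_table_count"]


@pytest.mark.parametrize("platform", ["synapse", "redshift"])
def test_expected_table_counts(platform: str) -> None:
    # The test body is shared; only the looked-up config differs per platform.
    cfg = PLATFORM_VALIDATOR_CONFIG[platform]
    assert len(cfg["tables"]) == 3
    assert expected_non_empty_tables(platform) == 2
```

With this layout, supporting another platform would mostly mean adding one more entry to the dict and one more id to the parametrize list, which is presumably why the PR reports no new test cases.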

Caveats/things to watch out for when reviewing:

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  1. Redshift mock extract (tests/integration/assessments/profiler_extract_utils.py)
    Added Redshift table definitions: query_view, rs_managed_storage_gb, rs_nodes (same pattern as Synapse: 3 tables, 1 empty).
    Added RedshiftProfilerBuilder and build_mock_redshift_extract() so validator tests can use a Redshift-shaped DuckDB without a real cluster.
  2. Test schema for Redshift (tests/resources/assessments/redshift_schema_def.yml)
    Minimal schema for validator tests: query_view, rs_managed_storage_gb, rs_nodes.
    rs_managed_storage_gb.rs_managed_storage_gb is defined as VARCHAR (mock has DOUBLE) so test_validate_invalid_schema_check still fails as intended.
  3. Profiler validator tests (tests/integration/assessments/test_profiler_validator.py)
    Added session-scoped mock_redshift_profiler_extract and PLATFORM_VALIDATOR_CONFIG for platform-specific table names and counts.
    All 7 validator tests are parameterized with @pytest.mark.parametrize("platform", ["synapse", "redshift"]).
    Each test runs once per platform; logic is shared, no new test cases.
    test_get_profiler_extract_path is unchanged (no platform param).
  4. Profiler tests (tests/integration/assessments/test_profiler.py)
    All 5 tests parameterized with @pytest.mark.parametrize("platform", ["synapse", "redshift"]).
    Execution tests use script-based configs only (no live DB):
    Synapse: pipeline_config_main.yml + db_extract.py.
    Redshift: pipeline_config_main_redshift.yml + db_extract_redshift.py.
    test_profile_execution_config_override copies the platform-specific script (db_extract.py or db_extract_redshift.py) into a temp dir and runs it.
  5. Redshift script and config (for execution tests without live Redshift)
    tests/resources/assessments/db_extract_redshift.py: standalone script that creates a DuckDB database at --db-path with the same 3 Redshift tables (no imports from tests).
    tests/resources/assessments/pipeline_config_main_redshift.yml: pipeline config that runs that script.
  6. Manually tested all credential flows on all clusters in the AWS Sandbox account aws-sandbox-field-eng (332745928618).
  • manually tested
  • added unit tests
  • added integration tests
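
The deliberate type mismatch described in item 2 can be illustrated without DuckDB. The sketch below is not the validator from this PR: find_type_mismatches and the dict-based schema representation are hypothetical and only show why declaring VARCHAR in the schema definition while the mock extract holds DOUBLE is enough to make a schema check report a failure.

```python
from typing import Dict, List, Tuple

# table name -> column name -> declared type (hypothetical representation)
Schema = Dict[str, Dict[str, str]]


def find_type_mismatches(expected: Schema, actual: Schema) -> List[Tuple[str, str, str, str]]:
    """Return (table, column, expected_type, actual_type) for every differing column."""
    mismatches = []
    for table, columns in expected.items():
        actual_columns = actual.get(table, {})
        for column, expected_type in columns.items():
            # A column absent from the extract is reported with a marker type.
            actual_type = actual_columns.get(column, "<missing>")
            if actual_type.upper() != expected_type.upper():
                mismatches.append((table, column, expected_type, actual_type))
    return mismatches


# The schema definition says VARCHAR, the mock extract holds DOUBLE (item 2 above).
schema_def = {"rs_managed_storage_gb": {"rs_managed_storage_gb": "VARCHAR"}}
mock_extract = {"rs_managed_storage_gb": {"rs_managed_storage_gb": "DOUBLE"}}

print(find_type_mismatches(schema_def, mock_extract))
```

A non-empty result here corresponds to the case that test_validate_invalid_schema_check is meant to exercise; comparing a schema against itself yields an empty list.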

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 18.62745% with 83 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.94%. Comparing base (87c4e12) to head (c8f4034).

Files with missing lines                                   Patch %   Lines
...abs/lakebridge/assessments/configure_assessment.py      12.72%    48 Missing ⚠️
...databricks/labs/lakebridge/assessments/pipeline.py       5.55%    17 Missing ⚠️
...databricks/labs/lakebridge/assessments/profiler.py      25.00%     9 Missing ⚠️
src/databricks/labs/lakebridge/cli.py                      11.11%     8 Missing ⚠️
.../labs/lakebridge/connections/credential_manager.py      66.66%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2306      +/-   ##
==========================================
- Coverage   66.44%   65.94%   -0.51%     
==========================================
  Files          99       99              
  Lines        9090     9181      +91     
  Branches      974      988      +14     
==========================================
+ Hits         6040     6054      +14     
- Misses       2874     2952      +78     
+ Partials      176      175       -1     

github-actions bot commented Feb 19, 2026

✅ 155/155 passed, 7 flaky, 5 skipped, 29m49s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (19.767s)
  • 🤪 test_installs_and_runs_pypi_bladebridge (27.174s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (19.345s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (19.173s)
  • 🤪 test_transpiles_informatica_to_sparksql (20.531s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (21.816s)
  • 🤪 test_transpile_teradata_sql (8.84s)

Running from acceptance #3941

@ysmx-github ysmx-github changed the title from "Redshift Profiler PR3: CLI and configs" to "Redshift Profiler PR3: CLI, configs and tests" Feb 19, 2026
@sundarshankar89 sundarshankar89 added the feat/profiler and do-not-merge labels Feb 20, 2026
@ysmx-github ysmx-github force-pushed the feature/redshift-pr3-profiler-cli branch from 84c0bda to ceb5d75 Compare February 24, 2026 08:51
@ysmx-github ysmx-github force-pushed the feature/redshift-pr3-profiler-cli branch from ceb5d75 to 931b42d Compare February 24, 2026 08:59
@ysmx-github ysmx-github force-pushed the feature/redshift-pr3-profiler-cli branch from 7a91a64 to 84c0bda Compare February 24, 2026 09:31
@ysmx-github ysmx-github force-pushed the feature/redshift-pr3-profiler-cli branch from ebd6764 to eb316ac Compare February 24, 2026 21:28
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add step_type 'prepare' in profiler_config (create/drop objects on source)
- Implement _execute_prepare_step in pipeline: one DDL statement per file
- Require connector when pipeline has active prepare steps (profiler.py)
- Tests: add prepare to valid step types, prepare-before-ddl no warning

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
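
The connector requirement mentioned in the commit message above ("Require connector when pipeline has active prepare steps") could look roughly like the following. This is a sketch under assumptions, not the code in profiler.py: the Step dataclass, its active flag, and the functions requires_connector and check_connector are hypothetical names chosen for illustration. Only the step type values "prepare" and "ddl" appear in the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """Hypothetical stand-in for one pipeline step entry."""
    name: str
    step_type: str       # e.g. "prepare" or "ddl"
    active: bool = True  # assumption: steps can be switched off in the config


def requires_connector(steps: List[Step]) -> bool:
    """True if at least one active step creates or drops objects on the source."""
    return any(step.step_type == "prepare" and step.active for step in steps)


def check_connector(steps: List[Step], connector: Optional[object]) -> None:
    """Fail early when prepare steps are active but no source connector is given."""
    if requires_connector(steps) and connector is None:
        raise ValueError("Pipeline has active 'prepare' steps but no connector was provided")


steps = [
    Step(name="create_helper_view", step_type="prepare"),
    Step(name="load_extract", step_type="ddl"),
]
print(requires_connector(steps))
```

Checking this before the pipeline starts keeps script-only pipelines (such as the mock-based test configs) runnable without any source connection, while pipelines that touch the source fail with a clear message rather than partway through execution.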

Labels

do-not-merge, feat/profiler

3 participants