Redshift Profiler PR3: CLI, configs and tests #2306
Open
ysmx-github wants to merge 4 commits into main
Codecov Report ❌
@@            Coverage Diff             @@
##             main    #2306      +/-   ##
==========================================
- Coverage   66.44%   65.94%   -0.51%
==========================================
  Files          99       99
  Lines        9090     9181      +91
  Branches      974      988      +14
==========================================
+ Hits         6040     6054      +14
- Misses       2874     2952      +78
+ Partials      176      175       -1
✅ 155/155 passed, 7 flaky, 5 skipped, 29m49s total
Running from acceptance #3941
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add step_type 'prepare' in profiler_config (create/drop objects on source)
- Implement _execute_prepare_step in pipeline: one DDL statement per file
- Require connector when pipeline has active prepare steps (profiler.py)
- Tests: add prepare to valid step types, prepare-before-ddl no warning

Co-authored-by: Cursor <cursoragent@cursor.com>
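For reviewers who want a concrete picture of the 'prepare' step described in this commit, here is a minimal sketch. It is not the code in this PR: the `Step` dataclass, `needs_connector`, `execute_prepare_step` and `run_pipeline` are hypothetical names, an in-memory `sqlite3` connection stands in for the real source connector, and the single-statement check is a deliberately naive assumption about how "one DDL statement per file" might be enforced.

```python
import sqlite3
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Step:
    """Hypothetical pipeline step; field names are illustrative only."""
    name: str
    step_type: str        # e.g. "prepare"
    extract_source: str   # path to the file holding one DDL statement
    active: bool = True


def needs_connector(steps):
    """True when at least one active step is a prepare step."""
    return any(s.active and s.step_type == "prepare" for s in steps)


def execute_prepare_step(step, connection):
    """Read the step's file and run its single DDL statement."""
    statement = Path(step.extract_source).read_text().strip().rstrip(";").strip()
    # Naive check: rejects any remaining semicolon, including one inside a string literal.
    if ";" in statement:
        raise ValueError(f"{step.name}: expected exactly one statement per file")
    connection.execute(statement)


def run_pipeline(steps, connection=None):
    """Run active prepare steps; fail early if they have no connection to use."""
    if needs_connector(steps) and connection is None:
        raise ValueError("active prepare steps require a source connection")
    for step in steps:
        if step.active and step.step_type == "prepare":
            execute_prepare_step(step, connection)
```

Checking for the connection before any step runs means a misconfigured pipeline fails without having created or dropped anything on the source.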
Changes
This PR adds Redshift as a supported profiler assessment platform. It wires Redshift into the CLI and assessment config, and parameterizes the profiler and validator tests for both Synapse and Redshift using mocks (no live cluster).
What does this PR do?
Relevant implementation details
Caveats/things to watch out for when reviewing:
Linked issues
Resolves #..
Functionality
databricks labs lakebridge ...
Tests
Added Redshift table definitions: query_view, rs_managed_storage_gb, rs_nodes (same pattern as Synapse: 3 tables, 1 empty).
Added RedshiftProfilerBuilder and build_mock_redshift_extract() so validator tests can use a Redshift-shaped DuckDB without a real cluster.
Minimal schema for validator tests: query_view, rs_managed_storage_gb, rs_nodes.
In that schema, rs_managed_storage_gb.rs_managed_storage_gb is declared as VARCHAR while the mock extract stores it as DOUBLE, so the schema check in test_validate_invalid_schema_check still fails as intended.
Added session-scoped mock_redshift_profiler_extract and PLATFORM_VALIDATOR_CONFIG for platform-specific table names and counts.
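The mock extract and the deliberate VARCHAR/DOUBLE mismatch described above can be sketched as follows. This is an illustration, not the PR's `build_mock_redshift_extract()`: `sqlite3` stands in for DuckDB so the snippet runs with the standard library alone, `build_mock_extract` and `find_type_mismatches` are hypothetical helpers, every column other than `rs_managed_storage_gb` is invented, and treating `rs_nodes` as the empty table is an assumption (the description only says one of the three tables is empty).

```python
import sqlite3

# Expected declared type per (table, column); mirrors the mismatch described above.
EXPECTED_COLUMN_TYPES = {
    ("rs_managed_storage_gb", "rs_managed_storage_gb"): "VARCHAR",
}


def build_mock_extract(conn):
    """Create three Redshift-shaped tables; one is left empty (assumed: rs_nodes)."""
    conn.execute("CREATE TABLE query_view (query_id INTEGER, query_text VARCHAR)")
    conn.execute("CREATE TABLE rs_managed_storage_gb (rs_managed_storage_gb DOUBLE)")
    conn.execute("CREATE TABLE rs_nodes (node_id INTEGER)")
    conn.execute("INSERT INTO query_view VALUES (1, 'select 1')")
    conn.execute("INSERT INTO rs_managed_storage_gb VALUES (12.5)")


def find_type_mismatches(conn, expected):
    """Return (table, column, expected_type, actual_type) for each mismatch."""
    mismatches = []
    for (table, column), expected_type in expected.items():
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        declared = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
        actual_type = declared.get(column)
        if actual_type != expected_type:
            mismatches.append((table, column, expected_type, actual_type))
    return mismatches
```

A validator test in this style would assert that the mismatch list is non-empty, which is the failure the invalid-schema test relies on.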
All 7 validator tests are parameterized with @pytest.mark.parametrize("platform", ["synapse", "redshift"]).
Each test runs once per platform; logic is shared, no new test cases.
test_get_profiler_extract_path is unchanged (no platform param).
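As a rough illustration of the parameterization pattern described above, the sketch below runs one shared test body per platform and looks up platform-specific expectations in a config mapping. The keys inside `PLATFORM_VALIDATOR_CONFIG`, the `MOCK_ROW_COUNTS` stand-in for the session-scoped mock extracts, the Synapse table names and the test name are all assumptions made for this example; only the decorator line and the "3 tables, 1 empty" pattern come from the description.

```python
import pytest

# Assumed shape: per-platform expectations (the PR's actual keys may differ).
PLATFORM_VALIDATOR_CONFIG = {
    "synapse": {"table_count": 3, "empty_table_count": 1},
    "redshift": {"table_count": 3, "empty_table_count": 1},
}

# Hypothetical row counts standing in for the mock extracts; Synapse names are placeholders.
MOCK_ROW_COUNTS = {
    "synapse": {"synapse_table_a": 2, "synapse_table_b": 1, "synapse_table_c": 0},
    "redshift": {"query_view": 2, "rs_managed_storage_gb": 1, "rs_nodes": 0},
}


@pytest.mark.parametrize("platform", ["synapse", "redshift"])
def test_mock_extract_table_counts(platform):
    """Shared logic: the same assertions run once for each platform."""
    expected = PLATFORM_VALIDATOR_CONFIG[platform]
    row_counts = MOCK_ROW_COUNTS[platform]
    assert len(row_counts) == expected["table_count"]
    empty_tables = [name for name, rows in row_counts.items() if rows == 0]
    assert len(empty_tables) == expected["empty_table_count"]
```

Keeping the platform differences in one mapping is what lets the test bodies stay identical, so adding a platform means adding a config entry rather than new test cases.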
All 5 profiler tests are parameterized with @pytest.mark.parametrize("platform", ["synapse", "redshift"]).
Execution tests use script-based configs only (no live DB):
Synapse: pipeline_config_main.yml + db_extract.py.
Redshift: pipeline_config_main_redshift.yml + db_extract_redshift.py.
test_profile_execution_config_override copies the platform-specific script (db_extract.py or db_extract_redshift.py) into a temp dir and runs it.
tests/resources/assessments/db_extract_redshift.py: standalone script that creates a DuckDB at --db-path with the same 3 Redshift tables (no imports from tests).
tests/resources/assessments/pipeline_config_main_redshift.yml: pipeline config that runs that script.
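The script-based execution path described above (write a standalone extract script into a temp dir, run it with --db-path, then inspect the database it produced) can be sketched like this. It is an approximation under stated assumptions: `sqlite3` replaces DuckDB so nothing beyond the standard library is needed, the script body and its single `value TEXT` column are invented, and `run_extract_script` and `list_tables` are hypothetical helpers rather than functions from this PR.

```python
import sqlite3
import subprocess
import sys
from pathlib import Path

# Stand-in for db_extract_redshift.py: self-contained, no imports from tests.
SCRIPT_SOURCE = '''
import argparse
import sqlite3

parser = argparse.ArgumentParser()
parser.add_argument("--db-path", required=True)
args = parser.parse_args()

conn = sqlite3.connect(args.db_path)
for table in ("query_view", "rs_managed_storage_gb", "rs_nodes"):
    conn.execute(f"CREATE TABLE {table} (value TEXT)")
conn.commit()
conn.close()
'''


def run_extract_script(work_dir: Path) -> Path:
    """Write the script into work_dir, run it, and return the database path."""
    script = work_dir / "db_extract_redshift.py"
    script.write_text(SCRIPT_SOURCE)
    db_path = work_dir / "profiler_extract.db"
    subprocess.run([sys.executable, str(script), "--db-path", str(db_path)], check=True)
    return db_path


def list_tables(db_path: Path):
    """Return the table names in the database, sorted alphabetically."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Running the script in a subprocess exercises the same entry point a pipeline config would invoke, which is why it takes the database location as an argument instead of importing test helpers.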