Bug resolve synpase profiler errors #2289

Open
eri-adepoju wants to merge 30 commits into main from bug_resolve_synpase_profiler_errors

Contributor

@eri-adepoju eri-adepoju commented Feb 12, 2026

Changes

What does this PR do?

  1. Pipeline and trigger run extraction – Updates handling of list_pipeline_runs and list_trigger_runs so they correctly process batched yields (lists of dicts) instead of treating each yield as a single run.
  2. Serverless SQL pool routines – Adds list_serverless_routines() using sys.objects because information_schema.routines is not available in serverless pools.
  3. Server-level DMVs – Reconnects to the master database before querying server-level DMVs (e.g., data_processed), since these views must be queried from master.
  4. Whitespace in credentials – Strips leading/trailing whitespace from credentials and config values (user, password, server, database, driver, auth_type, tz_info, development_endpoint) to avoid connection failures from copy-paste or config issues.
  5. DataFrame concatenation – Replaces deprecated DataFrame.union() with pd.concat() in monitoring_metrics_extract.py.
  6. Documentation – Clarifies Azure auth (DefaultAzureCredential order), local vs CI setup, DMV permissions (VIEW DATABASE STATE, VIEW SERVER STATE, VIEW DEFINITION), and serverless pool catalog views.
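Item 1 above can be illustrated with a minimal sketch. It assumes the run-listing calls behave like generators that yield lists of run dicts; `fake_list_pipeline_runs` and `iter_runs` are hypothetical names used only for illustration and are not the functions changed in this PR.

```python
from typing import Any, Iterable, Iterator


def iter_runs(batches: Iterable[list[dict[str, Any]]]) -> Iterator[dict[str, Any]]:
    """Flatten batched yields (lists of dicts) into individual run records."""
    for batch in batches:
        # Each yield is a whole batch, so iterate it instead of treating it as one run.
        yield from batch


def fake_list_pipeline_runs() -> Iterator[list[dict[str, Any]]]:
    """Hypothetical stand-in for the real client call; yields two batches."""
    yield [{"run_id": "a"}, {"run_id": "b"}]
    yield [{"run_id": "c"}]


runs = list(iter_runs(fake_list_pipeline_runs()))
run_ids = [run["run_id"] for run in runs]
```

With this shape, downstream code always sees one dict per run, regardless of how the source batches its results.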

Relevant implementation details

  • serverless_sqlpool_extract.py: Uses get_sqlpool_reader(config, 'master', ...) before querying server-level DMVs; routines query switched from list_routines to list_serverless_routines.
  • database_manager.py: All credential/config string fields use .strip() before connection.
  • monitoring_metrics_extract.py: step_name for spark pool metrics moved outside the loop, so it is defined even when the loop is empty.
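The step_name change in the last bullet addresses a general Python pitfall, sketched below under assumptions: `collect_metrics_buggy`, `collect_metrics_fixed`, and the `"spark_pool_metrics"` value are hypothetical and only mirror the shape of the fix, not the actual code in monitoring_metrics_extract.py.

```python
def collect_metrics_buggy(pool_names: list[str]) -> str:
    for pool in pool_names:
        # step_name is only bound if the loop body runs at least once.
        step_name = "spark_pool_metrics"
    # Raises UnboundLocalError when pool_names is empty.
    return step_name


def collect_metrics_fixed(pool_names: list[str]) -> tuple[str, list[str]]:
    # Bind step_name before the loop so it exists even for an empty input.
    step_name = "spark_pool_metrics"
    processed = []
    for pool in pool_names:
        processed.append(pool)
    return step_name, processed
```

The fixed variant returns a usable step name for a workspace with no Spark pools, while the buggy variant fails before it can report anything.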

Linked issues

Resolves #2287

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • fixed existing functionality

Tests

  • manually tested
  • added unit tests
  • added integration tests

eri-adepoju and others added 18 commits January 13, 2026 14:30
…`list_trigger_runs` rather than expecting one dict per run.
…ames.

Add unit tests to verify that whitespace in credential fields and batch processing of pipeline and trigger runs are correctly handled.
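A minimal sketch of the whitespace handling such a test would exercise, assuming a simple settings object: `ConnectionSettings` and `strip_settings` are hypothetical names and do not reflect the real structure of database_manager.py.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical container for a few of the string fields listed in the PR."""
    user: str
    password: str
    server: str
    database: str


def strip_settings(settings: ConnectionSettings) -> ConnectionSettings:
    """Return a copy with leading/trailing whitespace removed from every field."""
    cleaned = {f.name: getattr(settings, f.name).strip() for f in fields(settings)}
    return replace(settings, **cleaned)


raw = ConnectionSettings(" admin ", "secret\n", "\tmyserver.example ", " master")
clean = strip_settings(raw)
```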

Replace union with concat for pandas DataFrames.
@eri-adepoju eri-adepoju requested a review from a team as a code owner February 12, 2026 21:44

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.41%. Comparing base (6bd912f) to head (ee74132).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ge/resources/assessments/synapse/common/queries.py | 0.00% | 3 Missing ⚠️ |
| .../assessments/synapse/serverless_sqlpool_extract.py | 0.00% | 2 Missing ⚠️ |
| ...resources/assessments/synapse/workspace_extract.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2289      +/-   ##
==========================================
- Coverage   66.44%   66.41%   -0.03%     
==========================================
  Files          99       99              
  Lines        9089     9093       +4     
  Branches      974      974              
==========================================
  Hits         6039     6039              
- Misses       2874     2878       +4     
  Partials      176      176              


github-actions bot commented Feb 12, 2026

✅ 143/143 passed, 5 flaky, 5 skipped, 34m16s total

Flaky tests:

  • 🤪 test_installs_and_runs_pypi_bladebridge (26.645s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (16.774s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (16.943s)
  • 🤪 test_transpile_teradata_sql (6.998s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (5.811s)

Running from acceptance #3944

Collaborator

@sundarshankar89 sundarshankar89 left a comment

Thank you @eri-adepoju for identifying the gaps in the extraction. For the authentication-related changes I will use a different approach; I can open a separate PR for that.

The profiler uses Azure SDK's `DefaultAzureCredential` which attempts authentication in this order:
1. **Environment Variables** (Service Principal):
Collaborator

Good catch, thanks for this documentation update.

Contributor

+1

def test_zoneinfo_creation_with_stripped_whitespace() -> None:
    """Test that zoneinfo.ZoneInfo works correctly with stripped timezone strings."""
    # This tests the core behavior that our code relies on
    tz_with_whitespace = ' America/New_York '
Collaborator

Now I see what has happened. I will tackle this differently; for now you can remove .strip(). I will ensure that .credentials.yml doesn't contain any spaces like these when stored.

Collaborator

I don't think these tests are valid since we removed strip.

from datetime import date


def test_pipeline_runs_handles_batches_correctly():
Collaborator

Thanks for adding tests. I'm adding type hints in this PR; having type hints enabled alongside tests will help our case.

#2264


- Profiler uses the Python version of Azure SDK libraries to extract information about target Synapse Workspace.
- For making the Azure API calls using Azure SDK you need an Azure Service Principal with the following role assignments.
- For making the Azure API calls using Azure SDK, the authenticated identity (user or service principal) needs the following role assignments.
Contributor

Same as above - let's remove mention of service principal until support is added in a future PR.

Contributor Author

This is separate from a service principal accessing information schema tables. Service principals can access Synapse workspaces and Azure monitor metrics with the profiler today.

"""

@staticmethod
def list_serverless_routines(pool_name, redact_sql_text: bool = False) -> str:
Contributor

Nice!

@sundarshankar89
Collaborator

@eri-adepoju can you fix the fmt errors?

@sundarshankar89
Collaborator

@eri-adepoju there is a small conflict; can you resolve it and mark this ready for review? The changes look good to me.

@eri-adepoju
Contributor Author

@eri-adepoju there is a small conflict; can you resolve it and mark this ready for review? The changes look good to me.

All done!

Collaborator

@gueniai gueniai left a comment

LGTM

Contributor

@goodwillpunning goodwillpunning left a comment

LGTM!

@gueniai gueniai enabled auto-merge February 26, 2026 00:53
@gueniai gueniai dismissed sundarshankar89’s stale review February 26, 2026 00:53

Sundar approved changes in a previous comment.

@gueniai gueniai added this pull request to the merge queue Feb 26, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2026

Development

Successfully merging this pull request may close these issues.

[BUG]: Synpase Profiler errors across multiple steps

4 participants