Skip to content

Conversation

@IceS2
Copy link
Contributor

@IceS2 IceS2 commented Jan 8, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Fixed AWS role assumption:
    • Corrected credential handling in parquet.py S3FileSystem initialization for role assumption scenarios
  • Credential strategy separation:
    • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN) now used when assumeRoleArn is present to enable PyArrow's automatic credential refresh
  • Direct credentials support:
    • Non-role scenarios now pass credentials directly via access_key/secret_key parameters

This will update automatically on new commits.


Comment on lines +217 to +227

if self.config_source.securityConfig.awsAccessKeyId:
os.environ[
"AWS_ACCESS_KEY_ID"
] = self.config_source.securityConfig.awsAccessKeyId
os.environ[
"AWS_SECRET_ACCESS_KEY"
] = (
self.config_source.securityConfig.awsSecretAccessKey.get_secret_value()
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Security: AWS credentials persist in environment after use

Details

When assumeRoleArn is configured with explicit credentials, the code sets AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and potentially AWS_SESSION_TOKEN as environment variables. These credentials remain in the process environment indefinitely after the S3FileSystem is created, which poses several risks:

  1. Credential leakage: Other code running in the same process can read these credentials
  2. Stale credentials: Previous credentials may persist and interfere with subsequent operations using different credential configurations
  3. Security hygiene: Sensitive credentials should have minimal lifetime in memory/environment

Suggested fix: Clear the environment variables after S3FileSystem initialization, or use a context manager pattern:

try:
    os.environ["AWS_ACCESS_KEY_ID"] = ...
    os.environ["AWS_SECRET_ACCESS_KEY"] = ...
    if self.config_source.securityConfig.awsSessionToken:
        os.environ["AWS_SESSION_TOKEN"] = ...
    s3_fs = S3FileSystem(**client_kwargs)
finally:
    os.environ.pop("AWS_ACCESS_KEY_ID", None)
    os.environ.pop("AWS_SECRET_ACCESS_KEY", None)
    os.environ.pop("AWS_SESSION_TOKEN", None)

] = self.config_source.securityConfig.awsAccessKeyId
os.environ[
"AWS_SECRET_ACCESS_KEY"
] = (
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Missing null check before accessing awsSecretAccessKey

Details

On line 244, when assumeRoleArn and awsAccessKeyId are present, the code calls self.config_source.securityConfig.awsSecretAccessKey.get_secret_value() without first checking if awsSecretAccessKey is not None. If awsAccessKeyId is provided but awsSecretAccessKey is missing or None, this will raise an AttributeError.

The same issue exists on line 256 in the elif branch.

Suggested fix: Add a null check before accessing:

if self.config_source.securityConfig.awsAccessKeyId and self.config_source.securityConfig.awsSecretAccessKey:
    os.environ["AWS_ACCESS_KEY_ID"] = self.config_source.securityConfig.awsAccessKeyId
    os.environ["AWS_SECRET_ACCESS_KEY"] = self.config_source.securityConfig.awsSecretAccessKey.get_secret_value()

@github-actions
Copy link
Contributor

github-actions bot commented Jan 8, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.12)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Copy link
Contributor

github-actions bot commented Jan 8, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
deepdiff CVE-2025-58367 🔥 CRITICAL 7.0.1 8.6.1
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@gitar-bot
Copy link

gitar-bot bot commented Jan 9, 2026

🔍 CI failure analysis for 4e4e818: Playwright E2E test failures with timeouts and element not found errors, all unrelated to this PR's backend AWS credential changes.

Issue

The playwright-ci-postgresql (3, 6) job failed with multiple frontend E2E test failures.

Root Cause

Multiple Playwright tests failed with various errors:

  1. Timeout errors: page.goto: Timeout 60000ms exceeded and Test timeout of 180000ms exceeded

    • Tests timing out while navigating to pages or waiting for elements
    • Examples: CustomizeWidgets.spec.ts, ExploreDiscovery.spec.ts
  2. Browser/page closed errors: Target page, context or browser has been closed

    • Tests encountering closed browser contexts mid-execution
    • Examples: SettingsNavigationPage.spec.ts
  3. Undefined property errors: Cannot read properties of undefined (reading 'name')

    • Tests accessing undefined objects, indicating data setup issues
    • Examples: ExploreDiscovery.spec.ts:216
  4. Element not found errors: Tests unable to find expected UI elements

    • Examples: TestSuiteMultiPipeline.spec.ts waiting for add-pipeline-button

Failing test files:

  • SettingsNavigationPage.spec.ts
  • TestSuiteMultiPipeline.spec.ts
  • CustomizeWidgets.spec.ts
  • ExploreDiscovery.spec.ts

Details

This PR only modifies:

  • ingestion/src/metadata/readers/dataframe/parquet.py (Python backend code for AWS S3 Parquet reading)

No changes to:

  • Frontend code, UI components, or test files
  • Any code that could affect the OpenMetadata web interface

These Playwright E2E test failures are unrelated to this PR's changes. The PR only touches backend Python code for AWS credential handling in Parquet file reading, while the failures are in frontend E2E tests that test UI functionality.

The test failures appear to be environmental issues or pre-existing frontend test stability problems (timeouts, race conditions, browser context issues).

Code Review ⚠️ Changes requested

The credential handling refactor for AWS role assumption separates the two scenarios well, but still has the previously identified issues: credentials persist in environment variables after use and there's a potential null dereference on awsSecretAccessKey.

⚠️ Security: AWS credentials persist in environment after use

📄 ingestion/src/metadata/readers/dataframe/parquet.py:228

When assumeRoleArn is configured with explicit credentials, the code sets AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and potentially AWS_SESSION_TOKEN as environment variables. These credentials remain in the process environment indefinitely after the S3FileSystem is created, which poses several risks:

  1. Credential leakage: Other code running in the same process can read these credentials
  2. Stale credentials: Previous credentials may persist and interfere with subsequent operations using different credential configurations
  3. Security hygiene: Sensitive credentials should have minimal lifetime in memory/environment

Suggested fix: Clear the environment variables after S3FileSystem initialization, or use a context manager pattern:

try:
    os.environ["AWS_ACCESS_KEY_ID"] = ...
    os.environ["AWS_SECRET_ACCESS_KEY"] = ...
    if self.config_source.securityConfig.awsSessionToken:
        os.environ["AWS_SESSION_TOKEN"] = ...
    s3_fs = S3FileSystem(**client_kwargs)
finally:
    os.environ.pop("AWS_ACCESS_KEY_ID", None)
    os.environ.pop("AWS_SECRET_ACCESS_KEY", None)
    os.environ.pop("AWS_SESSION_TOKEN", None)
⚠️ Bug: Missing null check before accessing awsSecretAccessKey

📄 ingestion/src/metadata/readers/dataframe/parquet.py:243

On line 244, when assumeRoleArn and awsAccessKeyId are present, the code calls self.config_source.securityConfig.awsSecretAccessKey.get_secret_value() without first checking if awsSecretAccessKey is not None. If awsAccessKeyId is provided but awsSecretAccessKey is missing or None, this will raise an AttributeError.

The same issue exists on line 256 in the elif branch.

Suggested fix: Add a null check before accessing:

if self.config_source.securityConfig.awsAccessKeyId and self.config_source.securityConfig.awsSecretAccessKey:
    os.environ["AWS_ACCESS_KEY_ID"] = self.config_source.securityConfig.awsAccessKeyId
    os.environ["AWS_SECRET_ACCESS_KEY"] = self.config_source.securityConfig.awsSecretAccessKey.get_secret_value()

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off Gitar will not commit updates to this branch.
Display: compact Hiding non-applicable rules.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | This comment will update automatically (Docs)

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 9, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants