
Fix ozone getting started example#3574

Merged
MonkeyCanCode merged 2 commits into apache:main from MonkeyCanCode:fix_ozone_getting_started_example
Jan 27, 2026

Conversation

@MonkeyCanCode (Contributor) commented Jan 27, 2026

While validating other things, I noticed our ozone getting started example appears to be non-functional. The main breaking pieces are:

  1. Ozone cluster stuck in safe mode. In Ozone 2.1.0, hdds.scm.safemode.min.datanode is set to 3 (https://ozone.apache.org/docs/current/concept/storagecontainermanager.html), so SCM needs 3 datanodes before it leaves safe mode, while our docker-compose runs only one datanode. Sample error below:
ozone-s3g-1       | SCM is in safe mode. Will retry in 1000ms
ozone-s3g-1       | SCM is in safe mode. Will retry in 1000ms
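
A minimal sketch of the safe-mode fix, assuming the Ozone container images honor the OZONE-SITE.XML_ environment-variable convention for injecting ozone-site.xml keys, and assuming a service named ozone-scm (both are assumptions, not taken from this PR's diff):

```yaml
# Hypothetical docker-compose fragment: lower the safe-mode datanode
# threshold so SCM can leave safe mode with a single datanode.
# The OZONE-SITE.XML_ prefix is the convention the Apache Ozone
# container images use to map env vars onto ozone-site.xml entries.
services:
  ozone-scm:
    environment:
      OZONE-SITE.XML_hdds.scm.safemode.min.datanode: "1"
```

With the threshold at 1, the "SCM is in safe mode. Will retry" loop above should clear once the single datanode registers.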
  2. While performing end-to-end validation, I noticed AWS S3 credentials are required for Polaris and Spark even when S3 security is not enabled (when it is not enabled, Ozone accepts any value as long as one is provided...ref: https://ozone.apache.org/docs/edge/security/securings3.html. I checked Ozone 2.0.0 as well and the behavior is the same). Sample error below:
Caused by: java.io.UncheckedIOException: Failed to close current writer
	at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:124)
	at org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
	at org.apache.iceberg.io.RollingDataWriter.close(RollingDataWriter.java:32)
	at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.close(SparkWrite.java:747)
	at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.commit(SparkWrite.java:729)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$5(WriteToDataSourceV2Exec.scala:475)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:491)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:430)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:496)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:393)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(sections=[])), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): Either the environment variable AWS_WEB_IDENTITY_TOKEN_FILE or the javaproperty aws.webIdentityTokenFile must be set., ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(sections=[])): Profile file contained no credentials for profile 'default': ProfileFile(sections=[]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.]
	at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:41)
	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.close(ParquetFileWriter.java:1673)
	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1659)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:261)
	at org.apache.iceberg.io.DataWriter.close(DataWriter.java:82)
	at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
	... 21 more
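
The credential error above comes from the AWS SDK's provider chain, which needs *some* value even though Ozone's S3 Gateway ignores it when security is disabled. A sketch of the workaround, with hypothetical service names (polaris, spark-sql) standing in for whatever the compose file actually calls them:

```yaml
# Hypothetical docker-compose fragment: supply dummy credentials so the
# AWS SDK credential chain (env-var provider) resolves. Ozone's S3
# Gateway accepts any value when S3 security is disabled.
services:
  polaris:
    environment:
      AWS_ACCESS_KEY_ID: polaris_root
      AWS_SECRET_ACCESS_KEY: polaris_pass
  spark-sql:
    environment:
      AWS_ACCESS_KEY_ID: polaris_root
      AWS_SECRET_ACCESS_KEY: polaris_pass
```

Any non-empty pair works here; using the same pair everywhere just keeps the example consistent.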

Besides these two, this PR also addresses the following nits:

  1. Use a valid and consistent S3 region (changed to us-west-2)
  2. Use credentials that do not mention MinIO (changed to polaris_root:polaris_pass)
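
For the Spark side, the two nits amount to pinning one region and one credential pair on the S3A client. A sketch as a spark-defaults.conf fragment, using standard Hadoop S3A property names; the endpoint host and port are assumptions based on the Ozone S3 Gateway's default port (9878), not values from this PR:

```properties
# Hypothetical spark-defaults.conf fragment (fs.s3a.* are standard
# Hadoop S3A keys; endpoint assumes Ozone S3 Gateway's default port).
spark.hadoop.fs.s3a.endpoint            http://ozone-s3g:9878
spark.hadoop.fs.s3a.path.style.access   true
spark.hadoop.fs.s3a.access.key          polaris_root
spark.hadoop.fs.s3a.secret.key          polaris_pass
spark.hadoop.fs.s3a.endpoint.region     us-west-2
```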

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jan 27, 2026
@jbonofre (Member) commented:
It looks good to me. I didn't test locally, but the changes make sense.

@MonkeyCanCode (Contributor, Author) commented:
> It looks good to me. I didn't test locally, but the changes make sense.

Thanks for the quick review.

@snazy (Member) commented Jan 27, 2026

Changes LGTM.
I found similar issues while working on #3553

@MonkeyCanCode (Contributor, Author) commented:
> Changes LGTM. I found similar issues while working on #3553

Thanks for the review @snazy. Yeah, I was doing similar prototyping yesterday when I found this issue. I think your PR is more comprehensive than the prototype I had locally, though. Merging this one now.

@MonkeyCanCode MonkeyCanCode merged commit cac2650 into apache:main Jan 27, 2026
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jan 27, 2026