
@Baunsgaard

What

Fix RewriteTablePathUtil.relativize() to handle the edge case where path equals prefix exactly.

Why

Currently, relativize() fails when a path equals the prefix (e.g., write.data.path set to the table root). This breaks rewrite_table_path for tables with properties pointing to the table location itself.

Example failure:
// Throws IllegalArgumentException instead of returning ""
RewriteTablePathUtil.relativize("/path/to/table", "/path/to/table");

Use case:
Storage migration or replication where write.data.path = table location.

How

Added a check for exact match after normalizing trailing separators:

// Handle exact match where path equals prefix (without trailing separator)
if (maybeAppendFileSeparator(path).equals(toRemove)) {
  return "";
}
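
To show where this check sits, here is a minimal, self-contained sketch of how relativize() might look with it in place. The class name RelativizeSketch, the "/" separator, and the bodies of relativize() and maybeAppendFileSeparator() are assumptions for illustration; the actual RewriteTablePathUtil source may differ.

```java
// Hypothetical stand-in for RewriteTablePathUtil; not the actual Iceberg source.
final class RelativizeSketch {
  private static final String FILE_SEPARATOR = "/"; // assumed separator

  private RelativizeSketch() {}

  // Appends a trailing separator unless the path already ends with one.
  static String maybeAppendFileSeparator(String path) {
    return path.endsWith(FILE_SEPARATOR) ? path : path + FILE_SEPARATOR;
  }

  // Returns the part of path that follows prefix, or "" when both name the same location.
  static String relativize(String path, String prefix) {
    String toRemove = maybeAppendFileSeparator(prefix);
    // Handle exact match where path equals prefix (without trailing separator)
    if (maybeAppendFileSeparator(path).equals(toRemove)) {
      return "";
    }
    if (!path.startsWith(toRemove)) {
      throw new IllegalArgumentException(
          String.format("Path %s does not start with %s", path, toRemove));
    }
    return path.substring(toRemove.length());
  }
}
```

With this shape, a path equal to the prefix yields an empty relative path, while a path below the prefix is relativized as before.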

Changes

  • RewriteTablePathUtil.java: Fix relativize() + updated Javadoc for relativize() and newPath()
  • TestRewriteTablePathUtil.java: Added 10 test methods covering:
    • Normal relativization
    • Path equals prefix (the fix)
    • Trailing separator variations
    • Invalid path rejection
    • Subdirectory scenarios (backup/restore)
    • Overlapping name rejection (/table vs /table-old)
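
The overlapping-name case in the last bullet is easy to get wrong: comparing against the prefix with a trailing separator is what keeps /table-old from being treated as a child of /table. A small sketch of the difference, assuming "/" as the separator; the class and method names are hypothetical and not part of the Iceberg API:

```java
// Hypothetical helper illustrating why the prefix is compared with a trailing separator.
final class OverlapSketch {
  private OverlapSketch() {}

  // Naive check: a plain string prefix, which also matches sibling directories.
  static boolean naiveIsUnder(String path, String prefix) {
    return path.startsWith(prefix);
  }

  // Separator-aware check: the prefix must end at a directory boundary.
  static boolean isUnder(String path, String prefix) {
    String withSeparator = prefix.endsWith("/") ? prefix : prefix + "/";
    return path.startsWith(withSeparator);
  }
}
```

Here naiveIsUnder("/table-old/data", "/table") accepts the sibling directory, whereas isUnder("/table-old/data", "/table") rejects it.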

Testing

./gradlew :iceberg-core:test --tests "org.apache.iceberg.TestRewriteTablePathUtil"

@github-actions github-actions bot added the core label Jan 28, 2026
Member

@szehon-ho szehon-ho left a comment


Thanks. I discussed this offline with @Baunsgaard, and I believe this was an oversight in the initial code.

assertThat(RewriteTablePathUtil.relativize("/source/table/", "/source/table")).isEqualTo("");

// Edge case: prefix has trailing separator
assertThat(RewriteTablePathUtil.relativize("/a", "/a/")).isEqualTo("");
Contributor


Should this pass? I guess /a cannot be relative to the given prefix /a/, right? Unless we want to normalize the file separator on both sides.

I think these two are OK as of now:

    assertThat(RewriteTablePathUtil.relativize("/a/", "/a")).isEqualTo("");
    assertThat(RewriteTablePathUtil.relativize("/a/", "/a/")).isEqualTo("");

Author

@Baunsgaard Baunsgaard Jan 30, 2026


I would have expected that case to work, so I added explicit handling for it by modifying your suggested condition to apply maybeAppendFileSeparator() to the path:
if (!path.startsWith(toRemove) && !maybeAppendFileSeparator(path).equals(toRemove)) {

Fix RewriteTablePathUtil.relativize() throwing IllegalArgumentException
when path equals prefix exactly, breaking rewrite_table_path when table
properties like write.data.path point to the table root.

The fix normalizes trailing separators for both path and prefix, treating
'/a' and '/a/' as equivalent. This handles real-world cases where paths
from different sources may have inconsistent trailing separators.

Changes:
- Fix relativize() to accept path equal to prefix
- Normalize trailing separators on both path and prefix
- Optimize combinePaths() for empty relative paths
- Add comprehensive tests for all trailing separator combinations
@Baunsgaard
Author

Thanks for the review feedback @szehon-ho and @dramaticlly!

I have updated the implementation, though with a slight variation from the suggested approach.

What was suggested

if (!path.startsWith(toRemove) && !path.equals(prefix)) {
  throw ...
}

What I implemented

if (!path.startsWith(toRemove) && !maybeAppendFileSeparator(path).equals(toRemove)) {
  throw ...
}

Why the difference

The suggested approach using !path.equals(prefix) was stricter: it would reject relativize("/a", "/a/") because "/a" is not equal to "/a/".

I chose to normalize both sides because:

  • /a and /a/ represent the same directory
  • Table properties may come from different sources with inconsistent trailing separators
  • Being strict about this mismatch seemed overly pedantic for a utility method

This means all trailing separator combinations now work:
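
The four combinations can be exercised with a small, self-contained sketch. It uses the guard condition quoted above; the class name TrailingSeparatorSketch, the "/" separator, and the way the empty result is produced are assumptions for illustration rather than the merged implementation.

```java
// Hypothetical stand-in for RewriteTablePathUtil; not the actual Iceberg source.
final class TrailingSeparatorSketch {
  private TrailingSeparatorSketch() {}

  // Appends "/" unless the path already ends with it (separator is an assumption).
  static String maybeAppendFileSeparator(String path) {
    return path.endsWith("/") ? path : path + "/";
  }

  static String relativize(String path, String prefix) {
    String toRemove = maybeAppendFileSeparator(prefix);
    if (!path.startsWith(toRemove) && !maybeAppendFileSeparator(path).equals(toRemove)) {
      throw new IllegalArgumentException(
          String.format("Path %s does not start with %s", path, toRemove));
    }
    // When path is the prefix itself (with or without its separator), nothing remains.
    return path.length() <= toRemove.length() ? "" : path.substring(toRemove.length());
  }

  public static void main(String[] args) {
    String[][] cases = {{"/a", "/a"}, {"/a/", "/a"}, {"/a", "/a/"}, {"/a/", "/a/"}};
    for (String[] c : cases) {
      // Each combination is expected to relativize to the empty string.
      System.out.printf(
          "relativize(\"%s\", \"%s\") -> \"%s\"%n", c[0], c[1], relativize(c[0], c[1]));
    }
  }
}
```

In this sketch a sibling such as "/a-old" relative to "/a" is still rejected, since it neither starts with "/a/" nor equals it after normalization.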
