Core: HadoopFileIO to take list of filesystem schemas to enable trash for #15093

@steveloughran

Description

Feature Request / Improvement

#14501 applies the trash policy when the target path resolves to the local filesystem or HDFS, which it detects by looking at the class name of the FS instance.
That approach:

  • Breaks distributions which don't have hdfs on the classpath (for example Azure HDInsight)
  • Adds the overhead of instantiating a trash policy on every single delete
  • Doesn't allow trash to be enabled for other filesystems, or disabled for localfs (which has no remote cleaner running in the background).

Proposed:

  • Add an option, iceberg.hadoop.trash.schemas, which takes a list of filesystem schemas (default: "hdfs" and "viewfs"); only apply trash if the path's schema matches.
  • Add a test which sets the schema list to "file" and verifies that trash can be enabled and disabled.
  • Document it (where?). Maybe also update the SupportsBulkOperations javadocs to mention that deletion may be replaced by a move to trash.
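A minimal sketch of the schema matching proposed above, using only the JDK so that nothing depends on hdfs classes being on the classpath. The class name `TrashSchemeFilter`, its method names, and the comma-separated format of the option value are assumptions made for illustration; only the option name and its defaults come from the proposal.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: decides from a path's URI scheme whether trash applies. */
final class TrashSchemeFilter {
  // Option name and defaults as proposed in this issue.
  static final String TRASH_SCHEMAS = "iceberg.hadoop.trash.schemas";
  // Assumption: the option value is a comma-separated list.
  static final String TRASH_SCHEMAS_DEFAULT = "hdfs,viewfs";

  private final Set<String> schemes;

  /** Parses the option value once, so no per-delete setup is needed. */
  TrashSchemeFilter(String configValue) {
    String value = configValue == null ? TRASH_SCHEMAS_DEFAULT : configValue;
    this.schemes = Arrays.stream(value.split(","))
        .map(s -> s.trim().toLowerCase(Locale.ROOT))
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
  }

  /** True only if the location has a scheme and that scheme is in the configured list. */
  boolean appliesTo(String location) {
    String scheme = URI.create(location).getScheme();
    return scheme != null && schemes.contains(scheme.toLowerCase(Locale.ROOT));
  }
}
```

With this shape, a test could pass `"file"` as the option value to enable trash on the local filesystem, and an empty string to disable it everywhere. Matching on the scheme string avoids inspecting the class of the filesystem instance.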

Also, we need to restore the semantics that deletePath(missing) doesn't raise an exception.
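The intended behaviour for a missing path can be sketched as follows. This is not the HadoopFileIO code: it uses `java.nio.file` as a stand-in for the Hadoop filesystem and trash APIs, and `LenientDelete`, `deleteOrTrash`, and the `trashDir` parameter are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical stand-in showing "delete of a missing path is not an error". */
final class LenientDelete {
  /**
   * Returns true if the file was deleted or moved to trash,
   * false if it was already missing. Other I/O failures still propagate.
   */
  static boolean deleteOrTrash(Path file, Path trashDir, boolean useTrash) throws IOException {
    if (!useTrash) {
      // deleteIfExists reports a missing file as false instead of throwing.
      return Files.deleteIfExists(file);
    }
    try {
      Files.createDirectories(trashDir);
      Files.move(file, trashDir.resolve(file.getFileName()));
      return true;
    } catch (NoSuchFileException e) {
      // Source already gone: treat as a no-op, matching the non-trash path.
      return false;
    }
  }
}
```

Both branches treat a missing source the same way, so enabling trash does not change what callers observe when the path is absent.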

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

Metadata

Labels: improvement (PR that improves existing functionality)
