Conversation

@Pranaykarvi (Contributor) commented Dec 31, 2025

What changes were proposed in this pull request?

This PR fixes an interoperability issue between the Gravitino Flink connector and
the native Flink Hive client.

  1. Separate the properties converter interface into a catalog properties converter and a schema & table properties converter.
  2. Create a Hive schema & table properties converter that generates Gravitino tables from the table properties and the Hive conf, following Flink's behavior.
  3. Transform the format, SerDe, and input/output format.
  4. Resolve the SerDe in the following order:
     1. use the SerDe lib from the stored-as format
     2. use the SerDe lib specified in the table properties
     3. use the SerDe lib from the default file format
     4. use the default SerDe from the Hive conf
     (please refer to org.apache.flink.table.catalog.hive.util.HiveTableUtils for more details)
  5. Resolve the input/output format in the following order:
     1. use the input/output format from the storage format (STORED AS)
     2. use the input/output format from the table properties
     3. use the input/output format from the default file format
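The two fallback chains above can be sketched as plain functions. This is a minimal illustration, not the connector's actual API: the class, the `StorageFormat` record, and the `input-format` property key are hypothetical, and a `Map` stands in for the table properties; only the `hive.serde.lib.class.name` key is taken from the discussion below.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Illustrative sketch of the SerDe and input/output-format fallback chains.
 * All names are hypothetical; a Map stands in for table properties.
 */
public class FallbackChainSketch {

  /** A STORED AS format may carry its own SerDe and input/output classes (nulls = not set). */
  record StorageFormat(String serdeLib, String inputFormat, String outputFormat) {}

  /** SerDe: stored-as format -> table properties -> default file format -> Hive conf default. */
  static String resolveSerde(
      Optional<StorageFormat> storedAs,
      Map<String, String> tableProps,
      StorageFormat defaultFileFormat,
      String hiveConfDefaultSerde) {
    return storedAs.map(StorageFormat::serdeLib)
        .or(() -> Optional.ofNullable(tableProps.get("hive.serde.lib.class.name")))
        .or(() -> Optional.ofNullable(defaultFileFormat.serdeLib()))
        .orElse(hiveConfDefaultSerde);
  }

  /** Input format: stored-as format -> table properties -> default file format. */
  static String resolveInputFormat(
      Optional<StorageFormat> storedAs,
      Map<String, String> tableProps,
      StorageFormat defaultFileFormat) {
    return storedAs.map(StorageFormat::inputFormat)
        .or(() -> Optional.ofNullable(tableProps.get("input-format"))) // hypothetical key
        .orElse(defaultFileFormat.inputFormat());
  }
}
```

Each `Optional.or` step is consulted only when every earlier step produced nothing, which mirrors the ordered lists above.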

Why are the changes needed?

Fix: #9508

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The patch was tested with both unit and integration tests:

  1. Unit tests

    • Added tests for HivePropertiesConverter to verify the SerDe and format resolution logic.
  2. Integration tests

    • Added an end-to-end test that creates a Hive table via the Gravitino Flink
      connector and verifies it can be read by the native Flink client.

@Pranaykarvi (Contributor, Author)

Added Apache Maven repository fallback to address intermittent CI failures
(403 errors from Maven Central when resolving Netty artifacts).
No dependency versions or build semantics changed.

@jerryshao jerryshao requested a review from FANNG1 January 4, 2026 04:04
@FANNG1 (Contributor) commented Jan 5, 2026

@Pranaykarvi could you fix the CI by running ./gradlew :flink-connector:flink:spotlessApply?

@Pranaykarvi (Contributor, Author)

Hi @FANNG1,

I’ve run ./gradlew :flink-connector:flink:spotlessApply, removed the generated gradlew.bat, and rebased the branch.

Could you please take another look when you have a chance? Thanks!

@jerryshao jerryshao requested a review from FANNG1 January 6, 2026 03:24
@Pranaykarvi (Contributor, Author)

Hi @FANNG1,

I’ve updated the implementation to derive the default Hive SerDe via HiveConf using the raw key hive.default.serde, matching Flink Hive connector behavior and avoiding Hive version incompatibilities.

I’ve rebased the branch, rerun Spotless, and verified the unit tests.

Could you please take another look when you have a moment? Thanks!

@Pranaykarvi (Contributor, Author)

Hi @FANNG1,

I’ve applied Spotless formatting cleanly to align with CI (spotlessJavaCheck).
Logic is unchanged; this commit is formatting-only.

Thanks for your patience; could you please take another look?

@FANNG1 (Contributor) commented Jan 7, 2026


After investigating how Flink handles the SerDe lib in https://github.com/apache/flink/blob/b2a260ac957dac3b6af5dc73684624dd36dc92ea/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/util/HiveTableUtil.java#L502 and https://github.com/apache/flink/blob/b2a260ac957dac3b6af5dc73684624dd36dc92ea/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/util/HiveTableUtil.java#L508,
I'd summarize the logic as follows; please correct me if I'm missing something.

1. get the format from hiveConf
2. get the serde-lib in the following order:
    1. from the format, if the format supplies a serde-lib
    2. from `hive.serde.lib.class.name` in the table options
    3. the default serde-lib from hiveConf

Also, you should reuse the HiveConf from GravitinoHiveCatalog, because that HiveConf is initialized from the hive-conf dir. This may require refactoring the Hive properties converter, since a single shared instance can no longer be used. As this issue turns out to be fairly complicated, do you still want to continue this PR, or should I continue it based on your changes?
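The per-catalog direction discussed here can be sketched as follows. This is an assumption-laden illustration: the class name is hypothetical, and a plain `Map` stands in for HiveConf (the real code would call `hiveConf.get(key, fallback)`). Looking up the raw String key "hive.default.serde" avoids referencing a `HiveConf.ConfVars` constant, which differs across Hive versions; the LazySimpleSerDe fallback matches Hive's documented default for that key.

```java
import java.util.Map;

/**
 * Sketch of a converter that carries per-catalog Hive configuration instead of
 * being a shared singleton. A Map stands in for HiveConf; all names here are
 * illustrative, not the connector's actual API.
 */
public class PerCatalogHiveConverter {

  private final Map<String, String> hiveConf; // loaded from the catalog's hive-conf dir

  public PerCatalogHiveConverter(Map<String, String> hiveConf) {
    this.hiveConf = hiveConf;
  }

  /** Fallback used when neither the format nor the table options name a SerDe. */
  public String defaultSerde() {
    // Raw key lookup: no compile-time dependency on HiveConf.ConfVars.
    return hiveConf.getOrDefault(
        "hive.default.serde", "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
  }
}
```

Because the converter holds the catalog's own configuration, two catalogs pointing at different hive-conf dirs naturally resolve different defaults.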

@Pranaykarvi (Contributor, Author)


Hi @FANNG1,

Thanks a lot for the detailed analysis and for pointing out the exact Flink logic; this is very helpful.

You’re right that SerDe resolution in Flink involves multiple layers (format, table options, and HiveConf defaults), and reusing the HiveConf from GravitinoHiveCatalog would likely require a broader refactor than this PR was originally scoped for.

I’m happy to let you continue the PR based on my changes if that makes it cleaner and easier to align with Flink’s behavior. Please feel free to adjust or refactor HivePropertiesConverter as needed.

Thanks again for taking this forward, and I’m happy to review or help test any follow-up changes.

Copilot AI left a comment
Pull request overview

This PR fixes a Hive SerDe incompatibility issue for tables created via the Gravitino Flink connector and introduces a significant architectural refactoring of properties converters.

Key changes:

  • Splits the PropertiesConverter interface into two separate interfaces: CatalogPropertiesConverter (for catalog-level properties) and SchemaAndTablePropertiesConverter (for schema and table-level properties)
  • Introduces HiveSchemaAndTablePropertiesConverter which implements intelligent SerDe resolution logic, ensuring Hive tables created through Flink default to LazySimpleSerDe when no SerDe is explicitly specified
  • Adds comprehensive unit and integration tests to verify SerDe behavior and Flink ↔ Hive interoperability

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.

Summary per file:

  • CatalogPropertiesConverter.java: New interface extracted from PropertiesConverter for catalog-level property conversions
  • SchemaAndTablePropertiesConverter.java: New interface extracted from PropertiesConverter for schema- and table-level property conversions
  • HiveCatalogPropertiesConverter.java: Renamed from HivePropertiesConverter; implements only CatalogPropertiesConverter
  • HiveSchemaAndTablePropertiesConverter.java: New class implementing SerDe resolution logic for Hive tables with fallback to defaults
  • GravitinoHiveCatalogFactory.java: Updated to create HiveSchemaAndTablePropertiesConverter with HiveConf
  • GravitinoHiveCatalog.java: Updated to use SchemaAndTablePropertiesConverter
  • BaseCatalog.java: Updated to use SchemaAndTablePropertiesConverter for table/schema operations
  • BaseCatalogFactory.java: Updated to use CatalogPropertiesConverter
  • GravitinoCatalogStore.java: Updated to use the catalogPropertiesConverter() method
  • PaimonPropertiesConverter.java: Updated to implement both new interfaces
  • GravitinoPaimonCatalogFactory.java: Adds separate methods for catalog and schema/table converters
  • GravitinoPaimonCatalog.java: Updated to use SchemaAndTablePropertiesConverter
  • IcebergPropertiesConverter.java: Updated to implement both new interfaces
  • GravitinoIcebergCatalogFactory.java: Adds separate methods for catalog and schema/table converters
  • GravitinoIcebergCatalog.java: Updated to use SchemaAndTablePropertiesConverter
  • JdbcPropertiesConverter.java: Updated to implement both new interfaces
  • GravitinoJdbcCatalogFactory.java: Defines an abstract schemaAndTablePropertiesConverter() method
  • GravitinoJdbcCatalog.java: Updated to use SchemaAndTablePropertiesConverter
  • GravitinoMysqlJdbcCatalogFactory.java: Implements both converter methods
  • GravitinoPostgresJdbcCatalogFactory.java: Implements both converter methods
  • TestHivePropertiesConverter.java: Updated to use HiveCatalogPropertiesConverter
  • TestHiveSchemaAndTablePropertiesConverter.java: New unit tests for SerDe resolution logic
  • FlinkHiveCatalogIT.java: Adds integration tests for SerDe behavior and native Flink Hive catalog interoperability
  • FlinkHiveKerberosClientIT.java: Updated import to use CatalogPropertiesConverter
  • FlinkEnvIT.java: Updated import to use CatalogPropertiesConverter
  • TestPaimonPropertiesConverter.java: Updated import to use CatalogPropertiesConverter
  • TestBaseCatalog.java: Updated mock to use SchemaAndTablePropertiesConverter

@FANNG1 FANNG1 changed the title [#9508] Fix Hive SerDe incompatibility for Flink-created tables [#9508] Fix Hive SerDe incompatibility for Gravitino Flink connector created tables Jan 9, 2026
@FANNG1 (Contributor) commented Jan 9, 2026

@jerryshao @Pranaykarvi PTAL

@Pranaykarvi (Contributor, Author)

Thanks @FANNG1 for taking this forward and for the refactor to fully align with Flink’s behavior.
Happy to help review or test if needed.



Development

Successfully merging this pull request may close these issues.

[Bug report] Flink native client couldn't read hive table create by Gravitino Flink connector
