
[fix][broker] Use compatible Avro name validator to allow '$' in schema record names#25193

Open
mattisonchao wants to merge 7 commits into master from fixes.compatible.avro


@mattisonchao (Member) commented Jan 29, 2026

Motivation

After #24617 upgraded Avro to a newer version, the default Schema.Parser()
uses UTF_VALIDATOR, which rejects the $ character in record names. This
breaks Protobuf schemas whose Avro representation contains $ in generated
nested type names (e.g. inner classes).
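Why the $ shows up can be illustrated with plain JVM naming: a nested class's binary name (Class.getName()) joins the enclosing and nested class names with $, while its canonical name uses a dot. This is presumably where the generated nested type names get their $. The sketch below is an assumption-laden illustration only: the classes are hypothetical stand-ins for protoc output (assuming DataRecord.proto declares a message named DataRecord and sets neither java_multiple_files nor java_outer_classname, so protoc nests the message in an outer class named DataRecordOuterClass), and it does not reproduce Avro's Protobuf-to-Avro conversion.

```java
public class NestedNameDemo {
    // Hypothetical stand-ins for protoc-generated code: by default protoc
    // nests each message of a .proto file inside one outer wrapper class.
    static final class DataRecordOuterClass {
        static final class DataRecord {
        }
    }

    public static void main(String[] args) {
        Class<?> nested = DataRecordOuterClass.DataRecord.class;
        // Binary name: enclosing and nested classes are joined with '$'.
        System.out.println(nested.getName());
        // Canonical (source-level) name: joined with '.' instead.
        System.out.println(nested.getCanonicalName());
    }
}
```

Compiled in the default package, the first line ends in DataRecordOuterClass$DataRecord, a string that a validator accepting only letters, digits, and underscores after the first character rejects.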

Modifications

  • Introduce a CompatibleNameValidator that allows $ in addition to letters,
    digits, and underscores, matching the previous Avro behavior.
  • Apply the custom validator to all broker-side Schema.Parser() instances
    that handle Protobuf schemas:
    • StructSchemaDataValidator
    • SchemaRegistryServiceImpl
    • AvroSchemaBasedCompatibilityCheck
  • Add DataRecord.proto to reproduce the issue in tests.
  • Add unit tests for the CompatibleNameValidator and Protobuf schema
    compatibility.
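The naming rule from the first bullet can be sketched without the Avro dependency. In the PR the check presumably implements org.apache.avro.NameValidator and is passed to the Schema.Parser constructor that accepts a validator; the class CompatibleNameRule, its validate(String, boolean) signature, the null-means-valid convention, and the error strings below are all hypothetical. Two further assumptions: the first character must still be a letter or underscore (the PR does not say whether a leading $ is accepted), and Character.isLetter / isLetterOrDigit define "letters" and "digits". The allowDollar=false branch models the stricter rule the PR attributes to UTF_VALIDATOR.

```java
/**
 * Hypothetical, dependency-free sketch of the relaxed naming rule.
 * validate() returns null when a name is accepted, otherwise an error message.
 */
public final class CompatibleNameRule {
    private CompatibleNameRule() {
    }

    static String validate(String name, boolean allowDollar) {
        if (name == null) {
            return "Null name";
        }
        if (name.isEmpty()) {
            return "Empty name";
        }
        char first = name.charAt(0);
        // Assumption: the initial character must still be a letter or '_'.
        if (!(Character.isLetter(first) || first == '_')) {
            return "Illegal initial character: " + name;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            // The only relaxation: '$' is accepted after the first character.
            boolean ok = Character.isLetterOrDigit(c) || c == '_' || (allowDollar && c == '$');
            if (!ok) {
                return "Illegal character in: " + name;
            }
        }
        return null;
    }

    private static String describe(String error) {
        return error == null ? "OK" : error;
    }

    public static void main(String[] args) {
        // Hypothetical Protobuf-derived nested type name.
        String nestedName = "DataRecordOuterClass$DataRecord";
        // Letters, digits and '_' only: rejects the '$'.
        System.out.println("strict:     " + describe(validate(nestedName, false)));
        // Same rule plus '$': accepts the nested name.
        System.out.println("compatible: " + describe(validate(nestedName, true)));
    }
}
```

Running main prints "strict:     Illegal character in: DataRecordOuterClass$DataRecord" followed by "compatible: OK". Keeping everything except $ unchanged limits the relaxation to the one character that generated nested names need.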

Note: The client-side SchemaUtil already uses NameValidator.NO_VALIDATION.
A shared solution across broker and client (e.g. factory in pulsar-common)
can be addressed in a follow-up.

Verifying this change

  • Unit tests added for CompatibleNameValidator (valid names, invalid names,
    edge cases, error messages).
  • Integration test with ProtobufSchema<DataRecord> to reproduce the original issue.
  • CI checks pass.

Documentation

  • doc-not-needed

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses a regression introduced by the Avro upgrade (1.12.0): Protobuf-derived Avro schemas can fail to parse because $ appears in generated record names. It fixes this by introducing a more permissive Avro NameValidator and adds tests that cover the scenario.

Changes:

  • Add a custom Avro NameValidator (CompatibleNameValidator) to allow $ in Avro record/field names during schema validation.
  • Add unit tests for the validator’s behavior and a reproduction test using a generated Protobuf schema.
  • Introduce a new Protobuf message (DataRecord.proto) used by the tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File | Description
--- | ---
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java | Uses a custom Avro NameValidator to accept $ during schema parsing/validation.
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java | Adds tests for the new validator and a Protobuf-based reproduction.
pulsar-broker/src/main/proto/DataRecord.proto | Adds a Protobuf schema used to generate Avro with nested-type $ names for testing.


@mattisonchao mattisonchao marked this pull request as ready for review February 20, 2026 01:53
@mattisonchao mattisonchao reopened this Feb 20, 2026
@mattisonchao mattisonchao changed the title [fix][schema] Illegal character '$' in record [fix][broker] Use compatible Avro name validator to allow '$' in schema record names Feb 20, 2026
@mattisonchao mattisonchao requested a review from lhotari February 20, 2026 13:11
Review comment from a Member:

Is this file only for tests? If so, I think it should be in the pulsar-broker/src/test/proto directory.
