@aaaugustine29

Overview:
This change introduces Kafka Connect as a first‑class JMX target system in the JMX metrics library. It adds a ruleset and documentation that cover both Apache Kafka Connect and Confluent Platform variants from the outset, so users can enable Kafka Connect monitoring without custom YAML.

Details:
Added kafka-connect.yaml JMX rules that map worker, rebalance, connector, task, source/sink task, and task-error MBeans into OpenTelemetry metrics, including Apache‑only metrics (e.g., worker rebalance protocol, per‑connector task counts, predicate/transform metadata, converter metadata, source transaction sizes, sink record lag max).
Defined connector and task status as state metrics using the superset of status values across Apache and Confluent, to avoid vendor‑specific enum mismatches.
Documented the new target in kafka-connect.md, including metric groups, attributes, and the dual‑vendor compatibility model (no renames; Apache list as a superset of Confluent docs).
Added self‑contained tests for the Kafka Connect rules that load the YAML, build metric definitions, and validate key state mappings and metric presence, ensuring the new target is ready to consume from day one.
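As a rough illustration of the "superset of status values" idea, here is a minimal sketch (not the code in this PR) of how one raw `status` string can be turned into one data point per known state. The class name, the `unknown` fallback, and the exact state list are assumptions made for illustration; the authoritative list is whatever kafka-connect.yaml defines.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: expand one raw status value into per-state data points. */
public class ConnectorStateSketch {

  // Assumed superset of status values across vendors; not copied from the PR.
  static final List<String> KNOWN_STATES =
      List.of("running", "paused", "stopped", "failed", "unassigned", "restarting");

  /**
   * Returns a map of state name to 1 (active) or 0 (inactive).
   * Values outside the known list activate the "unknown" entry instead of being dropped.
   */
  static Map<String, Integer> toStatePoints(String rawStatus) {
    String normalized = rawStatus == null ? "" : rawStatus.trim().toLowerCase(Locale.ROOT);
    Map<String, Integer> points = new LinkedHashMap<>();
    boolean matched = false;
    for (String state : KNOWN_STATES) {
      boolean active = state.equals(normalized);
      matched |= active;
      points.put(state, active ? 1 : 0);
    }
    points.put("unknown", matched ? 0 : 1);
    return points;
  }

  public static void main(String[] args) {
    // Exactly one entry is set to 1 for any input.
    System.out.println(toStatePoints("RUNNING"));
    System.out.println(toStatePoints("some-vendor-specific-status"));
  }
}
```

Because every status lands in exactly one entry, a value reported by only one vendor never produces a missing or mismatched series; it simply shows up under its own state or under the fallback.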

Testing:
./gradlew -Dorg.gradle.configuration-cache.parallel=false instrumentation:jmx-metrics:library:test

@aaaugustine29 aaaugustine29 requested a review from a team as a code owner December 6, 2025 18:08
@linux-foundation-easycla

linux-foundation-easycla bot commented Dec 6, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@laurit
Contributor

laurit commented Dec 8, 2025

@SylvainJuge could you review this?

Contributor

@SylvainJuge SylvainJuge left a comment

Hi @aaaugustine29, thanks for opening this!

Quite a lot of metrics are added here, which makes it challenging to review them all.

I don't have any expertise in Kafka Connect, so you are probably more knowledgeable here.

I would suggest:

  • implement tests with a real instance of the target system, ideally with both the Apache and Confluent variants
  • as a first step, focus on the "essential" metrics rather than including everything that is available; this is where your knowledge might be useful
  • simplify as much as possible by using metric attributes to provide the breakdown when a set of metrics represents a partition (for example, on state).

@aaaugustine29
Author

aaaugustine29 commented Dec 8, 2025

@SylvainJuge Thanks for your help and guidance. At this point, the metrics have been reduced to the minimum set that loses no information. That said, we don't need to keep everything. In particular, your previous comment raises the possibility of consolidating some of them using metric attributes. However, doing so would lose some information that matters to a niche group of advanced users. What's your guidance on this?

To clarify your comment about testing: tests that actually instantiate a Kafka Connect cluster would be very heavy. I could instead emulate what the Apache JMX server would produce. Would that be sufficient?
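To make the emulation option concrete, here is a minimal sketch using only the standard `javax.management` API: a private in-process `MBeanServer` with a `DynamicMBean` that exposes a lowercase `status` attribute under a Kafka Connect style object name. The class names, the `demo` connector, and the assumption that the worker exposes `kafka.connect:type=connector-metrics,connector=<name>` with a `status` attribute are illustrative and should be checked against a real worker.

```java
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

/** Hypothetical sketch: emulate Kafka Connect MBeans without starting a cluster. */
public class EmulatedConnectMBeans {

  /** Read-only dynamic MBean backed by a fixed map of attribute values. */
  static final class FakeMetricsBean implements DynamicMBean {
    private final Map<String, Object> values;

    FakeMetricsBean(Map<String, Object> values) {
      this.values = values;
    }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
      if (!values.containsKey(name)) {
        throw new AttributeNotFoundException(name);
      }
      return values.get(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
      throw new AttributeNotFoundException("read-only: " + attribute.getName());
    }

    @Override
    public AttributeList getAttributes(String[] names) {
      AttributeList list = new AttributeList();
      for (String name : names) {
        if (values.containsKey(name)) {
          list.add(new Attribute(name, values.get(name)));
        }
      }
      return list;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
      return new AttributeList(); // nothing is writable
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature) {
      throw new UnsupportedOperationException(action);
    }

    @Override
    public MBeanInfo getMBeanInfo() {
      MBeanAttributeInfo[] infos = values.entrySet().stream()
          .map(e -> new MBeanAttributeInfo(
              e.getKey(), e.getValue().getClass().getName(), "emulated", true, false, false))
          .toArray(MBeanAttributeInfo[]::new);
      return new MBeanInfo(
          FakeMetricsBean.class.getName(), "emulated Kafka Connect bean", infos, null, null, null);
    }
  }

  /** Creates an isolated MBean server holding one emulated connector bean. */
  static MBeanServer startEmulatedServer() throws Exception {
    MBeanServer server = MBeanServerFactory.newMBeanServer();
    server.registerMBean(
        new FakeMetricsBean(Map.of("status", "running")),
        new ObjectName("kafka.connect:type=connector-metrics,connector=demo"));
    return server;
  }

  public static void main(String[] args) throws Exception {
    MBeanServer server = startEmulatedServer();
    ObjectName pattern = new ObjectName("kafka.connect:type=connector-metrics,connector=*");
    for (ObjectName name : server.queryNames(pattern, null)) {
      System.out.println(name + " status=" + server.getAttribute(name, "status"));
    }
  }
}
```

An emulation like this only verifies that the rules match the names and values it was given; it cannot confirm that a real Apache or Confluent worker actually exposes them, which is the gap a test against a real instance would close.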
