@major major commented Jan 23, 2026

Description

Integrates Splunk HEC telemetry into the rlsapi v1 /infer endpoint. This is the final PR in the Splunk integration series (building on #1031 and #1032).

Changes:

  • Add _get_rh_identity_context() helper to extract org_id/system_id from request.state
  • Add _queue_splunk_event() to build and queue telemetry events via FastAPI BackgroundTasks
  • Add timing measurement around inference calls
  • Queue infer_with_llm events on success, infer_error events on failure
  • Add user-facing documentation (docs/splunk.md)
  • Add developer documentation (src/observability/README.md)

Events are sent asynchronously and never block or affect the main request flow.
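As a rough illustration of the identity helper described above, the sketch below reads identity data from `request.state` and falls back to a placeholder when none is present. The attribute name `rh_identity`, the dictionary layout, the `AUTH_DISABLED` value, and the handling of empty strings are all assumptions made for this example; the actual helper in `src/app/endpoints/rlsapi_v1.py` may differ.

```python
from types import SimpleNamespace
from typing import Any

# Placeholder used when no identity is available (value assumed for illustration).
AUTH_DISABLED = "auth_disabled"


def get_rh_identity_context(request: Any) -> tuple[str, str]:
    """Return (org_id, system_id) from request.state, or placeholders.

    Hypothetical sketch: assumes an auth layer stored a dict under
    request.state.rh_identity with "org_id" and "system_id" keys.
    """
    identity = getattr(request.state, "rh_identity", None) or {}
    # Assumption: empty strings are treated the same as missing values.
    org_id = identity.get("org_id") or AUTH_DISABLED
    system_id = identity.get("system_id") or AUTH_DISABLED
    return org_id, system_id


# Stand-ins for a FastAPI Request, with and without identity data.
with_identity = SimpleNamespace(
    state=SimpleNamespace(rh_identity={"org_id": "12345", "system_id": "abc-def"})
)
without_identity = SimpleNamespace(state=SimpleNamespace())
```

Using `getattr` with a default keeps the helper safe when authentication is disabled and nothing was ever attached to `request.state`.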

Type of change

  • New feature
  • Documentation Update
  • Unit tests improvement
  • Integration tests improvement

Tools used to create PR

  • Assisted-by: Claude (Anthropic)
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # RSPEED-2326
  • Closes # N/A (part of multi-PR implementation)

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. Unit tests: uv run pytest tests/unit/app/endpoints/test_rlsapi_v1.py -v (24 tests)
  2. Integration tests: uv run pytest tests/integration/endpoints/test_rlsapi_v1_integration.py -v (11 tests)
  3. Observability tests: uv run pytest tests/unit/observability/ -v (11 tests)

New tests verify:

  • RH Identity context extraction (with/without identity data, empty values)
  • Splunk event queuing on successful inference
  • Splunk error event queuing on failed inference
  • Event payload includes correct org_id/system_id from RH Identity
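A test in the spirit of the checks listed above might look like the following sketch. It uses `unittest.mock.Mock` in place of FastAPI's `BackgroundTasks`; the `queue_splunk_event` function, its signature, and the event keys are hypothetical stand-ins, not the code under test in this PR.

```python
from unittest.mock import Mock


def queue_splunk_event(background_tasks, send, event_type, org_id, system_id):
    """Hypothetical stand-in: build an event dict and hand it to add_task."""
    event = {"type": event_type, "org_id": org_id, "system_id": system_id}
    background_tasks.add_task(send, event)


def test_event_is_queued_with_identity():
    background_tasks = Mock()  # stands in for fastapi.BackgroundTasks
    send = Mock()              # stands in for the Splunk sender

    queue_splunk_event(background_tasks, send, "infer_with_llm", "12345", "abc-def")

    # Exactly one task was queued, and nothing was sent inline.
    background_tasks.add_task.assert_called_once()
    send.assert_not_called()

    # The queued task carries the sender and an event with the identity fields.
    queued_func, queued_event = background_tasks.add_task.call_args.args
    assert queued_func is send
    assert queued_event["org_id"] == "12345"
    assert queued_event["system_id"] == "abc-def"
```

Asserting that `send` was not called checks the "never block" property at the unit level: the endpoint only schedules the work.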

Summary by CodeRabbit

  • New Features

    • Added Splunk HEC integration for asynchronous inference telemetry emission with configurable endpoints, authentication, and graceful degradation.
    • Enhanced telemetry tracking with RH Identity context integration.
  • Documentation

    • Added comprehensive Splunk HEC configuration guide covering setup, event formats, and troubleshooting.
    • Added observability module documentation.

…endpoint

- Add _get_rh_identity_context() to extract org_id/system_id from request.state
- Add _queue_splunk_event() to build and queue telemetry events via BackgroundTasks
- Add timing measurement around inference calls
- Queue infer_with_llm events on success, infer_error on failure
- Add unit tests for RH Identity context extraction and Splunk integration
- Update integration tests for new endpoint signature
- Add user-facing docs (docs/splunk.md) and developer docs (src/observability/README.md)

Signed-off-by: Major Hayden <major@redhat.com>

coderabbitai bot commented Jan 23, 2026

Walkthrough

Integrates Splunk HEC telemetry emission into Lightspeed Core Stack by extending the inference endpoint to accept Request and BackgroundTasks parameters, extracting RH Identity context (org_id, system_id), and queuing asynchronous telemetry events on both success and error paths. Includes comprehensive configuration documentation, observability module guidance, and updated test coverage.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: docs/splunk.md, src/observability/README.md | Added comprehensive Splunk HEC integration guide covering configuration fields (enabled, url, token_path, index, source, timeout, verify_ssl), event formats, endpoint mappings, and troubleshooting steps. Observability module documentation introduces architecture and graceful degradation patterns. |
| Telemetry & RH Identity Integration: src/app/endpoints/rlsapi_v1.py | Extended infer_endpoint signature with request: Request and background_tasks: BackgroundTasks parameters. Added _get_rh_identity_context helper to extract org_id/system_id from request state with fallback to AUTH_DISABLED. Introduced _queue_splunk_event to dispatch telemetry events asynchronously on success and error paths (APIConnectionError, RateLimitError, APIStatusError). Added inference timing instrumentation via time.monotonic(). |
| Integration Tests: tests/integration/endpoints/test_rlsapi_v1_integration.py | Added helper functions _create_mock_request and _create_mock_background_tasks and corresponding pytest fixtures. Updated all integration tests to pass mock Request and BackgroundTasks objects to infer_endpoint calls. |
| Unit Tests: tests/unit/app/endpoints/test_rlsapi_v1.py | Added test helpers for mock Request/BackgroundTasks construction. Introduced 3 new tests for _get_rh_identity_context (with RH Identity, without RH Identity, with empty values). Refactored existing endpoint tests to use mocks. Added 2 new tests for Splunk event queuing behavior on success and failure. Expanded test coverage to verify RH Identity context inclusion in telemetry events. |
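To make the configuration fields and event format more concrete, here is a minimal sketch of how one HEC request could be assembled. The field names come from the summary above; the `SplunkSettings` class, its types and defaults, the example URL, and `build_hec_request` are assumptions for illustration. The `Authorization: Splunk <token>` header and the `event` envelope key follow Splunk's documented HEC format.

```python
import json
import time
from dataclasses import dataclass


@dataclass
class SplunkSettings:
    """Mirrors the documented config field names; defaults are illustrative only."""
    enabled: bool = False
    url: str = "https://splunk.example.com:8088/services/collector"
    token_path: str = "/etc/splunk/token"
    index: str = "main"
    source: str = "lightspeed"
    timeout: int = 5
    verify_ssl: bool = True


def build_hec_request(settings: SplunkSettings, token: str, event: dict) -> tuple[dict, str]:
    """Return (headers, body) for one HEC event; performs no I/O."""
    headers = {
        "Authorization": f"Splunk {token}",
        "Content-Type": "application/json",
    }
    envelope = {
        "time": time.time(),       # event timestamp in epoch seconds
        "index": settings.index,
        "source": settings.source,
        "event": event,            # the telemetry payload itself
    }
    return headers, json.dumps(envelope)
```

Keeping request construction separate from the network call makes the payload easy to unit test without a Splunk instance.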

Sequence Diagram

sequenceDiagram
    participant Client
    participant infer_endpoint
    participant RHIdentity as RH Identity<br/>Context
    participant Processing as Inference<br/>Processing
    participant BackgroundTasks
    participant SplunkHEC

    Client->>infer_endpoint: POST /rlsapi/v1/infer<br/>(request, background_tasks)
    infer_endpoint->>RHIdentity: extract org_id, system_id
    RHIdentity-->>infer_endpoint: context or AUTH_DISABLED
    infer_endpoint->>infer_endpoint: start inference timer
    infer_endpoint->>Processing: execute inference
    alt Success
        Processing-->>infer_endpoint: response
        infer_endpoint->>infer_endpoint: calculate inference_time
        infer_endpoint->>BackgroundTasks: add_task(_queue_splunk_event)<br/>(success payload + RH context)
    else Error (API/RateLimit/Status)
        Processing-->>infer_endpoint: exception
        infer_endpoint->>infer_endpoint: calculate inference_time
        infer_endpoint->>BackgroundTasks: add_task(_queue_splunk_event)<br/>(error payload + RH context)
    end
    infer_endpoint-->>Client: response/exception
    BackgroundTasks->>SplunkHEC: async: send telemetry event
    SplunkHEC-->>BackgroundTasks: acknowledged
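
The flow in the diagram can be sketched in plain Python as follows. This is a simplified model under stated assumptions: `RecordingTasks` stands in for FastAPI's `BackgroundTasks`, `ConnectionError` stands in for the API errors named in the walkthrough, and `send_event` and `infer` are hypothetical names. The real endpoint may translate errors into HTTP responses rather than re-raising them.

```python
import asyncio
import time


class RecordingTasks:
    """Minimal stand-in for fastapi.BackgroundTasks that only records tasks."""

    def __init__(self):
        self.tasks = []

    def add_task(self, func, *args, **kwargs):
        self.tasks.append((func, args, kwargs))


def send_event(event_type, inference_time):
    """Hypothetical sender; a real one would post to Splunk HEC."""


async def infer(run_inference, background_tasks):
    """Time the inference call and queue one event on either path."""
    start = time.monotonic()
    try:
        result = await run_inference()
    except ConnectionError:
        # Error path: record elapsed time, queue the error event, re-raise.
        elapsed = time.monotonic() - start
        background_tasks.add_task(send_event, "infer_error", elapsed)
        raise
    # Success path: record elapsed time and queue the success event.
    elapsed = time.monotonic() - start
    background_tasks.add_task(send_event, "infer_with_llm", elapsed)
    return result


async def ok():
    return "answer"


async def failing():
    raise ConnectionError("backend unreachable")
```

`time.monotonic()` is used rather than `time.time()` because it is unaffected by system clock adjustments, which matters when measuring durations.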

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: integrating Splunk telemetry into the v1 /infer endpoint, which is the primary objective of this PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
