
Feature: Enforce Operator Pods Scheduling on Separate Nodes for High Availability #209

@jetroberts

Feature Request

Summary:
As a user of the Typesense Operator, I want to be able to enforce scheduling of operator pods (when running multiple replicas for high availability) on different Kubernetes nodes, so that the operator stays available when a single node fails.

Background:
Currently, Kubernetes may schedule multiple typesense-operator replica pods on the same node unless the user explicitly defines affinity or anti-affinity rules. If that node fails, every operator replica on it becomes unavailable at once, which undermines fault tolerance.

Requested Behavior:

  • Provide an easy/standard mechanism to ensure replicas of the operator are always scheduled on separate nodes.
  • This could be done via:
    • Helm chart support for configurable anti-affinity in values.yaml, e.g.:
      controllerManager:
        replicas: 2
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution: 
            - labelSelector:
                matchExpressions:
                  - key: control-plane
                    operator: In
                    values: [controller-manager]
              topologyKey: kubernetes.io/hostname
    • Or, sensible defaults out-of-the-box (in the chart) when replicas > 1.
    • Documentation/example for YAML/kustomize installations on how to configure anti-affinity for operator pods.
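
One way the "sensible defaults when replicas > 1" option could look in the chart is sketched below. This is only an illustration, not the chart's actual template: the file location and the values keys (controllerManager.replicas, controllerManager.podAntiAffinity) are assumptions taken from the proposal above, and the control-plane: controller-manager label is the one used in the example above. The sketch uses an explicit user-supplied rule when present and otherwise falls back to a soft (preferred) rule once more than one replica is requested.

```yaml
# Fragment of a hypothetical templates/deployment.yaml, placed under
# spec.template.spec. ASSUMPTION: the values keys below follow the proposal
# in this issue; they are not confirmed chart keys.
      {{- if .Values.controllerManager.podAntiAffinity }}
      affinity:
        podAntiAffinity:
          {{- toYaml .Values.controllerManager.podAntiAffinity | nindent 10 }}
      {{- else if gt (int .Values.controllerManager.replicas) 1 }}
      affinity:
        podAntiAffinity:
          # Soft default: prefer separate nodes, but still schedule when the
          # cluster has fewer nodes than replicas.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    control-plane: controller-manager
                topologyKey: kubernetes.io/hostname
      {{- end }}
```

A preferred rule is used for the default here because a required rule leaves pods Pending when no separate node is available. Users who need the hard guarantee could set requiredDuringSchedulingIgnoredDuringExecution explicitly through values.yaml, as in the example above.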

Benefits:

  • Ensures that operator HA deployments withstand individual node failures.
  • Provides a safer default for production use cases.
  • Reduces ops burden for users who may not be familiar with affinity/anti-affinity settings.

Acceptance Criteria:

  • Pod anti-affinity (and optionally node affinity) can be configured via Helm values.
  • An example is documented for manifest-based installations.
  • The Helm chart ships sensible HA defaults, or warns when multiple replicas are deployed without node redundancy.
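
For the manifest-based example, a strategic-merge patch along the following lines might be enough. It is a minimal sketch: the Deployment name and namespace are placeholders (assumptions, not taken from the operator's manifests), and it assumes the operator pods carry the control-plane: controller-manager label used in the example above.

```yaml
# Strategic-merge patch adding required pod anti-affinity to the operator
# Deployment. ASSUMPTION: name and namespace are hypothetical placeholders;
# replace them with the values from your installed manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: typesense-operator-controller-manager   # hypothetical name
  namespace: typesense-system                   # hypothetical namespace
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: two pods matching the selector never share a node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: control-plane
                    operator: In
                    values: [controller-manager]
              topologyKey: kubernetes.io/hostname
```

With kustomize, a file like this can be listed under patches in kustomization.yaml. Because the rule is required, a pod that cannot find a free node stays Pending, so the cluster needs at least as many schedulable nodes as replicas, plus one spare if rolling updates create a surge pod.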

References:


Requested by user to improve HA of operator deployments and resilience to node failure.
