@philbrookes philbrookes commented Dec 9, 2025

This PR introduces DNS Groups functionality to enable active-passive failover for DNS records across multiple clusters. Groups allow operators to control which cluster's DNS records are published based on a configurable "active groups" list, enabling scenarios like:

  • Multi-cluster disaster recovery with controlled failover
  • Geographic failover between regions
  • Blue-green deployments at the DNS level
  • Staged rollouts with DNS-based traffic shifting

Changes

API Changes:

  • Added ActiveGroups field to DNSRecordStatus to track the current list of active groups
  • Added IsActive() method to DNSRecord type (interface method for group-aware behavior)
  • New condition type ConditionTypeActive with reasons:
    • ConditionReasonInActiveGroup - Record's group is active and will be published
    • ConditionReasonNotInActiveGroup - Record's group is inactive and won't be published

Core Implementation (internal/controller/dnsrecord_groups.go - new file):

  • TXTResolver interface for querying active groups from DNS
  • DefaultTXTResolver implementation with support for custom nameservers
  • GroupAdapter wraps DNSRecordAccessor to provide group-aware IsActive() behavior
  • getActiveGroups() queries the special TXT record kuadrant-active-groups. to determine which groups are active
  • unpublishInactiveGroups() cleans up DNS records from previously active groups during failover
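
As a rough illustration of how a TXT resolver with a custom nameserver could be built on the Go standard library, here is a minimal sketch. The method set of `TXTResolver`, the type `nameserverTXTResolver`, the constructor name, and the `127.0.0.1:53` address are all assumptions made for the example; the actual definitions in internal/controller/dnsrecord_groups.go may differ.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// TXTResolver is a hypothetical stand-in for the interface named in this PR;
// the real method set may differ.
type TXTResolver interface {
	LookupTXT(ctx context.Context, host string) ([]string, error)
}

// nameserverTXTResolver (hypothetical) sends every query to one fixed
// nameserver given as "host:port".
type nameserverTXTResolver struct {
	resolver *net.Resolver
}

// newNameserverTXTResolver builds a resolver that dials the given nameserver
// instead of the servers from the system resolver configuration.
func newNameserverTXTResolver(nameserver string) TXTResolver {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	return &nameserverTXTResolver{
		resolver: &net.Resolver{
			PreferGo: true, // the Dial hook is used by Go's built-in resolver
			Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, network, nameserver)
			},
		},
	}
}

// LookupTXT returns the TXT values published for host.
func (r *nameserverTXTResolver) LookupTXT(ctx context.Context, host string) ([]string, error) {
	return r.resolver.LookupTXT(ctx, host)
}

func main() {
	// The nameserver address is an assumption for illustration only.
	var r TXTResolver = newNameserverTXTResolver("127.0.0.1:53")
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	values, err := r.LookupTXT(ctx, "kuadrant-active-groups.example.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(values)
}
```

Keeping the lookup behind an interface is what makes it possible to substitute a fake resolver in tests.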

Controller Changes (internal/controller/dnsrecord_controller.go):

  • Integrated group adapter into reconciliation flow
  • Inactive group records exit early with 15s requeue time
  • Active group records unpublish inactive group DNS records after successful publish
  • Authoritative records are exempt from all of these changes

Test Coverage:

  • Added integration tests
  • Tests cover:
    • Multiple groups with active/inactive switching
    • Ungrouped records (always active)
    • Inactive group cleanup
    • Status condition updates
    • Multi-cluster coordination

How It Works

  1. Group Assignment

Each DNS operator instance is started with a group identifier:
--group=us-east
or
GROUP=us-east

Records managed by that operator inherit the group assignment in their status.

  2. Active Groups Declaration

The active groups list is stored as a TXT record in DNS:
kuadrant-active-groups.example.com TXT "groups=us-east&&us-west;version=1"
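
To make the record format concrete, the following is a minimal sketch of parsing that TXT value into a list of groups. It assumes the format is exactly what the example shows (semicolon-separated `key=value` fields, groups joined by `&&`); the function name `parseActiveGroups` is hypothetical and the operator's real parser may handle more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// parseActiveGroups extracts group names from a value shaped like
// "groups=us-east&&us-west;version=1". Fields other than "groups" are ignored.
func parseActiveGroups(txt string) []string {
	var groups []string
	for _, field := range strings.Split(txt, ";") {
		key, value, found := strings.Cut(strings.TrimSpace(field), "=")
		if !found || key != "groups" {
			continue
		}
		for _, g := range strings.Split(value, "&&") {
			if g = strings.TrimSpace(g); g != "" {
				groups = append(groups, g)
			}
		}
	}
	return groups
}

func main() {
	fmt.Println(parseActiveGroups("groups=us-east&&us-west;version=1")) // [us-east us-west]
}
```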

  3. Reconciliation Flow

Before publishing DNS records, each controller:

  1. Queries the active groups TXT record

  2. Compares its group against the active groups list

  3. If inactive: Updates status condition and requeues (15s)

  4. If active: Publishes its records AND cleans up records from inactive groups

  4. Ungrouped Records

Records without a group assignment (group="") are always active and are published alongside whichever groups are currently active. Reconciling an ungrouped record never triggers unpublishing of other records.
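
The decision described in the reconciliation flow and the ungrouped rule can be sketched as a single function. This is an illustration only: the names `action`, `decide`, and `inactiveRequeue` are hypothetical and not taken from the controller code, and the sketch does not model what the operator does when the active groups list is empty or missing, since the description does not say.

```go
package main

import (
	"fmt"
	"time"
)

// action is a hypothetical summary of what one reconcile pass would do.
type action int

const (
	skipAndRequeue    action = iota // inactive group: update status, retry later
	publishOnly                     // ungrouped: publish, never unpublish others
	publishAndCleanup               // active group: publish, then clean up inactive groups
)

// inactiveRequeue mirrors the 15s requeue mentioned in the description.
const inactiveRequeue = 15 * time.Second

// decide maps a record's group and the active groups list to an action and
// a requeue delay (zero when no group-related requeue is needed).
func decide(group string, activeGroups []string) (action, time.Duration) {
	if group == "" {
		return publishOnly, 0
	}
	for _, g := range activeGroups {
		if g == group {
			return publishAndCleanup, 0
		}
	}
	return skipAndRequeue, inactiveRequeue
}

func main() {
	a, after := decide("us-west", []string{"us-east"})
	fmt.Println(a == skipAndRequeue, after)
}
```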

Example Scenario:

Setup:

  • Cluster A (group="us-east") has DNSRecord → 1.2.3.4
  • Cluster B (group="us-west") has DNSRecord → 5.6.7.8
  • Cluster C (ungrouped) has DNSRecord → 9.9.9.9

Active groups = ["us-east"]:
Published: 1.2.3.4, 9.9.9.9

Switch active groups to ["us-west"]:
Published: 5.6.7.8, 9.9.9.9
(Cluster B unpublishes stale 1.2.3.4)
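
The scenario above can be reproduced with a small simulation. This is a sketch that models only which targets end up published for a given active groups list; the `record` type and `publishedTargets` function are hypothetical and ignore the timing of the cleanup step.

```go
package main

import "fmt"

// record is a hypothetical, simplified view of one cluster's DNSRecord.
type record struct {
	group  string // empty string means ungrouped
	target string
}

// publishedTargets returns the targets that should be in DNS once all
// controllers have reconciled against the given active groups list.
func publishedTargets(records []record, activeGroups []string) []string {
	active := make(map[string]bool, len(activeGroups))
	for _, g := range activeGroups {
		active[g] = true
	}
	var out []string
	for _, r := range records {
		if r.group == "" || active[r.group] {
			out = append(out, r.target)
		}
	}
	return out
}

func main() {
	records := []record{
		{group: "us-east", target: "1.2.3.4"}, // Cluster A
		{group: "us-west", target: "5.6.7.8"}, // Cluster B
		{group: "", target: "9.9.9.9"},        // Cluster C (ungrouped)
	}
	fmt.Println(publishedTargets(records, []string{"us-east"})) // [1.2.3.4 9.9.9.9]
	fmt.Println(publishedTargets(records, []string{"us-west"})) // [5.6.7.8 9.9.9.9]
}
```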

Manual Verification Instructions

Prerequisites

  • DNS domain managed by a provider (AWS Route53, GCP, Azure, or CoreDNS)
  • Two clusters with DNS operator installed
  • DNS provider credentials configured

Option 1: Verification with AWS Route53

Setup:

  1. Run make local-setup configured for 2 clusters with deploy set to true, then edit the deployments to set the group runtime argument (e.g. us-east and us-west).

  2. Create test DNSRecords:

In cluster-1

  cat <<EOF | kubectl apply -f -
  apiVersion: kuadrant.io/v1alpha1
  kind: DNSRecord
  metadata:
    name: test-record
  spec:
    rootHost: test-groups.example.com
    endpoints:
    - dnsName: test-groups.example.com
      recordType: A
      targets:
      - 1.2.3.4
  EOF

In cluster-2

  cat <<EOF | kubectl apply -f -
  apiVersion: kuadrant.io/v1alpha1
  kind: DNSRecord
  metadata:
    name: test-record
  spec:
    rootHost: test-groups.example.com
    endpoints:
    - dnsName: test-groups.example.com
      recordType: A
      targets:
      - 5.6.7.8
  EOF

  3. Set initial active groups (us-east):
    Using kuadrant-dns-cli, set the active group to us-east.

  4. Verify initial state:
    Query DNS (should return 1.2.3.4)
    dig test-groups.example.com +short

Check Route53 for published records
aws route53 list-resource-record-sets --hosted-zone-id
--query "ResourceRecordSets[?Name=='test-groups.example.com.']"

  5. Test failover to us-west:

Use kuadrant-dns-cli to update the active groups TXT record in Route53 to us-west.

Wait 15-30 seconds for reconciliation, then verify:

Query DNS (should now return 5.6.7.8)
dig test-groups.example.com +short

Verification Checklist

  • DNSRecord in active group shows Active: True condition
  • DNSRecord in inactive group shows Active: False condition
  • DNS queries return only targets from active groups and ungrouped clusters
  • Switching active groups triggers cleanup of old records from inactive groups
  • Ungrouped records are always published

Related Issues: #620

@philbrookes philbrookes force-pushed the gh-620 branch 8 times, most recently from 5f9673d to 29c24eb Compare December 11, 2025 15:55
@philbrookes philbrookes force-pushed the gh-620 branch 2 times, most recently from eaf200f to 6795ee1 Compare December 15, 2025 15:09
log

rewrite name regex kuadrant-active-groups\.(.*)k.example\.com kuadrant-active-groups-coredns.pb.hcpapps.net
forward kuadrant-active-groups-coredns.pb.hcpapps.net /etc/resolv.conf

This will need to be updated so that the custom host can be passed in at setup time; that is for another ticket: #670

@philbrookes philbrookes force-pushed the gh-620 branch 2 times, most recently from 098c135 to e9364ce Compare December 15, 2025 15:38
@Boomatang Boomatang left a comment


I have a number of questions. I know it is still a draft and subject to change. For that reason I didn't look at any test changes.

@philbrookes philbrookes force-pushed the gh-620 branch 3 times, most recently from b43bd9a to 9e54ed6 Compare December 18, 2025 11:38
@Boomatang Boomatang left a comment


I still haven't looked at the tests. I have seen a few very small things, but I have started to question the use of the group field on the remote reconciles.

I am going to start setting this up locally and play around with it.

activeGroups := r.getActiveGroups(ctx, c, dnsRecord)

// only process unpublish when there are active groups and we are reconciling a record from an active group
if len(activeGroups) == 0 || !dnsRecord.IsActive() {

I think this now can be simplified.

Suggested change
- if len(activeGroups) == 0 || !dnsRecord.IsActive() {
+ if !dnsRecord.IsActive() {

@philbrookes philbrookes force-pushed the gh-620 branch 4 times, most recently from 5caf25d to a26babc Compare December 29, 2025 12:13
Signed-off-by: Phil Brookes <pbrookes@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@philbrookes philbrookes marked this pull request as ready for review December 29, 2025 14:51
@philbrookes philbrookes changed the title [WIP] Gh 620 DNS Groups. Dec 29, 2025