[refactor] Split Services into watcher/handler under felix by sknat · Pull Request #776 · projectcalico/vpp-dataplane

sknat · 2025-09-02T15:04:25Z

This patch splits services into two components:

- a watcher that handles the informer fetching services and endpoints from the k8s API.
- a handler that takes care of programming VPP with the NAT rules, within the context of the felix server's single goroutine.

The intent is to move away from a model with multiple servers
replicating state and communicating over a pubsub. That model is
prone to race conditions and deadlocks, and it provides few
benefits, since scale and asynchronicity will not be a constraint
on nodes with a relatively small number of pods (~100), as is the
k8s default.
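As a rough illustration of the watcher/handler split described above, the sketch below uses a slice of updates in place of the k8s informer and a log of strings in place of the VPP NAT programming. All names (`ServiceEvent`, `ServiceHandler`, `watch`, `Run`) are hypothetical and are not taken from the patch.

```go
package main

import "fmt"

// ServiceEvent is a hypothetical event emitted by the watcher.
// An empty Backends slice stands for a deleted service.
type ServiceEvent struct {
	Name     string
	Backends []string
}

// watch stands in for the informer callbacks: it only forwards
// events to the queue and never touches handler state.
func watch(updates []ServiceEvent, queue chan<- ServiceEvent) {
	for _, u := range updates {
		queue <- u
	}
	close(queue)
}

// ServiceHandler owns the service state. It is only called from the
// single event-loop goroutine, so it needs no mutex.
type ServiceHandler struct {
	services map[string][]string
	Log      []string // records what would be programmed in VPP
}

func NewServiceHandler() *ServiceHandler {
	return &ServiceHandler{services: make(map[string][]string)}
}

// Handle applies one event to the state and records the NAT change.
func (h *ServiceHandler) Handle(ev ServiceEvent) {
	if len(ev.Backends) == 0 {
		if _, ok := h.services[ev.Name]; ok {
			delete(h.services, ev.Name)
			h.Log = append(h.Log, "del "+ev.Name)
		}
		return
	}
	h.services[ev.Name] = ev.Backends
	h.Log = append(h.Log, fmt.Sprintf("set %s -> %v", ev.Name, ev.Backends))
}

// Run drains the queue in the calling goroutine until it is closed.
func Run(h *ServiceHandler, queue <-chan ServiceEvent) {
	for ev := range queue {
		h.Handle(ev)
	}
}

func main() {
	queue := make(chan ServiceEvent, 8)
	go watch([]ServiceEvent{
		{Name: "default/web", Backends: []string{"10.0.0.1:80", "10.0.0.2:80"}},
		{Name: "default/web"}, // deletion
	}, queue)

	h := NewServiceHandler()
	Run(h, queue)
	for _, line := range h.Log {
		fmt.Println(line)
	}
}
```

Because only `Run` reads or writes `h.services`, the watcher goroutine can keep producing events without any locking on the handler side.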

This patch changes the way we persist data on disk when running Calico/VPP. Instead of using struc and a binary format, we transition to JSON files. Size should not be an issue, as the number of pods per node is typically low (~100). This will make troubleshooting easier and errors clearer when parsing fails. We thus remove the /bin/debug troubleshooting utility, which was only needed because the previous data format was not human readable.

Doing this, we address an issue where PBL indexes were reused upon dataplane restart because they were stored in a list. We now use a map to retain the containerIP mapping. We also split the configuration from the runtime spec in LocalPodSpec and add a step to clear it when the corresponding VRFs are not found in VPP. Finally, we address an issue where uRPF was not properly set up for IPv6.

Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
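A minimal sketch of what JSON persistence with a map keyed by container IP could look like. The struct layout, field names, and the `Encode`, `Decode`, and `ClearStale` helpers are assumptions made for illustration; the actual `LocalPodSpec` in the patch may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// PodConfig and PodStatus are hypothetical types that only illustrate
// separating configuration from runtime state.
type PodConfig struct {
	InterfaceName string   `json:"interfaceName"`
	ContainerIPs  []string `json:"containerIps"`
}

type PodStatus struct {
	VrfID uint32 `json:"vrfId"`
	// Keyed by container IP, so an index stays attached to its
	// address instead of depending on a position in a list.
	PblIndexes map[string]uint32 `json:"pblIndexes"`
}

type PodSpec struct {
	Config PodConfig `json:"config"`
	Status PodStatus `json:"status"`
}

// Encode renders the pod state as indented, human-readable JSON.
func Encode(pods map[string]PodSpec) ([]byte, error) {
	return json.MarshalIndent(pods, "", "  ")
}

// Decode parses persisted state and wraps parse errors with context.
func Decode(data []byte) (map[string]PodSpec, error) {
	pods := make(map[string]PodSpec)
	if err := json.Unmarshal(data, &pods); err != nil {
		return nil, fmt.Errorf("parsing pod state: %w", err)
	}
	return pods, nil
}

// ClearStale resets the runtime status of pods whose VRF is gone,
// leaving their configuration untouched.
func ClearStale(pods map[string]PodSpec, vrfExists func(uint32) bool) {
	for key, pod := range pods {
		if !vrfExists(pod.Status.VrfID) {
			pod.Status = PodStatus{}
			pods[key] = pod
		}
	}
}

func main() {
	dir, err := os.MkdirTemp("", "podstate")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "pods.json")

	pods := map[string]PodSpec{
		"default/web-0": {
			Config: PodConfig{InterfaceName: "eth0", ContainerIPs: []string{"10.0.0.5"}},
			Status: PodStatus{VrfID: 7, PblIndexes: map[string]uint32{"10.0.0.5": 3}},
		},
	}
	data, err := Encode(pods)
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile(path, data, 0o600); err != nil {
		panic(err)
	}

	raw, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	loaded, err := Decode(raw)
	if err != nil {
		panic(err)
	}
	// Pretend VPP restarted and no VRF survived.
	ClearStale(loaded, func(uint32) bool { return false })
	fmt.Printf("%+v\n", loaded["default/web-0"])
}
```

The file written here can be inspected with any text editor or `jq`, which is the troubleshooting benefit the commit message refers to.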

This patch splits the felix server into two pieces:

- a felix watcher placed under `agent/watchers/felix`
- a felix server placed under `agent/felix`

The former is only responsible for watching and submitting events into a single event queue. The latter receives the events in a single goroutine and programs VPP from that single thread.

The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. That model is prone to race conditions and deadlocks, and it provides few benefits, since scale and asynchronicity will not be a constraint on nodes with a relatively small number of pods (~100), as is the k8s default.

Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
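The single event queue could be sketched as several watchers fanning in to one channel, with the server dispatching on the event type. The event types, `Server`, and `StartWatchers` below are hypothetical names chosen for the example, not the ones used under `agent/felix`.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical event types submitted by watchers.
type PolicyUpdate struct{ Name string }
type PodAdded struct{ Name string }

// Server owns all state. Only Serve reads or writes these maps.
type Server struct {
	policies map[string]bool
	pods     map[string]bool
}

func NewServer() *Server {
	return &Server{policies: map[string]bool{}, pods: map[string]bool{}}
}

// Serve processes events one at a time until the queue is closed.
func (s *Server) Serve(queue <-chan any) {
	for ev := range queue {
		switch e := ev.(type) {
		case PolicyUpdate:
			s.policies[e.Name] = true
		case PodAdded:
			s.pods[e.Name] = true
		default:
			fmt.Printf("ignoring unknown event %T\n", e)
		}
	}
}

// StartWatchers runs each watcher in its own goroutine and closes the
// queue once all of them have returned.
func StartWatchers(queue chan<- any, watchers ...func(chan<- any)) {
	var wg sync.WaitGroup
	for _, w := range watchers {
		wg.Add(1)
		go func(w func(chan<- any)) {
			defer wg.Done()
			w(queue)
		}(w)
	}
	go func() {
		wg.Wait()
		close(queue)
	}()
}

func main() {
	queue := make(chan any, 16)
	StartWatchers(queue,
		func(q chan<- any) { q <- PolicyUpdate{Name: "allow-dns"} },
		func(q chan<- any) { q <- PodAdded{Name: "default/web-0"} },
	)
	s := NewServer()
	s.Serve(queue) // the only goroutine touching s
	fmt.Println(len(s.policies), len(s.pods))
}
```

Events from different watchers may interleave in any order, but each one is applied atomically with respect to the others because `Serve` handles them sequentially.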

This patch splits the CNI watcher and handlers into two pieces. The handling will be done in the main 'felix' goroutine, while the watching / gRPC server will live under watchers/ and will not store or access agent state.

The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. That model is prone to race conditions and deadlocks, and it provides few benefits, since scale and asynchronicity will not be a constraint on nodes with a relatively small number of pods (~100), as is the k8s default.

Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
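One way a stateless front end can still return a result to its caller is to attach a reply channel to each queued request. The sketch below assumes that approach; the patch may do this differently, and `CNIFrontend`, `AddPodRequest`, and `HandleLoop` are hypothetical names. A plain method call stands in for the gRPC endpoint.

```go
package main

import (
	"errors"
	"fmt"
)

// AddPodReply carries the handler's answer back to the front end.
type AddPodReply struct {
	InterfaceName string
	Err           error
}

// AddPodRequest is queued by the front end; Reply receives the result.
type AddPodRequest struct {
	PodName string
	Reply   chan<- AddPodReply
}

// CNIFrontend stands in for the gRPC server. It holds only the queue
// and never stores or reads pod state.
type CNIFrontend struct {
	queue chan<- AddPodRequest
}

// Add submits a request and blocks until the handler answers.
func (f *CNIFrontend) Add(podName string) (string, error) {
	reply := make(chan AddPodReply, 1)
	f.queue <- AddPodRequest{PodName: podName, Reply: reply}
	r := <-reply
	return r.InterfaceName, r.Err
}

// HandleLoop owns the pod map and serves requests sequentially.
// It returns the final state once the queue is closed.
func HandleLoop(queue <-chan AddPodRequest) map[string]string {
	pods := make(map[string]string)
	for req := range queue {
		if _, dup := pods[req.PodName]; dup {
			req.Reply <- AddPodReply{Err: errors.New("pod already exists")}
			continue
		}
		ifName := fmt.Sprintf("tun%d", len(pods))
		pods[req.PodName] = ifName
		req.Reply <- AddPodReply{InterfaceName: ifName}
	}
	return pods
}

func main() {
	queue := make(chan AddPodRequest)
	done := make(chan map[string]string)
	// For the demo the loop runs in its own goroutine; it is still
	// the only goroutine that touches the pod map.
	go func() { done <- HandleLoop(queue) }()

	f := &CNIFrontend{queue: queue}
	name, err := f.Add("default/web-0")
	fmt.Println(name, err)
	_, err = f.Add("default/web-0")
	fmt.Println(err)

	close(queue)
	fmt.Println(len(<-done))
}
```

The reply channel is buffered with capacity one so the handler never blocks on a caller that has gone away.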

This patch moves the Connectivity handlers into the main felix loop to allow lockless access to the cache.

The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. That model is prone to race conditions and deadlocks, and it provides few benefits, since scale and asynchronicity will not be a constraint on nodes with a relatively small number of pods (~100), as is the k8s default.

Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>

This patch splits services into two components:

- a watcher that handles the informer fetching services and endpoints from the k8s API.
- a handler that takes care of programming VPP with the NAT rules, within the context of the felix server's single goroutine.

The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. That model is prone to race conditions and deadlocks, and it provides few benefits, since scale and asynchronicity will not be a constraint on nodes with a relatively small number of pods (~100), as is the k8s default.

Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>

sknat added 5 commits August 26, 2025 15:16

sknat requested review from aritrbas, hedibouattour and onong September 2, 2025 15:04

sknat self-assigned this Sep 2, 2025

sknat added this to the agent refactoring single thread milestone Nov 17, 2025

sknat changed the title ~~Split Services into watcher/handler under felix~~ [refactor] Split Services into watcher/handler under felix Jan 7, 2026

sknat wants to merge 5 commits into master from nsk-split-svc
