[refactor] Split Services into watcher/handler under felix#776
Open
[refactor] Split Services into watcher/handler under felix#776
Conversation
This patch changes the way we persist the data on disk when running Calico/VPP. Instead of using struc and binary format we transition to json files. Size should not be an issue as number of pods per node are typically low (~100). This will make troubleshooting easier and errors clearer when parsing fails. We thus remove the /bin/debug troubleshooting utility as the data format is not human readable. Doing this, we address an issue where PBL indexes were reused upon dataplane restart, as they were stored in a list. We now will use a map to retain the containerIP mapping. We also split the configuration from runtime spec in LocalPodSpec and add a step to clear it when corresponding VRFs are not found in VPP. Finally we address an issue where uRPF was not properly set up for ipv6. Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
This patch splits the felix server in two pieces: - a felix watcher placed under `agent/watchers/felix` - a felix server placed under `agent/felix` The former will have only the responsibility of watching and submitting events into a single event queue. The latter will receive the event in a single goroutine and proceed to program VPP as a single thred. The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. This being prone to race conditions, deadlocks, and not providing many benefits as scale & asynchronicity will not be a constraint on nodes with relatively small number of pods (~100) as is k8s default. Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
This patch splits the CNI watcher and handlers in two pieces. The handling will be done in the main 'felix' goroutine, while the watching / grpc server will live under watchers/ and not store or access agent state. The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. This being prone to race conditions, deadlocks, and not providing many benefits as scale & asynchronicity will not be a constraint on nodes with relatively small number of pods (~100) as is k8s default. Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
This patch moves the Connectivity handlers in the main felix loop to allow lockless access to the cache. The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. This being prone to race conditions, deadlocks, and not providing many benefits as scale & asynchronicity will not be a constraint on nodes with relatively small number of pods (~100) as is k8s default. Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
This patch splits services in two components, - a watcher that handles the informer fetching services and endpoints from the k8s API. - a handler that takes care of programming VPP with the NAT rules, within the context of the felix server's single goroutine. The intent is to move away from a model with multiple servers replicating state and communicating over a pubsub. This being prone to race conditions, deadlocks, and not providing many benefits as scale & asynchronicity will not be a constraint on nodes with relatively small number of pods (~100) as is k8s default. Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This patch splits services in two components,
endpoints from the k8s API.
rules, within the context of the felix server's single goroutine.
The intent is to move away from a model with multiple servers
replicating state and communicating over a pubsub. This being
prone to race conditions, deadlocks, and not providing many
benefits as scale & asynchronicity will not be a constraint
on nodes with relatively small number of pods (~100) as is k8s
default.