Error if a Site is created but old configuration is still present #2324
@fgiorgetti I am happy that you were able to reproduce this issue. Unfortunately, I was unable to reproduce it in our lower environments; it is happening only in our production environment.
@vsomwanshi I was able to reproduce it when I quickly delete and recreate a site, for example in an automated way through a script. Once a site is deleted and another site is created, the new site is processed before the old resources, owned by the deleted site, have been removed. That causes the error, which, as you pointed out in the issue, can be recovered from by restarting the skupper-controller pod. Can you share some details on the procedure you guys are following in production to reproduce it? Is it possible that you guys have 2 sites created in the same namespace at the time you're deleting it? This could potentially be a similar trigger. Or, once you remove a site, is there any gitops operator applying a new site definition?
@fgiorgetti Please find my comments inline.

> Can you share some details on the procedure you guys are following in production to reproduce it? Is it possible that you guys have 2 sites created on the same namespace at the time you're deleting it? This could potentially be a similar trigger to that. Or eventually once you remove a site, is there any gitops operator applying a new site definition?

I am not sure, but somehow I am unable to reproduce this issue in our lower environments. Could it be happening in production because, as mentioned in the comment above, we have 55 skupper sites created in one OpenShift cluster, and each site has 14 listeners and 5 connectors? Is it creating more events and due to which
@fgiorgetti or anyone else who can answer this: the fix you are applying would be part of the latest release, right? may be skupper

If I need to go with this release in our environments in the future:

[1] During upgrade phase from

[2] I believe no downtime is required for this upgrade process, but I am asking for confirmation so I can take it to management accordingly.

[3] There is no need to touch the sites, and recreating the skupper links is also not required.

Thank you.
Yes, the idea is that this fix will be included in 2.1.3. But I am still waiting on more feedback from reviewers.
Correct.
There is downtime: once the controller is updated, it will also update all your sites, so the skupper-router deployment in each namespace will be updated as well, causing a restart.
Exactly. All existing sites and configuration are preserved.
In case a Site is deleted and recreated quickly (automated), the skupper-router ConfigMap owned by the previous site may still be present (owned resource not yet deleted). The controller now fails if it finds a router configuration that is not owned by the currently active site. Fixes skupperproject#2323.
* Updated versions
* upgrade go version to 1.24.9
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Fix rendering of system site with LinkAccess specified
* Change kube-adaptor Leader Election Loss Error Handling (#2296)
  * Change kube-adaptor Leader Election Loss Error Handling
    Updates the kube-adaptor so that when the skupper-site-leader Lease is lost the kube-adaptor will retry instead of exiting.
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  * remove harmful lease owner assignment
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  * fix kube flow controller go routine leak
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  ---------
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Prevents constant changes to HA connector (#2297)
  * Prevents constant changes to HA connector
  * Not mapped Ordinal causing indefinite sslProfile updates
  * Ensure HA connector is not considered different by setting cost to 1
  * Mapping oldestValidOrdinal as well as it is needed when HA is enabled and disabled multiple times
  * Updated routeraccess test to match expected connector
* Refactor internal/kube/watchers implementation (#2304)
  * Refactor internal/kube/watchers implementation
    Replaces the ~20 duplicated watchers.Watcher implementations with a single generic implementation. Also removes the unused watchers.Callback structures.
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  * spell
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  ---------
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Adds nginx.command template value to network observer chart (#2308)
  Adds nginx.command template value to the network observer chart for the network-observer deployment's nginx proxy container. Allows for alternate images and configurations to be used where the command needs to be specified.
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Fixes network observer traffic metrics bug (#2310)
  Previously traffic metrics (skupper_sent_bytes_total) was only updated when skupper_received_bytes_total was also incremented. This is especially visible in asymmetric applications when data flows in one direction like iperf.
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Fix CI Image Build docker dependency (#2318)
  Removes superfluous docker install step from build-oci-images CI Job. Installed version was conflicting with containerd running in CircleCI VM image.
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Use a valid context for sending first flush (#2330)
  In case the wait for initial message times out, the initial flush message was sent using an expired context.
* System controller properly accepts multicast message (#2332)
  Fixes #2331
* remove unhelpful logs (#2329)
* Error if a Site is created but old configuration is still present (#2324)
  In case a Site is deleted and recreated quickly (automated), the skupper-router ConfigMap owned by the previous site may still be present (owned resource not yet deleted). The controller now fails if it finds a router configuration that is not owned by the currently active site.
  Fixes #2323.
* Fixes the lifecycle of the PodWatcher used by AttachedConnectors (#2321)
  * Fixes the lifecycle of the PodWatcher used by AttachedConnectors
    Fixes #2320.
  * Add unit tests
  * Stopping watcher when binding is deleted
  * Unit tests to validate podwatcher stopped when attached connector or binding deleted
  ---------
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
  Co-authored-by: Christian Kruse <christian@c-kruse.com>
  Co-authored-by: ajssmith <ansmith@redhat.com>
If a Site is deleted and recreated quickly (e.g., through automation), the skupper-router ConfigMap owned by the previous site may still be present.
The controller now fails if it finds a router configuration that is not owned by the currently active site.
Fixes #2323.