Error if a Site is created but old configuration is still present #2324

fgiorgetti · 2025-11-24T16:20:10Z

If a Site is deleted and recreated quickly (e.g., through automation), the skupper-router ConfigMap owned by the previous site may still be present.

The controller now fails if it finds a router configuration that is not owned by the currently active site.

Fixes #2323.

vsomwanshi · 2025-11-24T23:15:54Z

@fgiorgetti
I don't know exactly what this issue is. However, as the output below shows, after I ran the skupper site delete command I checked all the objects and everything was completely clean.

$ skupper site delete --all -n test-site-new
Waiting for deletion to complete...
Site "test-site-new" is deleted

$ oc get site
No resources found in test-site-new namespace.

$ oc get pods 
No resources found in test-site-new namespace.

$ oc get secret
NAME                             TYPE                             DATA   AGE
all-icr-io                       kubernetes.io/dockerconfigjson   1      36m
builder-dockercfg-8b9t7          kubernetes.io/dockercfg          1      36m
default-dockercfg-6s5xw          kubernetes.io/dockercfg          1      36m
deployer-dockercfg-57xwx         kubernetes.io/dockercfg          1      36m
pipeline-dockercfg-8mbn6         kubernetes.io/dockercfg          1      36m
skupper-router-dockercfg-ctjvr   kubernetes.io/dockercfg          1      35m

$ oc get cm 
NAME                       DATA   AGE
config-service-cabundle    1      36m
config-trusted-cabundle    1      36m
kube-root-ca.crt           1      36m
openshift-service-ca.crt   1      36m

I am happy that you were able to reproduce this issue. Unfortunately, I was unable to reproduce it in our lower environments; it happens only in our production environment.

fgiorgetti · 2025-11-25T15:11:22Z

> I am happy that you were able to reproduce this issue. Unfortunately, I was unable to reproduce it in our lower environments; it happens only in our production environment.

@vsomwanshi I was able to reproduce it by deleting and recreating a site in quick succession, for example from an automated script.

What happens is that when a site is deleted and another site is created right away, the new site is processed before the old resources owned by the deleted site have been removed. That causes the error which, as you pointed out in the issue, can be recovered from by restarting the skupper-controller pod.
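
For scripts that delete and recreate a site back to back, one possible client-side guard is to poll until the old resource is gone before creating the new Site. The sketch below is only an illustration under that assumption: `wait_until_gone` and its parameters are hypothetical, and `resource_exists` stands for whatever lookup the caller provides.

```python
import time
from typing import Callable


def wait_until_gone(
    resource_exists: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until resource_exists() returns False or the timeout expires.

    Returns True if the resource disappeared in time, False otherwise.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while resource_exists():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True


# Simulated lookup: the old resource is still visible for two polls.
answers = iter([True, True, False])
gone = wait_until_gone(lambda: next(answers), timeout=5, interval=0)
```

In a real script, `resource_exists` could wrap a lookup of the skupper-router ConfigMap in the site's namespace; that wrapper is omitted here because it depends on the client in use.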

Can you share some details on the procedure you guys are following in production to reproduce it? Is it possible that you guys have 2 sites in the same namespace at the time you're deleting one? That could be a similar trigger. Or, once you remove a site, is there any GitOps operator applying a new site definition?

vsomwanshi · 2025-11-25T16:53:06Z

@fgiorgetti Please find my comments inline:

> Can you share some details on the procedure you guys are following in production to reproduce it?

We follow the steps below to reproduce the issue:

## Method 1:

- Delete the skupper site from the CLI using the command: skupper site delete --all -n <namespace>
- Wait some time for the Site object and the other related components to be deleted.
- Sync the Site YAML configuration (Site object) from GitOps, which will eventually create all the related objects as well.

## Method 2:

- Delete the skupper Site object from GitOps.
- Wait some time for the Site object and the other related components to be deleted.
- Sync the Site YAML configuration (Site object) from GitOps, which will eventually create all the related objects as well.

> Is it possible that you guys have 2 sites created on the same namespace at the time you're deleting it? This could potentially be a similar trigger to that.

We have a 1:1 mapping: we create only one skupper site per namespace. Besides, the skupper controller will not allow you to create another site in a namespace where a Site object is already present.

> Or eventually once you remove a site, is there any gitops operator applying a new site definition?

Yes, our entire skupper setup is managed through GitOps. The skupper controller, CRDs and sizing profile ConfigMaps are deployed in one dedicated namespace. Sites are deployed in separate namespaces. In our production environment we have 55 skupper sites in one OpenShift cluster. Each site has 14 listeners and 5 connectors.

I am not sure why, but I am unable to reproduce this issue in our lower environments. Could it be happening in production because, as mentioned above, we have 55 skupper sites in one OpenShift cluster, each with 14 listeners and 5 connectors? Could that be generating more events, making the skupper-controller unstable or unable to keep track of the site cleanup operations?

vsomwanshi · 2025-11-26T01:09:03Z

@fgiorgetti, or anyone else who can answer this: the fix you are applying will be part of the latest release, right? Maybe skupper 2.1.3? I can see that your team has fixed a lot of issues, and I will need to roll them out in our environments in the near future.

If I go with this release in our environments, I would like to confirm the following:

[1] During the upgrade from 2.1.0 to 2.1.3, I simply need to upgrade the skupper controller to 2.1.3, and the rest will be taken care of by the controller itself (e.g. upgrading skupper-router, kube-adaptor, etc.)?

[2] I believe no downtime is required for this upgrade, but I am asking for confirmation so that I can take it to management accordingly.

[3] There is no need to touch the sites, and recreating the skupper links is not required either.

Thank you.

fgiorgetti · 2025-12-03T22:25:39Z

> @fgiorgetti, or anyone else who can answer this: the fix you are applying will be part of the latest release, right? Maybe skupper 2.1.3? I can see that your team has fixed a lot of issues, and I will need to roll them out in our environments in the near future.

Yes, the idea is that this fix will be included in 2.1.3, but I am still waiting on more feedback from reviewers.

> [1] During the upgrade from 2.1.0 to 2.1.3, I simply need to upgrade the skupper controller to 2.1.3, and the rest will be taken care of by the controller itself (e.g. upgrading skupper-router, kube-adaptor, etc.)?

Correct.

> [2] I believe no downtime is required for this upgrade, but I am asking for confirmation so that I can take it to management accordingly.

There is downtime: once the controller is updated, it will also update all your sites, so the skupper-router deployment in each namespace will be updated as well, causing a restart.

> [3] There is no need to touch the sites, and recreating the skupper links is not required either.

Exactly. All existing sites and configuration are preserved.

In case a Site is deleted and recreated quickly (automated), the skupper-router ConfigMap owned by the previous site may still be present (owned resource not yet deleted). The controller now fails if it finds a router configuration that is not owned by the currently active site. Fixes skupperproject#2323.


* Updated versions
* upgrade go version to 1.24.9
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Fix rendering of system site with LinkAccess specified
* Change kube-adaptor Leader Election Loss Error Handling (#2296)
  * Change kube-adaptor Leader Election Loss Error Handling
    Updates the kube-adaptor so that when the skupper-site-leader Lease is lost the kube-adaptor will retry instead of exiting.
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  * remove harmful lease owner assignment
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  * fix kube flow controller go routine leak
    Signed-off-by: Christian Kruse <christian@c-kruse.com>

  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Prevents constant changes to HA connector (#2297)
  * Prevents constant changes to HA connector
  * Not mapped Ordinal causing indefinite sslProfile updates
  * Ensure HA connector is not considered different by setting cost to 1
  * Mapping oldestValidOrdinal as well as it is needed when HA is enabled and disabled multiple times
  * Updated routeraccess test to match expected connector
* Refactor internal/kube/watchers implementation (#2304)
  * Refactor internal/kube/watchers implementation
    Replaces the ~20 duplicated watchers.Watcher implementations with a single generic implementation. Also removes the unused watchers.Callback structures.
    Signed-off-by: Christian Kruse <christian@c-kruse.com>
  * spell
    Signed-off-by: Christian Kruse <christian@c-kruse.com>

  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Adds nginx.command template value to network observer chart (#2308)
  Adds nginx.command template value to the network observer chart for the network-observer deployment's nginx proxy container. Allows for alternate images and configurations to be used where the command needs to be specified.
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Fixes network observer traffic metrics bug (#2310)
  Previously traffic metrics (skupper_sent_bytes_total) was only updated when skupper_received_bytes_total was also incremented. This is especially visible in asymmetric applications when data flows in one direction like iperf.
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Fix CI Image Build docker dependency (#2318)
  Removes superfluous docker install step from build-oci-images CI Job. Installed version was conflicting with containerd running in CircleCI VM image.
  Signed-off-by: Christian Kruse <christian@c-kruse.com>
* Use a valid context for sending first flush (#2330)
  In case the wait for initial message times out, the initial flush message was sent using an expired context.
* System controller properly accepts multicast message (#2332)
  Fixes #2331
* remove unhelpful logs (#2329)
* Error if a Site is created but old configuration is still present (#2324)
  In case a Site is deleted and recreated quickly (automated), the skupper-router ConfigMap owned by the previous site may still be present (owned resource not yet deleted). The controller now fails if it finds a router configuration that is not owned by the currently active site. Fixes #2323.
* Fixes the lifecycle of the PodWatcher used by AttachedConnectors (#2321)
  * Fixes the lifecycle of the PodWatcher used by AttachedConnectors
    Fixes #2320.
  * Add unit tests
  * Stopping watcher when binding is deleted
  * Unit tests to validate podwatcher stopped when attached connector or binding deleted

Signed-off-by: Christian Kruse <christian@c-kruse.com>
Co-authored-by: Christian Kruse <christian@c-kruse.com>
Co-authored-by: ajssmith <ansmith@redhat.com>

fgiorgetti requested review from c-kruse and nluaces as code owners November 24, 2025 16:20

nluaces approved these changes Dec 2, 2025


c-kruse approved these changes Dec 5, 2025


fgiorgetti force-pushed the fix-2323 branch from 6e97c67 to 8776fa4 on December 8, 2025 19:48

fgiorgetti merged commit 97ad5de into skupperproject:main Dec 9, 2025
2 checks passed
