@cvaroqui
No description provided.

/var/lib/opensvc/cache/addrinfo/<name> files contain the
resolved ipaddr. When name resolution does not work, the
cache file content is trusted for 16m. After that duration,
the ip resource status is considered undef and we no longer
allow stopping or starting the resource, as we don't know the addr.
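
The trust rule described above can be sketched as follows. This is a minimal illustration, not the om3 code: the `Status` type, the `ipStatus` function, its parameters and the warning strings are all hypothetical names, and treating an age exactly equal to the limit as still valid is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Status is an illustrative type, not the om3 status type.
type Status string

const (
	Up    Status = "up"
	Down  Status = "down"
	Undef Status = "undef"
)

// ipStatus returns the status to report and an optional warning.
//   - resolved: whether name resolution currently works
//   - cacheAge: age of the cached address
//   - maxAge:   how long a cached address is trusted
//   - probed:   the up/down status observed for the known address
func ipStatus(resolved bool, cacheAge, maxAge time.Duration, probed Status) (Status, string) {
	switch {
	case resolved:
		// Fresh resolution: nothing to warn about.
		return probed, ""
	case cacheAge <= maxAge:
		// Resolution failed but the cache is still trusted.
		return probed, fmt.Sprintf("name resolution failed, using address cached %s ago", cacheAge)
	default:
		// The cache expired: the address is unknown, so is the status.
		return Undef, fmt.Sprintf("name resolution failed and cached address expired (%s old)", cacheAge)
	}
}

func main() {
	const maxAge = 16 * time.Minute
	for _, age := range []time.Duration{5 * time.Minute, 20 * time.Minute} {
		st, warn := ipStatus(false, age, maxAge, Up)
		fmt.Println(st, warn)
	}
}
```

Passing the trust window as a parameter keeps the decision independent of the chosen duration.
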
With micro-containers we used "nsenter -e" but this can hang
with certain /proc/<pid>/environ contents.

Parse the container process environ ourselves and pass the env
to the exec.Command. Don't use -e anymore.
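
A rough sketch of that approach, assuming the usual NUL-separated layout of /proc/<pid>/environ. The helper names `parseEnviron` and `commandWithProcEnv` are hypothetical, and skipping entries that lack a `KEY=` prefix is a choice made for this example, not necessarily what the patch does.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// parseEnviron splits the raw content of a /proc/<pid>/environ file into
// "KEY=VALUE" entries. Entries are NUL-separated. Empty entries and entries
// without a non-empty key before '=' are skipped.
func parseEnviron(raw []byte) []string {
	// Non-nil on purpose: a nil exec.Cmd.Env means "inherit our own env".
	env := make([]string, 0)
	for _, entry := range bytes.Split(raw, []byte{0}) {
		if i := bytes.IndexByte(entry, '='); i > 0 {
			env = append(env, string(entry))
		}
	}
	return env
}

// commandWithProcEnv (hypothetical helper) builds a command whose
// environment is copied from the process identified by pid.
func commandWithProcEnv(pid int, name string, args ...string) (*exec.Cmd, error) {
	raw, err := os.ReadFile(fmt.Sprintf("/proc/%d/environ", pid))
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(name, args...)
	cmd.Env = parseEnviron(raw)
	return cmd, nil
}

func main() {
	raw := []byte("PATH=/usr/bin\x00EMPTY=\x00garbage\x00\x00A=b=c\x00")
	for _, kv := range parseEnviron(raw) {
		fmt.Printf("%q\n", kv)
	}
}
```

Parsing in-process lets the caller decide how to treat malformed entries.
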
So the secondary instance is ready to refetch when the up
instance status is next refreshed.

This patch fixes a case where the file was never refetched
when the admin removed the file from a secondary node, hoping
to demonstrate how opensvc handles resync automatically.
e.g.:

	# om test/svc/hdoc enter --rid container#hdoc
	panic: runtime error: invalid memory address or nil pointer dereference
	[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x1b01d04]

	goroutine 15 [running]:
	github.com/opensvc/om3/v3/drivers/rescontainerocibase.(*Executor).Enter(0xc0002caaa0, {0x37e7cb8?, 0xc0004eeff0?})
		/root/dev/om3/drivers/rescontainerocibase/executor.go:73 +0xc4
	github.com/opensvc/om3/v3/drivers/rescontainerocibase.(*BT).Enter(0x446df20?, {0x37e7cb8?, 0xc0004eeff0?})
		/root/dev/om3/drivers/rescontainerocibase/main.go:318 +0x32
	github.com/opensvc/om3/v3/core/object.(*actor).Enter(0xc0004f0fc0, {0x37e7cb8, 0xc0004eeff0}, {0x7fffe2325645, 0xe})
		/root/dev/om3/core/object/actor_enter.go:31 +0x214
	github.com/opensvc/om3/v3/core/omcmd.(*CmdObjectEnter).Run.func1({0x37e7cb8, 0xc0004eeff0}, {{0x7fffe2325634, 0x4}, {0x7fffe232562b, 0x4}, {0x7fffe2325630, 0x3}})
		/root/dev/om3/core/omcmd/object_enter.go:33 +0xa2
	github.com/opensvc/om3/v3/core/objectaction.T.DoLocal.func1({0x37e7cb8?, 0xc0004eeff0?}, {0x0?, 0x0?}, {{0x7fffe2325634, 0x4}, {0x7fffe232562b, 0x4}, {0x7fffe2325630, 0x3}})
		/root/dev/om3/core/objectaction/object.go:383 +0x68
	github.com/opensvc/om3/v3/core/objectaction.T.instanceDo.func1({0xc00022441a?, 0x0?}, {{0x7fffe2325634, 0x4}, {0x7fffe232562b, 0x4}, {0x7fffe2325630, 0x3}})
		/root/dev/om3/core/objectaction/object.go:986 +0x105
	created by github.com/opensvc/om3/v3/core/objectaction.T.instanceDo in goroutine 1
		/root/dev/om3/core/objectaction/object.go:972 +0x196

Test if inspect data is nil before use.
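
The guard can be pictured like this. It is only a sketch: `inspectData`, its `PID` field and `enterPID` are hypothetical stand-ins, not the types used in rescontainerocibase.

```go
package main

import (
	"errors"
	"fmt"
)

// inspectData is a hypothetical stand-in for the container inspect result.
type inspectData struct {
	PID int
}

var errNoInspectData = errors.New("container inspect data is not available")

// enterPID returns the pid to enter. It returns an error instead of
// dereferencing a nil pointer when no inspect data is available.
func enterPID(data *inspectData) (int, error) {
	if data == nil {
		return 0, errNoInspectData
	}
	if data.PID <= 0 {
		return 0, fmt.Errorf("container is not running (pid=%d)", data.PID)
	}
	return data.PID, nil
}

func main() {
	if _, err := enterPID(nil); err != nil {
		fmt.Println("enter refused:", err)
	}
	if pid, err := enterPID(&inspectData{PID: 4242}); err == nil {
		fmt.Println("would enter pid", pid)
	}
}
```
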
* Return the real up/down status when the name ip cache is valid.
  Just add a resource status warn log.

* Return undef status when the name ip cache is expired.
  At this point, stop and start instance actions won't work.

* Bump the expiration time from 16m to 19m so the instance apps
  have 9-19m to stop instead of 6-16m. The 6m worst case
  has been seen in real life.
* Add an imon resource files fetched cache init
* Add detection of resource file changes and deletions on secondary
  instances, and trigger an immediate fetch when that happens.

Example:

	Jan 14 12:06:14 dev2n2 om[126251]: daemon: imon: kvm/svc/vm2: container#1: file /etc/libvirt/qemu/vm2.xml discovered (csum=c8013f2da973061d8d86e3b60d35b753, mtime=2026-01-10 22:13:50.170941242 +0100 CET)
	Jan 14 12:06:30 dev2n2 om[126251]: daemon: imon: kvm/svc/vm2: container#1: file /etc/libvirt/qemu/vm2.xml disappeared
	Jan 14 12:06:30 dev2n2 om[126251]: daemon: imon: kvm/svc/vm2: container#1: file /etc/libvirt/qemu/vm2.xml fetch from dev2n1
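
The detection shown in the log above could work roughly as below: compare a cache of previously seen files against what is currently on disk, then fetch when something changed or disappeared. This is a sketch under assumptions, not the imon implementation. `fileState`, `event`, `diffFiles` and `needsFetch` are hypothetical names, and not fetching on a mere discovery is inferred from the log sequence only.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// fileState is what the cache remembers about a resource file.
type fileState struct {
	Csum  string
	Mtime time.Time
}

// event describes one difference between the cache and the current state.
type event struct {
	Path string
	Kind string // "discovered", "changed" or "disappeared"
}

// diffFiles compares the cached view with the current view and returns
// the differences sorted by path, so the output is deterministic.
func diffFiles(cached, current map[string]fileState) []event {
	var events []event
	for path, cur := range current {
		old, ok := cached[path]
		switch {
		case !ok:
			events = append(events, event{path, "discovered"})
		case old.Csum != cur.Csum || !old.Mtime.Equal(cur.Mtime):
			events = append(events, event{path, "changed"})
		}
	}
	for path := range cached {
		if _, ok := current[path]; !ok {
			events = append(events, event{path, "disappeared"})
		}
	}
	sort.Slice(events, func(i, j int) bool { return events[i].Path < events[j].Path })
	return events
}

// needsFetch reports whether a secondary instance should refetch now.
func needsFetch(events []event) bool {
	for _, e := range events {
		if e.Kind == "changed" || e.Kind == "disappeared" {
			return true
		}
	}
	return false
}

func main() {
	cached := map[string]fileState{"/etc/libvirt/qemu/vm2.xml": {Csum: "c8013f2d"}}
	current := map[string]fileState{} // the admin removed the file
	events := diffFiles(cached, current)
	fmt.Println(events, needsFetch(events))
}
```
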
@cgalibern cgalibern merged commit 8296f02 into opensvc:main Jan 14, 2026