Conversation
This commit refactores the probe initialization to use a Hive Job instead of a plain lifecycle start hook. This way we can also get rid of the raw Go routine to execute the check if every probe successfully executed at least once before exposing the status. Note: the kvstore "shutdown check" is still part of its own lifecycle stop hook. Probably better to eventually move this to the kvstore module. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
|
|
||
| <-ctx.Done() | ||
| return nil | ||
| })) |
There was a problem hiding this comment.
Probes stopped prematurely on initial timeout error
High Severity
When WaitForFirstRun times out, returning an error causes the defer collector.statusCollector.Close() to execute immediately, stopping all running probes. The previous implementation kept probes running even when the initial check failed, only closing them during OnStop. Now, if any probe takes longer than StatusCollectorProbeCheckTimeout (default 5 minutes) for its first run, the entire status collector becomes non-functional because all probes are terminated.
|
|
||
| <-ctx.Done() | ||
| return nil | ||
| })) |
There was a problem hiding this comment.
Health parameter received but never used for reporting
Medium Severity
The health cell.Health parameter passed to the job.OneShot function is never used. The old code explicitly called probeCheckHealth.Degraded() on failure and probeCheckHealth.OK() on success to report probe initialization status. The new code receives the health parameter but doesn't call any health methods, so the probe check health is never reported to the hive health system. The comment "Report health whether all probes have been executed at least once" remains, but the actual health reporting was omitted.


PR_044
Note
Medium Risk
Changes the startup/shutdown and readiness/health-reporting mechanics for status probes, which could affect agent health signals or probe lifetime under cancellation/timeouts.
Overview
Status probe startup is now driven by Hive jobs instead of lifecycle hooks.
newStatusCollectorregisters ajob.OneShot("probes")that starts probes, waits (with timeout) for the first successful run, marksallProbesInitialized, and closes probes when the job exits.The previous
startStatusCollectorgoroutine-based startup (and itscell.Healthscope reporting) is removed, along with explicit probe shutdown from theOnStophook;OnStopnow only logs a KVStore-not-OK hint.Written by Cursor Bugbot for commit 829f56e. This will update automatically on new commits. Configure here.