Add node neighbor link updater controller#9
Add node neighbor link updater controller#9MitchLewis930 wants to merge 1 commit intopr_049_beforefrom
Conversation
This controller is responsible for monitoring nodes that request a neighbor link update. It provides a single place in the application where all node neighbor link are updates are performed as insertNeighbor are sprinkled in the code base. Additionally given the central location we can provide additional module health signals should neighbor link updates failed. Signed-off-by: Fernand Galiana <fernand.galiana@gmail.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| logfields.LogSubsys: "enqueue", | ||
| }).Warn("Skipping nodeNeighbor insert: No node given") | ||
| } | ||
| m.nodeNeighborQueue.push(&nodeQueueEntry{node: n, refresh: refresh}) |
There was a problem hiding this comment.
Missing return after nil check causes nil node queuing
High Severity
The Enqueue function logs "Skipping nodeNeighbor insert: No node given" when n == nil, but then continues execution and pushes the nil node to the queue anyway. A return statement is missing after the warning log, so the intended skip behavior doesn't actually happen. This causes nil nodes to accumulate in the queue and subsequently trigger errors when processed by StartNodeNeighborLinkUpdater, which breaks out of the processing loop and leaves remaining valid items unprocessed.
| if e == nil || e.node == nil { | ||
| errs = errors.Join(errs, fmt.Errorf("invalid node spec found in queue: %#v", e)) | ||
| break | ||
| } |
There was a problem hiding this comment.
Empty queue incorrectly treated as error condition
Medium Severity
In StartNodeNeighborLinkUpdater, after processing all items in the queue, pop() returns nil to indicate the queue is empty. However, the condition e == nil is grouped with e.node == nil and both trigger an error message "invalid node spec found in queue". An empty queue is a normal termination condition, not an error. This causes spurious error accumulation and incorrect health status reporting after every successful queue drain.
| hr.Degraded("Failed node neighbor link update", errs) | ||
| } else { | ||
| hr.OK("Node neighbor link update successful") | ||
| } |
There was a problem hiding this comment.
Accumulated errors incorrectly mark successful nodes as degraded
Medium Severity
In StartNodeNeighborLinkUpdater, the health status check uses accumulated errs rather than the current node's error. When processing multiple nodes, if an earlier node fails, errs retains that error. Subsequent successful nodes then get marked as "Degraded" because errs != nil is true even though their individual NodeNeighborRefresh call succeeded. The per-node health reporter (hr) receives misleading status based on unrelated prior failures.


PR_049
Note
Medium Risk
Touches core datapath node neighbor discovery/refresh flow and changes execution model (async goroutines → queued controller), which could affect neighbor programming timing and error handling in production.
Overview
Adds a new node-manager–owned queue and
StartNodeNeighborLinkUpdatercontroller that drains queued node neighbor updates on a fixed interval and reports per-node health, then wires it into agent startup.Refactors neighbor handling so datapath/node code enqueues neighbor insert/refresh requests (including ARP ping periodic refresh) instead of spawning ad-hoc goroutines; introduces
datapath.NodeNeighborEnqueuerand passes theNodeManagerinto the Linux datapath/node handler to support this.Updates the
NodeNeighborRefreshinterface to accept arefreshflag, adjusts error propagation/formatting in Linux neighbor insertion, and updates tests/fakes plus adds unit tests for the new queue.Written by Cursor Bugbot for commit 8d525fe. This will update automatically on new commits. Configure here.