⚠️ Issue: Lifestream Stagnation (Zombie Data Persistence)
The system currently assumes that as long as the PumpNode is alive, the data coming from the Jetson is fresh. However, if the jtop daemon hangs or the hardware communication bus (I2C/SMBus) stalls, jtop.ok() may return False, or the data values may simply stop updating while the loop continues to run.
🎯 Location:
robot/vtc/pump.py -> LO flow timer (Heartbeat check)
robot/vtc/pump.py -> state machine transitions
🦠 Symptoms:
- The Regulator continues to pulse the heart based on old "frozen" temperatures.
- UI vitals appear "flatlined" but at a normal value (e.g., exactly 42.0°C for 10 minutes).
- No warning is issued when the underlying hardware telemetry source is disconnected.
🩺 Diagnosis:
A Watchdog Pattern is missing. A "Lifestream Guard" is required to monitor the health of the connection to the Jetson hardware. If the source (jtop) becomes unresponsive or reports an unhealthy status, the Pump must stop pretending everything is fine and signal a system-wide warning.
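The "frozen value" symptom is worth a separate check, because a health flag alone would not catch values that stop updating while the loop keeps running. A minimal sketch of such a check follows. `StalenessDetector`, `max_age_s`, and the injectable `clock` are hypothetical names used for illustration; nothing here is taken from `robot/vtc/pump.py`.

```python
import time
from typing import Callable, Optional


class StalenessDetector:
    """Flags a stream as stale when its value has not changed for max_age_s seconds.

    Hypothetical helper for illustration; not part of the existing code.
    """

    def __init__(self, max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_age_s = max_age_s
        self._clock = clock  # injectable so the check can be tested without sleeping
        self._last_value: Optional[float] = None
        self._last_change: Optional[float] = None

    def update(self, value: Optional[float]) -> bool:
        """Record a sample; return True if the value has been frozen too long."""
        now = self._clock()
        if self._last_change is None or value != self._last_value:
            # First sample, or the value moved: restart the freshness timer.
            self._last_value = value
            self._last_change = now
            return False
        return (now - self._last_change) >= self._max_age_s
```

A genuinely stable temperature would also trip this check, so `max_age_s` would need to be generous, or the comparison could be made on a tuple of several vitals that are unlikely to all hold still at once.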
💡 Proposal:
The "Lifestream Guard" Watchdog
Implement a health check within the LO flow (Low Frequency) to validate the hardware connection and manage state transitions.
- Consecutive Failure Counter: Track how many times `jtop.ok()` returns `False`.
- Grace Period: Allow for 1 or 2 missed ticks (to account for momentary CPU spikes), but trigger an alert after N failures.
- State Demotion: If the threshold is hit, demote `self.state` from `RUNNING` to `DEGRADED`.
- Data Invalidation: When in a `DEGRADED` state, the Pump should inject `None` into the deques to ensure the Regulator and Display know the data is no longer trustworthy.
- Connect with issue "❤️ (VCS/VTC) None vs Value Consistency" #22; can be implemented together.
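The four mechanisms in this proposal (failure counter, grace period, state demotion, data invalidation) could be combined roughly as sketched below. This is an assumption-laden illustration, not the actual `PumpNode` code: `PumpState`, `LifestreamGuard`, `lo_tick`, and `max_failures` are hypothetical names, `source_ok` stands in for the result of `jtop.ok()`, and recovering to `RUNNING` on the first healthy tick is a design guess that the issue does not specify.

```python
from collections import deque
from enum import Enum
from typing import Deque, Optional


class PumpState(Enum):
    RUNNING = "RUNNING"
    DEGRADED = "DEGRADED"


class LifestreamGuard:
    """Counts consecutive unhealthy ticks and demotes the state past a threshold."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures  # N in the proposal (assumed default)
        self.failures = 0
        self.state = PumpState.RUNNING

    def check(self, source_ok: bool) -> PumpState:
        if source_ok:
            # Assumption: a single healthy tick clears the counter and restores RUNNING.
            self.failures = 0
            self.state = PumpState.RUNNING
        else:
            self.failures += 1
            # Grace period: stay RUNNING until N consecutive failures are seen.
            if self.failures >= self.max_failures:
                self.state = PumpState.DEGRADED
        return self.state


def lo_tick(guard: LifestreamGuard, source_ok: bool, reading: Optional[float],
            temps: Deque[Optional[float]]) -> None:
    """One LO-flow tick: update the guard, then append a reading or None."""
    if guard.check(source_ok) is PumpState.DEGRADED:
        temps.append(None)  # data invalidation: consumers see "no trustworthy value"
    else:
        temps.append(reading)


if __name__ == "__main__":
    guard = LifestreamGuard(max_failures=3)
    temps: Deque[Optional[float]] = deque(maxlen=5)
    for ok in (True, False, False, False):
        lo_tick(guard, ok, 42.0, temps)
    print(guard.state, list(temps))
```

With `max_failures=3`, the first two failed ticks fall inside the grace period and still append the last reading; the third demotes the state and appends `None`. Whether readings taken during the grace period should be trusted at all is an open question that ties into issue #22.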