Simpler stz manager #8
Performed full review of 46e2fc7...e9293ae
Analysis
- Despite the architectural improvements, the PR doesn't address potential race conditions in the new state-based design that could occur during concurrent session state transitions.
- The simplified boolean flag `disabledScaleToZero` may lack granularity compared to the counter approach, potentially making it difficult to handle partial or in-progress scaling operations.
- The centralized `manage()` method creates a potential single point of failure and may become a bottleneck as system complexity grows.
- The PR appears to focus on fixing symptoms rather than addressing the root cause of event deduplication failures, which could resurface in other components.
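The counter-vs-flag trade-off in the points above can be sketched minimally. This is a hypothetical illustration (the type and method names are not from the PR): a counter that is incremented per event goes negative when a disconnect is delivered twice, while a flag recomputed from observed state is idempotent under duplicate events.

```go
package main

import (
	"fmt"
	"sync"
)

// counterGate models the old approach: every connect/disconnect event
// adjusts a counter, so a duplicated disconnect drives it below zero.
type counterGate struct {
	mu      sync.Mutex
	pending int
}

func (g *counterGate) onEvent(delta int) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.pending += delta // a duplicated disconnect can push this below 0
	return g.pending
}

// edgeGate models the new approach: a single boolean recomputed from
// the observed connected count, so duplicate events are harmless.
type edgeGate struct {
	mu       sync.Mutex
	disabled bool
}

func (g *edgeGate) reconcile(connected int) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.disabled = connected > 0 // recomputed, not incremented
	return g.disabled
}

func main() {
	c := &counterGate{}
	c.onEvent(+1)
	c.onEvent(-1)
	fmt.Println(c.onEvent(-1)) // duplicated disconnect: prints -1

	e := &edgeGate{}
	e.reconcile(1)
	fmt.Println(e.reconcile(0)) // prints false
	fmt.Println(e.reconcile(0)) // duplicate reconcile: still false
}
```

The flag is coarser, as the review notes, but it cannot drift out of sync the way the counter can.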
```go
connectedSessions := 0
for _, s := range m.sessions.List() {
	if s.State().IsConnected {
		connectedSessions++
	}
}
```
is there a world where a session is disconnected (IsConnected == false) but is trying to reconnect, and by not counting it here we might scale to zero, preventing it from connecting?
Great question! There's a grace period after disconnects for reconnection: https://github.com/onkernel/neko/blob/e9293ae098832ae0c4b63d092a9bd370e3c0ab12/server/internal/session/session.go#L145-L213
Especially w.r.t. scale to zero, that delay plus this buffer seems reasonable. For new connections at our proxy layer we'd be "waking up" the VM externally regardless ^^
Pulls in kernel/neko#8 kernel/neko#9

> [!NOTE]
> Update chromium-headful Dockerfile to use neko base image 3.0.8-v1.3.0.
>
> - **Docker**:
>   - Bump base image in `images/chromium-headful/Dockerfile` from `ghcr.io/onkernel/neko/base:3.0.8-v1.1.0` to `3.0.8-v1.3.0`.
## Overview

We noticed issues where the implementation's pending count would drop below 0. The naive assumption that events would be deduped and delivered exactly once turned out to be unreliable. Instead, swap to a model where we only disable/enable scale-to-zero around edges (state transitions) rather than on every event.
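The edge-based model can be sketched minimally. This is a hypothetical illustration of the idea, not the PR's actual code: compare the previous and current connected counts and act only on the 0→nonzero and nonzero→0 transitions, so repeated or duplicated events in between are no-ops.

```go
package main

import "fmt"

// manager records which scale-to-zero actions fire as the connected
// count changes; names are illustrative.
type manager struct {
	prevConnected int
	actions       []string
}

// reconcile acts only on edges of the connected count, never on the
// individual events that produced it.
func (m *manager) reconcile(connected int) {
	switch {
	case m.prevConnected == 0 && connected > 0:
		m.actions = append(m.actions, "disable scale-to-zero")
	case m.prevConnected > 0 && connected == 0:
		m.actions = append(m.actions, "enable scale-to-zero")
	}
	m.prevConnected = connected
}

func main() {
	m := &manager{}
	// Six observations, including duplicates, yield only two actions.
	for _, n := range []int{1, 2, 2, 1, 0, 0} {
		m.reconcile(n)
	}
	fmt.Println(m.actions) // [disable scale-to-zero enable scale-to-zero]
}
```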
## Testing

Ran locally and against a few unikernels in staging. I would connect and disconnect a bunch of tabs to simulate flaky connections (this was a pretty consistent repro of the original issue).
Example logs from staging: