-
Notifications
You must be signed in to change notification settings - Fork 45
Merge develop to soon #946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: wmypku <wmypku@outlook.com>
This reverts commit d2341f5.
QoL PR: you can now generate `compile_commands.json` file with: ``` zig build -Dgenerate-commands ```
I was seeing 502 Bad Gateway in some of responses to polling requests in the web interface of a Gall agent. Caddy logs showed an error: ``` transport connection broken: malformed HTTP version "nullHTTP/1.1" ``` Notably, I was sending "null" body with 204 No Content response, which must not have a body. Changing response to "true" changed the HTTP version to "trueHTTP/1.1" After playing around in http.c I came to a conclusion that h2o is probably not waiting for the rest of the response, since it assumes that it won't have a body according to [the HTTP spec](https://datatracker.ietf.org/doc/html/rfc9110#section-15.3.5-5). So what probably happens is, Eyre sends the rest of 204 response body, while h2o already moved on. Body of the response is left in some buffer which gets concatenated with the next response, corrupting it. This PR adds another HTTP request state, `u3_rsat_sown`, which guarantees that the response was already sent. Following `%continue` and `%cancel` effects are noops. Additionaly, response body is set to 0 for all informational responses (1xx), 204, 205 (Reset Content) and 304 (Not Modified).
After #932 I decided to take a closer look on memory leak reports on exit from vere, which were previously shrugged off as false positives on objects that have static lifetimes. I found that `_lick_io_exit` forgets to free the path to IPC directory before freeing the struct itself. I also decided to explicitly annotate objects that supposedly have static lifetimes, so that LSAN printouts on exit have less noise. To review: do the objects annotated with `__lsan_ignore_object` actually have static lifetimes?
Discovered by ~sarpen-laplux, @joemfb might want to add some context here for posterity.
remove redundant words
This PR draft documents the attempt to make road promotion safer when it comes to async signal handling and `u3m_bail` calls. - [ ] `jets.c` - [ ] `nock.c` ### Problem In Vere we have nonlinear control flow in the form of exception raising with `u3m_bail` and catching it in various wrappers like `u3m_soft_run` and signal handling via `u3m_soft_top`/`u3m_signal`. So there are three ways we can "return" from a wrapped computation: 1. We can actually return from it, copying the product and some persistent state out of the child road, returning the product to the caller and integrating the produced persistent state into the state of the parent road; 2. We can jump out via `u3m_bail` call. The exception will either be turned into a noun, allowing Nock virtualization via metacircular jets, or the exception will be raised again to bubble it up to the top, eventually crashing the event transaction and printing a stacktrace. We still promote and integrate the accumulated persistent state; 3. We can jump out all the way to the top wrapper via a signal. In this case we repeatedly perform road promotion and accumulate the stack traces from all roads. Doing so, **we also promote and integrate the accumulated persistent state**. I noticed the latter when experimenting with Ford Lightning build system: hitting ^C mid build would interrupt it, and if the build was retried it would take less time than if it was done the first time without interruption. This is because some persistent caches were already accumulated and promoted to the home road on SIGINT handling. This rang an alarm bell in my head: for this to properly work we would need to make sure that all functions that modify road state (`u3h_*`, functions that modify bytecode programs) are async-signal safe, otherwise, we might be promoting state that does not uphold its invariants. After a discussion at core blitz on ~2026.1.9, @joemfb raised a concern that the issue has a broader scope, as signals are not the only piece of non-linear control flow in the codebase. Functions that modify road state must also be exception-handling safe: at each `u3m_bail` callsite the road state needs to conform to its invariants too. It is worth noting that not all invariants need to be upheld at either bail or signal raise: for HAMTs it is sufficient to be walkable and have valid values. An example of HAMT having an invalid value: if a SIGINT comes between lines 1502-1506 here: https://github.com/urbit/vere/blob/33671ea187baff627a05996b66f98dc579239fca/pkg/noun/jets.c#L1502-L1506 , the site struct will have a bytecode program pointer that does not correspond to the battery stored in `bat` field. _(This particular example does not allow to produce a bug because this fields are erased anyway when [we call a program from a senior road](https://github.com/urbit/vere/blob/33671ea187baff627a05996b66f98dc579239fca/pkg/noun/nock.c#L1722-L1723))_ ### Solution I decided to minimize the surface area of code where the async-signal safety is required by only promoting the road state if: - the wrapped function returned sucesfully, - OR if the error is deterministic With that, the only thing we are doing when async signal or nondeterministic error are raised is copying out nouns. #### Nouns are (almost) async-signal safe The only operation that is not async-signal safe is `u3i_edit`, if it is performed on a mutable noun (refcount of 1). The only place where it is used is in the Nock bytecode interpreter, and in case of any crash the product from the road stack is not being promoted anyway, so the stacktraces and error reports are safe. #### HAMTs are fine The only calls to `u3m_bail` in `hashtable.c` were in `u3_assert` (bail with c3__oops mote, which is not recoverable) and `u3h/t` macros to disassemble key-value pairs. These are always cells by construction, so no change is necessary, but I replaced them with asserting versions for documentation purposes, just to be sure. #### Nock interpreter: to verify Cursory look at `nock.c`/`jets.c` didn't raise any alarms, but I might be missing something. In `jets.c` `u3h/t` macros are used everywhere to dissassemble nouns from jet state and to disassemble cores for Nock 9 calls. The former should be infallible, and the latter could legitimately crash, so more attention is necessary there.
Forces `--no-demand`, makes the docked binary name `<ship>/.run.exe` instead of `<ship>/.run` and upload the binaries to bootstrap.urbit.org from CI on windows.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.