This will panic on bad font data, but it is good enough for measurements in HarfRust.
This change is not necessary for the performance gains in my measurements.
This causes invalid memory access on bad font data, but it is useful for measuring the HarfRust performance boost.
So this is doing two things: it skips any validation of the input data, and it subsequently does unsafe reads. I don't see that this is actually doing any access checks in the getters? I do see the benefit of doing bounds checking once over the full table graph instead of on every read; I can try to play around with what that might look like. |
Right. Here's some unhooked vibe-coded |
This is what I propose: an alternate reading path, full-on HarfBuzz style: All the simple codegen'ed tables & structs get a OpenType has the following wording:
https://learn.microsoft.com/en-us/typography/opentype/spec/otff
To implement this, I suggest that we adopt the HarfBuzz null-object model, whereby dereferencing a null offset returns a pointer to a shared singleton null object of that type. This removes codepath divergence in the callers between an offset that is null and one that points to an empty object. |
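To make the null-object idea concrete, here is a minimal sketch in safe Rust. Everything in it is hypothetical and for illustration only: the `Coverage` table layout (a big-endian `u16` count followed by that many `u16` values), the `resolve` helper, and the `NULL_BYTES` static are not read-fonts or HarfBuzz APIs. Treating an out-of-range offset the same as a null offset is also an assumption of this sketch.

```rust
/// Hypothetical table: a big-endian u16 count followed by `count` u16 values.
#[derive(Clone, Copy)]
struct Coverage<'a> {
    // Invariant: always at least MIN_SIZE bytes long.
    data: &'a [u8],
}

/// Shared null object: all zeros, long enough for the fixed-size header.
static NULL_BYTES: [u8; 2] = [0; 2];

impl<'a> Coverage<'a> {
    const MIN_SIZE: usize = 2;

    /// The singleton "empty" table that every null offset resolves to.
    fn null() -> Coverage<'static> {
        Coverage { data: &NULL_BYTES }
    }

    fn count(&self) -> u16 {
        // In bounds because of the MIN_SIZE invariant on `data`.
        u16::from_be_bytes([self.data[0], self.data[1]])
    }

    fn get(&self, i: u16) -> Option<u16> {
        if i >= self.count() {
            return None;
        }
        let pos = 2 + 2 * i as usize;
        let bytes = self.data.get(pos..pos + 2)?;
        Some(u16::from_be_bytes([bytes[0], bytes[1]]))
    }
}

/// Resolve `offset` relative to `parent`. A null (zero) offset, or one that
/// leaves too few bytes for the header, yields the shared null object, so
/// callers never branch on "missing" versus "empty".
fn resolve<'a>(parent: &'a [u8], offset: u16) -> Coverage<'a> {
    if offset == 0 {
        return Coverage::null();
    }
    match parent.get(offset as usize..) {
        Some(d) if d.len() >= Coverage::MIN_SIZE => Coverage { data: d },
        _ => Coverage::null(),
    }
}
```

With this shape, a caller can write `resolve(parent, offset).count()` unconditionally: a null offset simply behaves like a table with zero entries.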
So the problem I'm seeing with the sanitize approach is that we have a TOCTOU problem unless you retain a sanitized reference to the whole checked subtree. For HR, this means that we have to run the sanitize pass every time we construct a This might not matter for Chrome because I imagine we'll keep the In either case, sanitize and read cannot really be separate: we need to sanitize on read to guarantee safety at the API level. |
With mmapped files, you can't get around TOCTOU since the data can change under you anyway. Just saying. Sanitizing per Shaper is fine, I think. As long as it's not per shape() call, we should be good. |
Point taken. I believe there are still ongoing arguments about the combination of mmap and Rust. Same for anything else that lets you poke at process memory. |
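One way to picture "sanitize once per Shaper while retaining a sanitized reference" is a wrapper type that can only be obtained from the check. The sketch below is an assumption-laden illustration, not HarfRust code: `Sanitized`, `BadData`, `sanitize`, and this toy `Shaper` are hypothetical names, and the table layout (a big-endian `u16` count followed by that many `u16` values) is invented. The reads stay in safe Rust here; a real implementation might use the wrapper to justify unchecked reads instead.

```rust
/// Hypothetical error for data that fails the sanitize pass.
#[derive(Debug, PartialEq)]
struct BadData;

/// Proof that `data` passed `sanitize`. Because the wrapper holds the borrow,
/// safe Rust code cannot mutate the bytes while it is alive. This does not
/// protect against an mmapped file changing on disk.
struct Sanitized<'a> {
    data: &'a [u8],
    count: usize,
}

/// Hypothetical layout: big-endian u16 count, then `count` u16 values.
fn sanitize(data: &[u8]) -> Result<Sanitized<'_>, BadData> {
    let header = data.get(..2).ok_or(BadData)?;
    let count = u16::from_be_bytes([header[0], header[1]]) as usize;
    if data.len() < 2 + 2 * count {
        return Err(BadData);
    }
    Ok(Sanitized { data, count })
}

impl<'a> Sanitized<'a> {
    fn value(&self, i: usize) -> Option<u16> {
        if i >= self.count {
            return None;
        }
        // In range because `sanitize` verified the whole array up front.
        let pos = 2 + 2 * i;
        Some(u16::from_be_bytes([self.data[pos], self.data[pos + 1]]))
    }
}

/// Toy stand-in for a shaper: sanitizes once in `new`, never in `shape`.
struct Shaper<'a> {
    table: Sanitized<'a>,
}

impl<'a> Shaper<'a> {
    fn new(data: &'a [u8]) -> Result<Self, BadData> {
        Ok(Shaper { table: sanitize(data)? })
    }

    fn shape(&self, index: usize) -> Option<u16> {
        self.table.value(index)
    }
}
```

The cost of the check is paid in `Shaper::new`, and every later `shape` call reuses the retained `Sanitized` value rather than re-validating.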
This is the result of a few days of vibe-coding with Codex. This moves access checks from table construction time to each individual field's getter method.
If there's an out-of-bounds access, this code crashes. We can see whether we can codegen sanitize methods à la HB to address that and reject bad data at the font-table level.
Accompanying HR PR: harfbuzz/harfrust#314
Experiment details:
https://docs.google.com/document/d/1LjYFjZj8Kw8zyhqsfZg0_VgzHf2zhYUsksPCkZL7GJI/edit
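As a rough illustration of the difference between checking at table construction time and checking in each field's getter, here is a small sketch. The record layout (two big-endian `u16` fields) and the names `EagerRecord`, `LazyRecord`, and `read_u16` are hypothetical and are not the generated read-fonts code. The lazy getters here return `None` on out-of-bounds data instead of crashing, which is a choice made for this sketch and differs from the behavior described for the PR.

```rust
/// Bounds-checked big-endian u16 read; None if the bytes are not there.
fn read_u16(data: &[u8], pos: usize) -> Option<u16> {
    let b = data.get(pos..pos.checked_add(2)?)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

/// Construction-time checking: the whole 4-byte record is validated once.
struct EagerRecord<'a> {
    data: &'a [u8],
}

impl<'a> EagerRecord<'a> {
    fn new(data: &'a [u8]) -> Option<Self> {
        if data.len() < 4 {
            None
        } else {
            Some(Self { data })
        }
    }

    fn second(&self) -> u16 {
        // Cannot go out of bounds: `new` verified the length.
        u16::from_be_bytes([self.data[2], self.data[3]])
    }
}

/// Getter-time checking: construction is free, and each getter checks only
/// the bytes it actually reads.
struct LazyRecord<'a> {
    data: &'a [u8],
}

impl<'a> LazyRecord<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    fn first(&self) -> Option<u16> {
        read_u16(self.data, 0)
    }

    fn second(&self) -> Option<u16> {
        read_u16(self.data, 2)
    }
}
```

On a truncated 3-byte record, the eager form rejects the whole record, while the lazy form can still read the first field and fails only on the second.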