feat(eap): Add trimming processor for EAP items #5489
Conversation
This adds a new trimming processor specifically for EAP items. The reason we can't just reuse the existing processor is the size calculation logic: the existing processor calculates the sizes of values using a special serializer, whereas for EAP items, specifically their attributes, we want to use the attribute size logic defined in `eap::size`. Attributes are processed in order from smallest to greatest total (key + value) size, so small attributes are more likely to be preserved. Attribute keys count toward the size but are never trimmed: if a key is too long, the attribute is removed entirely.
ref: #5362. ref: RELAY-177.
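As a rough illustration of that ordering (a minimal sketch, not the actual implementation: `trim_attributes`, the byte budget, and the plain `len()`-based sizing are all made up here, while the real size logic lives in `eap::size`):

    // Hypothetical sketch: attributes are processed from smallest to greatest
    // total (key + value) size, so small attributes are more likely to survive.
    fn trim_attributes(mut attrs: Vec<(String, String)>, mut budget: usize) -> Vec<(String, String)> {
        attrs.sort_by_key(|(k, v)| k.len() + v.len());

        let mut kept = Vec::new();
        for (key, mut value) in attrs {
            if key.len() > budget {
                // Keys are never trimmed: if the key alone no longer fits,
                // the whole attribute is removed.
                continue;
            }
            budget -= key.len();
            if value.len() > budget {
                // Values, by contrast, can be trimmed down to what still fits
                // (the real code must respect char boundaries, elided here).
                value.truncate(budget);
            }
            budget -= value.len();
            kept.push((key, value));
        }
        kept
    }

    fn main() {
        let attrs = vec![
            ("a".to_owned(), "1".to_owned()),
            ("medium_key".to_owned(), "some value".to_owned()),
            ("an_extremely_long_attribute_key_that_cannot_fit".to_owned(), "x".to_owned()),
        ];
        // With a 25-byte budget, the two small attributes survive intact and
        // the attribute with the oversized key is dropped.
        println!("{:?}", trim_attributes(attrs, 25));
    }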
Dav1dde left a comment
Seems like a good base for adding more features later, such as special keys and support for conventions.
If you can think of more examples and edge cases to write tests for, I think now would be a good time to do that; these snapshots will come in very handy for every future change.
    // Heuristic to avoid trimming a value like `[1, 1, 1, 1, ...]` into `[null, null, null,
    // null, ...]`, making it take up more space.
    self.remaining_depth(state) == Some(1) && !value.is_empty()
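For a concrete sense of what this heuristic guards against (illustrative values only, not from the PR): deep-trimming a nested array replaces each element with null, which can serialize to more bytes than the original.

    // Illustrative only: deep-trimming `[1,1,1,1]` into nulls would *grow*
    // the serialized payload rather than shrink it.
    fn main() {
        let original = "[1,1,1,1]";
        let trimmed = "[null,null,null,null]";
        assert!(trimmed.len() > original.len());
        println!("{} bytes -> {} bytes", original.len(), trimmed.len());
    }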
Do you have an example/test for this? We might need the null cases to individually attach metadata to each of the nodes; deleting the parent will delete the child metadata (which maybe is okay).
I copied that for parity with the other trimming processor, but that doesn't seem to have a test for it either :/
What I'd like to test is "dynamic" max chars/max bytes (similar to dynamic PII, i.e. being able to compute it with a function). Implementing that feature isn't difficult; now that I've done it for PII, I think it should be very similar. I just can't think of how to mock it for testing purposes, since this processor's logic is directly tied to
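One possible shape for such dynamic limits (purely hypothetical: none of these names exist in Relay, and the real processor state is considerably more involved):

    // Hypothetical: replace a constant limit with a closure computed per item,
    // analogous to the dynamic PII limits mentioned above.
    struct ItemContext {
        item_type: &'static str,
    }

    struct TrimmingLimits {
        max_bytes: Box<dyn Fn(&ItemContext) -> usize>,
    }

    fn main() {
        let limits = TrimmingLimits {
            max_bytes: Box::new(|ctx| if ctx.item_type == "span" { 1024 } else { 512 }),
        };
        let ctx = ItemContext { item_type: "span" };
        // The processor would consult the closure instead of a hard-coded cap.
        println!("limit = {}", (limits.max_bytes)(&ctx));
    }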
@loewenheim sounds good, I like the idea of splitting the work: merge this as is and tackle the modifications afterwards. I think for these future changes we'll benefit from a more extensive test suite, so maybe adding more (snapshot) tests now will pay off soon, but obviously that's up to you.
As written, the processor currently "overaccepts" numerical/boolean attributes. This means that if e.g. a
@Dav1dde does that make sense?
jjbayer left a comment
I'd like to revisit my opinion on whether this should be a separate processor:
- The two trimming processors now share a lot of copy-pasted code.
- The difference seems to be mainly in `process_attributes`, which the original trimming processor does not override.
Alternatively, we could deduplicate code by embedding a trimming::TrimmingProcessor within the new one, and delegating to it.
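Sketched out, the delegation idea might look roughly like this (all types here are stand-ins, not Relay's actual processor API; the reply below explains why the shared size calculation makes this difficult in practice):

    // Stand-in for trimming::TrimmingProcessor; the real trait is different.
    struct TrimmingProcessor;

    impl TrimmingProcessor {
        fn process_string(&mut self, value: &mut String) {
            value.truncate(8); // placeholder for the generic trimming logic
        }
    }

    struct EapTrimmingProcessor {
        // Embed the generic processor...
        inner: TrimmingProcessor,
    }

    impl EapTrimmingProcessor {
        // ...override attribute handling with eap::size-based logic...
        fn process_attribute(&mut self, key: &str, value: &mut String) {
            let _total = key.len() + value.len(); // eap::size would be used here
            value.truncate(4);
        }

        // ...and delegate everything else to the embedded processor.
        fn process_string(&mut self, value: &mut String) {
            self.inner.process_string(value);
        }
    }

    fn main() {
        let mut p = EapTrimmingProcessor { inner: TrimmingProcessor };
        let mut s = String::from("hello world");
        p.process_string(&mut s);
        println!("{s}"); // "hello wo"

        let mut v = String::from("attribute value");
        p.process_attribute("key", &mut v);
        println!("{v}"); // "attr"
    }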
I don't think that would work because of the size calculation logic. I agree that the duplication is super unfortunate, but I don't see how we can avoid it without making it possible to generalize size calculations.
Alright, not a blocker so feel free to merge.