Adds Cache Groups concepts to Cripts #12743
base: master
Pull request overview
This PR introduces Cache Groups functionality to Cripts, providing infrastructure for managing associations between cache entries using custom identifiers. This implementation follows an emerging RFC draft for cache group invalidation patterns in HTTP caching.
Key changes include:
- New Cache::Group class with disk persistence, rotating hash maps, and configurable aging policies
- Thread-safe operations using shared_mutex with automatic periodic syncing to disk
- Example implementation and comprehensive documentation for using Cache Groups in Cripts
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/cripts/CacheGroup.cc | Core implementation with hash-based storage, disk I/O, transaction logging, and Manager singleton for lifecycle management |
| include/cripts/CacheGroup.hpp | Public API defining the Group class with Insert/Lookup methods and nested Manager class |
| src/cripts/CMakeLists.txt | Adds CacheGroup.hpp to the list of public headers |
| include/cripts/Matcher.hpp | Includes algorithm header (duplicate include) |
| example/cripts/cache_groups.cc | Working example demonstrating Cache Groups for cache invalidation workflows |
| doc/developer-guide/cripts/cripts-misc.en.rst | Documentation explaining Cache Groups concept and usage patterns |
bryancall
left a comment
Thanks for adding Cache Groups to Cripts! This is a useful feature for implementing cache group invalidation patterns. I have a few observations:
Critical Bug: Iterator increment after erase()
In _cripts_cache_group_sync (line ~50-60), there's an iterator invalidation bug:
```cpp
for (auto it = groups.begin(); it != groups.end() && processed < max_to_process; ++it) {
  if (auto group = it->second.lock()) {
    // ...
  } else {
    it = groups.erase(it); // Returns next iterator, then loop does ++it, skipping an element
  }
}
```
When erase() returns the next valid iterator and then the loop increments again, an element gets skipped. Consider:
```cpp
for (auto it = groups.begin(); it != groups.end() && processed < max_to_process; ) {
  if (auto group = it->second.lock()) {
    // ...
    ++it;
  } else {
    it = groups.erase(it); // Don't increment here
  }
}
```
Error Handling
- Missing file read error checking in LoadFromDisk(): The file.read() calls (lines ~229-232 and ~241) don't check if reads succeeded. If the file is corrupted or truncated, uninitialized data gets used.
- clearLog() called unconditionally in WriteToDisk(): If syncMap() fails, the transaction log is still cleared, which could lead to data loss. Consider only clearing the log after all syncs succeed.
- Inconsistent error reporting: Line 318 uses std::cerr while the rest of the code uses TSWarning. Should be consistent.
- Missing filesystem error handling in Initialize() (lines ~85-86): create_directories and permissions can throw or fail silently. The clearLog() method (line 363) shows the correct pattern using error_code overloads.
Documentation & Style
- Spelling: "hodling" → "holding" (line 289), "assosication" → "association" (docs line 421)
- Duplicate #include <algorithm> in Matcher.hpp
- The magic number 63072000 (2 years in seconds) appears multiple times - consider a named constant with documentation explaining the choice
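One possible shape for such a constant is sketched here. The name GROUP_MAX_AGE_TWO_YEARS is hypothetical, and the "365-day year" reading is an assumption inferred from the literal itself (2 * 365 * 24 * 3600 = 63072000).

```cpp
#include <chrono>

// Hypothetical name; the PR may prefer a different one or a different policy.
// Two 365-day years, expressed so the arithmetic is visible at the definition.
inline constexpr std::chrono::seconds GROUP_MAX_AGE_TWO_YEARS = std::chrono::hours(24) * 365 * 2;

// Guards against the constant drifting away from the literal it replaces.
static_assert(GROUP_MAX_AGE_TWO_YEARS.count() == 63072000, "must equal the original magic number");
```

Note that std::chrono::years (C++20) uses an average Gregorian year of 31556952 seconds, so std::chrono::years(2) would not reproduce the same value.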
Testing
This is a significant feature with complex persistence logic (disk serialization, transaction log replay, crash recovery). Would be good to have automated test coverage for these code paths.
Minor API Note
The Factory() returning void* requiring manual delete in do_delete_instance() is a bit awkward, but I understand this may be constrained by the Cripts plugin interface.
- cleans up the notion around cached URLs and headers, and cache keys.
- adds APIs to set the lookup status as well
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
```cpp
size_t _num_maps = 3;
size_t _max_entries = 1024;
std::chrono::seconds _max_age = DEFAULT_MAX_AGE;
std::atomic<size_t> _map_index = 0;
```
Copilot
AI
Dec 22, 2025
The _map_index member is declared as std::atomic but is accessed without atomic operations in multiple places. In the Insert method, it's read and written non-atomically (lines 126, 144), and in the Lookup method, it's read without atomic load (line 176). Since _mutex protects these operations, using std::atomic is unnecessary and misleading. Either use proper atomic operations or change it to a regular size_t since it's already protected by the mutex.
```diff
- std::atomic<size_t> _map_index = 0;
+ size_t _map_index = 0;
```
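A minimal sketch of the mutex-guarded alternative, assuming every access to the index already happens under the group's shared_mutex. RotatingIndex and its wrap-around rotate() are hypothetical stand-ins, not the PR's code.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the index handling inside Group.
class RotatingIndex
{
public:
  // num_maps must be non-zero, since rotate() takes a modulus by it.
  explicit RotatingIndex(std::size_t num_maps) : _num_maps(num_maps) {}

  // Readers take a shared lock, so concurrent lookups do not block each other.
  std::size_t
  current() const
  {
    std::shared_lock lock(_mutex);
    return _map_index;
  }

  // Writers take an exclusive lock; the wrap-around rotation is an assumption.
  std::size_t
  rotate()
  {
    std::unique_lock lock(_mutex);
    _map_index = (_map_index + 1) % _num_maps;
    return _map_index;
  }

private:
  mutable std::shared_mutex _mutex;
  std::size_t               _map_index = 0; // plain size_t, guarded by _mutex
  std::size_t               _num_maps;
};
```

With this shape the locking discipline is the single source of synchronization, which is the point the comment is making about the atomic being redundant.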
```cpp
if (ptr) {
  auto date = cached.response.AsDate("Date");

  if (date > 0) {
```
Copilot
AI
Dec 22, 2025
The comparison operator should use '!=' instead of '>' for comparing with time_point zero to check validity. Time points don't have a natural ordering relative to "zero" in the same way integers do, and using '>' is semantically unclear. Consider using '!=' to check if the date is valid/not-zero.
```diff
- if (date > 0) {
+ if (date != 0) {
```
```cpp
if (date > 0) {
  auto cache_groups = cached.response["Cache-Groups"];
  if (!cache_groups.empty()) {
    borrow cg = *static_cast<std::shared_ptr<cripts::Cache::Group> *>(ptr);
```
Copilot
AI
Dec 22, 2025
The same comparison pattern appears here. Using '>' to compare a time_t value with 0 works but using '!=' would be more semantically clear to indicate checking for validity rather than ordering.
```diff
- if (date > 0) {
+ if (date != 0) {
    auto cache_groups = cached.response["Cache-Groups"];
    if (!cache_groups.empty()) {
      borrow cg = *static_cast<std::shared_ptr<cripts::Cache::Group> *>(ptr);
```
```cpp
if (it != slot.map->end() && it->second.length == key.size() && it->second.prefix == prefix) {
  it->second.timestamp = now;
  slot.last_write = now;
  appendLog(it->second);

  return;
```
Copilot
AI
Dec 22, 2025
The hash collision detection at line 129 only checks prefix and length but doesn't verify the actual key content. While the prefix check provides some collision resistance, two different keys could have the same hash, same length, and same 4-byte prefix, leading to incorrect cache group behavior. Consider either storing the full key or adding a more robust collision detection mechanism.
```cpp
if (it != map.end()) {
  const Cache::Group::_Entry &entry = it->second;

  if (entry.timestamp < age || entry.length != key.size() || entry.prefix != _make_prefix_int(key)) {
    continue;
  }

  return true;
```
Copilot
AI
Dec 22, 2025
The same hash collision issue exists in the Lookup method. The check at line 189 verifies hash, timestamp, length, and prefix, but doesn't confirm the actual key matches. This could result in false positive lookups when hash collisions occur with keys that have matching length and prefix.
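To make the "store the full key" option concrete, here is a minimal sketch. FullKeyEntry and VerifiedMap are hypothetical types, and the injectable hasher exists only so a collision can be forced deliberately; none of this mirrors the PR's actual _Entry layout.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical entry that keeps the whole key so a lookup can confirm a match.
struct FullKeyEntry {
  std::string                           key;
  std::chrono::system_clock::time_point timestamp;
};

class VerifiedMap
{
public:
  using Clock  = std::chrono::system_clock;
  using Hasher = std::function<std::uint64_t(std::string_view)>;

  explicit VerifiedMap(Hasher hasher) : _hasher(std::move(hasher)) {}

  // A colliding insert replaces the older entry; the older key then becomes
  // a miss on lookup rather than a false positive.
  void
  insert(std::string_view key, Clock::time_point now)
  {
    _map[_hasher(key)] = FullKeyEntry{std::string(key), now};
  }

  // A hit requires both the hash slot and the full key content to match.
  bool
  lookup(std::string_view key) const
  {
    auto it = _map.find(_hasher(key));
    return it != _map.end() && it->second.key == key;
  }

private:
  Hasher                                          _hasher;
  std::unordered_map<std::uint64_t, FullKeyEntry> _map;
};
```

The trade-off is memory: keeping every key in full costs more per entry than a length plus a 4-byte prefix, so whether this is worthwhile depends on how tolerable a rare false positive is for the invalidation use case.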
```cpp
  return prefix;
}

// Stuff around the disk sync contination
```
Copilot
AI
Dec 22, 2025
The comment contains a typo: "contination" should be "continuation".
```cpp
    return new std::shared_ptr<Group>(std::move(group));
  }
}

if (!_instance()._base_dir.empty()) {
  auto group = std::make_shared<Group>(name, _instance()._base_dir, max_entries, num_maps);

  groups[name] = group;
  return new std::shared_ptr<Group>(std::move(group));
```
Copilot
AI
Dec 22, 2025
Memory allocated by 'new std::shared_ptr' in the Factory method could leak if an exception is thrown before the caller takes ownership. Consider using std::make_unique or returning the shared_ptr by value instead of raw pointer to ensure exception safety.
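If the plugin interface really does require an opaque pointer, one way to narrow the window for a leak is to hold the heap-allocated shared_ptr in a unique_ptr until the hand-off. The sketch below uses a stand-in Group type and hypothetical helper names (make_group_handle, to_opaque, delete_opaque); it is not the PR's Factory() signature.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for cripts::Cache::Group.
struct Group {
  explicit Group(std::string n) : name(std::move(n)) {}
  std::string name;
};

using GroupHandle = std::unique_ptr<std::shared_ptr<Group>>;

// Ownership stays with a smart pointer for as long as possible.
GroupHandle
make_group_handle(const std::string &name)
{
  auto group = std::make_shared<Group>(name);
  return std::make_unique<std::shared_ptr<Group>>(std::move(group));
}

// At a boundary that must traffic in void*, ownership is handed over explicitly.
void *
to_opaque(GroupHandle handle)
{
  return handle.release();
}

// Counterpart of to_opaque(); the cast must name the exact type that was released.
void
delete_opaque(void *ptr)
{
  delete static_cast<std::shared_ptr<Group> *>(ptr);
}
```

Everything between construction and to_opaque() is then exception safe, and the manual delete is confined to one function.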
```cpp
  _action = nullptr;
}

_action = TSContScheduleEveryOnPool(_cont, _CONT_SYNC_INTERVAL * 1000, TS_THREAD_POOL_TASK);
```
Copilot
AI
Dec 22, 2025
The continuation is scheduled with TSContScheduleEveryOnPool but there's no check whether scheduling succeeded. If _action is nullptr after this call, the continuation won't run and cache groups won't be synced to disk, potentially leading to data loss. Add error checking and logging if scheduling fails.
```cpp
namespace cripts::Cache
{

class Group
```
Copilot
AI
Dec 22, 2025
Missing documentation for the Cache Groups feature. The code lacks explanation of key concepts like what cache groups are used for, how the rotation mechanism works, what the parameters mean (num_maps, max_entries, max_age), and how the disk persistence works. Consider adding comprehensive documentation including class-level and method-level comments explaining the design and usage patterns.
```cpp
if (std::rename(tmp_path.c_str(), slot.path.c_str()) != 0) {
  TSWarning("cripts::Cache::Group: Failed to rename temp file `%s' to `%s'.", tmp_path.c_str(), slot.path.c_str());
  std::filesystem::remove(tmp_path);
}
```
Copilot
AI
Dec 22, 2025
If std::rename fails, the temporary file is removed but the original file may be in an inconsistent state. The error handling should ensure that either the update succeeds completely or the original file remains intact. Consider implementing a more robust error recovery strategy, such as keeping a backup of the original file before attempting the rename.
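For reference, the write-to-temp-then-rename pattern can be sketched as below, using the error_code overloads so nothing throws. On POSIX systems rename() replaces the destination atomically, so a failed rename should leave the original file as it was; behavior on other platforms is not something this sketch guarantees. The replace_file name and the ".tmp" suffix are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Write `data` to `target` by way of a temporary file in the same directory.
// Returns false on failure; an existing `target` is not modified in that case.
bool
replace_file(const std::filesystem::path &target, const std::string &data)
{
  std::filesystem::path tmp = target;
  tmp += ".tmp"; // hypothetical suffix; same directory keeps the rename on one filesystem

  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    if (!out) {
      out.close();
      std::error_code ignored;
      std::filesystem::remove(tmp, ignored); // best-effort cleanup
      return false;
    }
  } // the stream is closed here, before the rename

  std::error_code ec;
  std::filesystem::rename(tmp, target, ec);
  if (ec) {
    std::error_code ignored;
    std::filesystem::remove(tmp, ignored);
    return false;
  }
  return true;
}
```

This sketch does not call fsync, so it says nothing about durability across a power loss; it only shows that the old file is either fully replaced or left alone.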
This is a second version, since half of the original patch was merged in a separate PR.