-
Notifications
You must be signed in to change notification settings - Fork 73
DDSketch #466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
DDSketch #466
Conversation
# Conflicts: # ddsketch/include/sparse_store.hpp
leerho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this contribution! We certainly are interested in adding new sketches into our library. However, we have several requests for submissions of entirely new sketches into this repository:
C++ version compatibility:
- The datasketches-cpp repository is a base repo in that there are several repositories that are also part of the Apache/DataSketches project that depend on this base repo. These include the repositories datasketches-python, datasketches-postgresql, datasketches-bigquery and datasketches-characterization. In addition there are probably other platforms / languages that we may not know about that have code that depends on this base repo.
- Currently, the C++ code in this repo has been tested to run on C++ 11 through C++ 20. This means that there can be no code constructs that are exclusive to compiler versions greater than C++ 11.
- To update the base branch to require a minimum C++ compiler version greater than 11 would require verifying, and modifying as necessary, the code in these other repositories to make sure that they can compile from source and run.
- We would also want to notify the community, well in advance, that such a move is being contemplated and the README would need to be updated.
The New Sketch
- For a new sketch, we would like to see the website updated with a description of the sketch and a discussion about the performance and characteristics of this sketch and how it compares with the other quantile sketches already in the library. Adding characterization plots of speed and memory usage would also be highly desirable.
Please let us know if you are able to accommodate these requests.
|
Thank you for the detailed feedback @leerho . I understand the requirements for adding a new sketch to the repository, and I am happy to make the requested adjustments. For C++ version compatibility, I can update the implementation so that it is fully compliant with C++11. On the documentation and evaluation side, I have already implemented Python bindings (I haven't opened the PR in datasketches-python yet) and used them to run an initial comparison between I am also preparing a detailed description of the sketch that can be added to the website. This will cover the behavior of the sketch, its performance characteristics, and how it compares with the other quantile sketches in the library. Please let me know if there is anything else you would like me to include or adjust. |
Description
This PR implements DDSketch and closes #457.
The design follows the core ideas of the original DDSketch paper and the implementations in other libraries.
I have aimed to keep the API consistent with the existing sketches, especially the other quantile sketches.
There might still be some work to do (e.g., adding additional methods or refinements), but I’d like to hear your opinions on the design and implementation so far before proceeding further.
Implementation
This implementation currently requires C++20, primarily for the use of concepts and constrained templates.
DDSketch is templated over a Store concept, rather than relying on a common base class and virtual methods.
If
C++20adoption is not desired, I can rework the code to remain compatible withC++11by limiting the implementation to collapsing dense stores and removing concept-based templates. I’d appreciate the maintainers’ input on which direction to take before finalizing.