-
Notifications
You must be signed in to change notification settings - Fork 27
Rotation System Reconstruction: Collapse and Re-expansion #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
5af53f1 to
5909cdf
Compare
e54879e to
5f247be
Compare
1f078a4 to
872afad
Compare
requires expressions go right after the template parameters. Putting it after the return type puts the requirement on the return type which makes no sense obviously.
This uses the less error prone std::reference_wrapper, fixes the incorrectly defined copy constructor, and adds [[nodiscard]] attributes to prevent misuse.
The old version of split_vertex that I wrote has some edge cases that ended up being triggered. This newer version covers those more thoroughly although in a somewhat inefficient way by inserting phantom triangles and removing them once the vertex is split. Although for our primary use case, this should be hit relatively infrequently.
QEM: The hopefully common case (three nonzero eigenvalues) should not have a negative assertion. Oops.
Coalesce Vec and Mat definitions Right now, inheritance is used to create a separate concrete type for each common combination of vector/mat and primitive type. The inherited types add no behavior besides public constructors. This leads to a lot of header bloat. This adds 3 extra headers but 21 headers are made redundant. A later commit can remove those and update other files that rely on those feathers to refer pull just Mat.h or Vec.h. Note that I think a lot more duplication can be reduced by flattening some unnecessary inheritance relationships. This will also slightly improve the debug experience
Moved it to the Geometry namespace not to pollute the top level namespace. Changed names to improve clarity. Added const accessors to make is_connected const.
872afad to
2bbb298
Compare
Vec.h: Add any and all utility templates Mat.h: Fix some documentation issues, mark Matrix types as final
This change seems to have a performance impact of up to 20%, which is simply astonishing. It is likely that GCC (and other compilers), simply do not understand what std::transform and similar functions do at a high level and completely fail to optimize these to efficient SIMD ops. The end result is a lot of useless runtime checked tiny loops that run useless integer ops and waste branch prediction slots.
2bbb298 to
8a5bae5
Compare
8a5bae5 to
5ad7c4a
Compare
Make InvalidIDs in AMGraph constexpr. This makes it somewhat easier to reference them from other files as they are no longer extern global statics. AMGraph: add neighbors_lazy As explained in janba#102, it would be nice to have an allocation-free variant of neighbors()
Create InplaceVector C++ has two main contiguous collections which are std::vector and std::array. std::vector provides a heap allocated growable array with a fast move operation while std::array is an in place fixed size wrapper over C-style arrays that handles object lifetimes correctly. While these two cover most use cases pretty adequately, there is the issue of small non-fixed size arrays. There are some cases where we need resizable arrays of a fairly small size. It is possible to emulate this if you have a sentinel element with std::array but I find this to be too error-prone. Falling back to std::vector comes with performance downsides. Although modern allocators are quite fast, having a lot of small allocations tend to waste a lot of heap space that can degrade performance over time. This becomes especially clear when you attempt to multithread existing code where previously alright allocator performance is suddenly degraded due to an increase in contention. People have tried to work around this by using alternative allocator like jemalloc however this is more of a crutch than a solution. The problem becomes more pronounced when we want to have a "vector of vectors". This is almost the worst case scenario for indirection with the CPU having little idea where the memory accesses will be but is also the worst case scenario for memory fragmentation which means worse performance due to cache misses and significantly increased memory usage. There are already use cases within GEL where creating a vector of 300,000+ vectors is not unusual, so there is a clear use case for this class. Note that this is not a "SmallVector" which is often used to indicate a growable array that can be conditionally stored in place below a minimum size but can also be moved to the heap beyond that size. If the library is ever updated to C++26, this class can likely be replaced using std::inplace_vector, which provides std::vector semantics with a fixed maximum capacity. InplaceVector: fix emplace_back Oops, fix empty add const qualified begin and end to InPlaceVector
Add SparseGraph inspired by the graph implementations inside less_slow.cpp I am not too happy about this one. The graph implementation that is most likely to outperform the built-in one would have to use flat_hash_set/flat_hash_map from the phmap library. The problem however is that those containers only return a single element when given equal_range which we want to abuse to store edges with the power of heterogeneous keys. This of course also means that the built in std::unordered_set and std::unordered_map are actually a multi set and a multi map which is horrifying. I will however consider keeping this implementation since I think it has some ergonomic benefits over the AMGraph and we might perhaps figure out some way to actually exploit its niche advantages. (It is also possible that the real reason for the performance deficit is merely that this collection only really performs great for sparse graphs and our connectivity is just a tad too high in RSR)
Add RangeTools.h I needed quite a few functional programming abstractions during the process of working on this project. Partially due to C++20 missing quite a few useful view adapters that are only added in C++23 and C++26. This is the initial commit that adds those range tools to their own header, although I will likely make some changes before actually merging this upstream. Move more things to RangeTools
Add Threadpool implementation These are threadpool implementations for parallel adapters that are used within the RSR algorithm. I decided to make this separate so I can easily swap out implementations which can be useful during profiling as multithreading can sometimes make results more ambiguous. In addition, we cannot use the built-in parallel algorithms. Not only is C++20 missing a good number of algorithms that we are interested in using (or have variants that are undesirable to use in some way), but one of the compilers that we wish to support (Apple Clang) does not support execution policies at all. Even when execution policies are supported, the C++ standard places no requirements on implementors to actually execute relevant algorithms in parallel when parallel execution is requested. The ThreadPool class seems to still contain a subtle race condition where the pool can deadlock if the given closure finishes really quickly. I initially wanted to use this design because it amortizes thread creation costs but I might end up just removing it as our algorithms take long enough that thread creation is typically completely negligible. Add Parallel Adapters Adds concepts and functions for parallel algorithms. These add parallel algorithms comparable to ones available in the algorithm header. Note that execution policy is still not supported on Apple Clang right now (2025-07-22) which just so happens to be the compiler Andreas uses. This library is much stricter than the algorithms one in the sense that we require the input and output ranges to be contiguous and have constant time random access. While there are more generic ways of writing parallel algorithms, cache locality can lose us more performance (>10x) than what we gain thanks to multithreading. Also note that the names follow functional programming concepts rather than the C++ std names (map, filter instead of transform, erase_if) since the choice of some of the std names were picked to either avoid collisions with existing portions of the library or follow some specific C++ idiom which we don't necessarily care about. It is also worth noting that there is quite a bit of duplication here to express some more complex adapters that would normally be composed in more functional code. This is because we need to block to force computations to complete. Allowing composition without blocking would require a Rust style library where computation is indirectly described in terms of traits and an explicit collect is added to the end to finish the computation. It is possible to write this in C++ but it is overly complex and most of the computation that we are doing can be described in terms of a simple map-fold. Note that I might also turn IExecutor into a template parameter to allow the compiler to inline one more level although I suspect the impact of thi son performance is negligible. Add ThreadPool tests ParallelAdapters: Use proper cache line size ParallelAdapters: fix FilterFunctor concept ParallelAdapters: 64 is the cache line size for me and you ♫
Add RawObj class and parsers There are several ways of parsing obj files in GEL right now. None of which fit my particular needs since these are meant to be used to feed TriMesh or Manifold. I need a parser that just provides the raw data and the ad hoc ones that are spread all across the demo files won't do. Note that we will almost certainly want to move this to the test directory eventually. It would be preferable if outside users would not see yet another obj parser and parser tend to be security landmines. Add Obj parser tests Not comprehensive tests for the parser. Likely to be removed soon. RawObj now supports faces RawObj: Fix a face parsing bug, properly handle no normals case RawObj: Due to MSVC lacking the relevant overloads wrt from_chars and string_view, we have to use this awkward juggling of data() and length() instead of begin() and end(). RawObj handles an edge case There are some cases where the obj file has no faces but still has normals which are equal in number to the vertices. We can then map each normal directly to one vertex.
Add working copy of RsR.h/RsR.cpp Working copies of the RsR algorithms for multithreading and collapse/expansion. Nowhere near finished. RSR: Fix collapse overflow bug RsR: refactor to get rid of the convoluted branching for knn_search RsR2.cpp: Use shortest edge based collapse This switches from naive nearest neighbor collapse to naive shortest edge based collapse and I regret not making this change earlier. Not only is this method of collapse much simpler since I have an exact order to perform the collapse and don't have to worry about keeping track of which points are allowed to collapse, since we are selecting based on edges instead of points, I end up selecting both points at the same time. There remains questions however. We select an edge, but there is no obvious way to choose which point is active and which is latent. Should we update the point by averaging it afterwards? If so, then perhaps we should also keep track of point weights so we know how heavy each point is when doing future averaging. If we end up changing point positions, we then have to presumably store the intermediate coordinates. Finally, we need to consider that whatever we do needs to be parallelizable (assuming it is worth parallelizing). If we can do this right and maintain quality after collapses, then the main algorithm will presumably benefit relatively little from parallelization and this part will likely be much more important. RsR2.cpp, RsR2.h: Extended cleanup This performs further dead code elimination and small scale optimizations across these two files. Write separate globals for InvalidNodeID and InvalidEdgeID. (these should be made constexpr later) Parametrize usages of set, unordered_set, and unordered_map. If we ever change from using std containers to a third party one, this makes the change easier to do. The opportunity for performance gains are quite large just from this (>30%). Eliminate dead fields from Vertex and Vertex::Neighbor. Many of these are either write only or their write result is never used. Eliminate the unused API of RSGraph. A copy of it already exists in RsR.h just in case. Eliminate the branch in RSGraph::add_edge. Previously this branch was sometimes taken but ignored (returning an InvalidEdgeID). This was apparently fixed in 925bbda by Andreas. Remove API for remove_duplicate_vertices. Same reason as 1015f691f4c93049162afbe14556a73e7edf3f8b. Remove unnecessary forward declarations in RsR2.cpp. Change naming to be more consistent. Merge knn_search to one function. Simplify nn_search and fix an apparent bug I've created where I did not return a properly sorted version. Write a generic minimum_spanning_tree implementation. This is used two times and it was trivial to factor it out with a functional API. I would not be surprised if there are other parts in the library that also calculate MSTs and we can just later move this so other people can calculate an MST easily. Change the map used in connect_handle not to use a string. remove sets_temp.reserve() built-in sets don't have reserve. Add separate timer for RSR This is just a split of the timer in the original RsR header with some API improvements. There is yet another timer in GEL that seems to be used in a few places (mainly in the demos) that should probably be removed as the built-in chrono in C++ is more than good enough and the other timer has some pretty nasty ifdefs. RsR2.h: Changes Remove <random>. It is not even used. Remove Neighbor related structs. They are moved to Collapse.h. Remove Boolean. It is better it sits next to the only place it is used. Change Vertex::normal_rep. This is another narrowing conversion. Annoyingly, it has two special states (-1 and -2), one of which is a valid state for the NodeID which also inhabits it. Although this is unlikely to cause any real issues, I don't know what to do about it right now. Remove dead RSGraph fields. total_edge_length is never read from inside the code. Maybe it was used for testing at some point. is_euclidean and is_final are better expressed as function arguments, but as it turned out. is_final was completely dead. current_no_edges can be figured out by other means but as it turns out, it is also never read. Remove dead RSGraph function init. It is supposedly called in one place where it was commented out. It might be better to extend AttributeVector API to have a reserve function later. Remove collapse_points API. It is moved to Collapse.h. RsR2.cpp: Changes Reorder arguments in find_common_neighbor. Most functions take RSGraph as the first argument. Remove dead helper functions erase_swap, erase_if_unstable, safe_div. Move knn_search, build_kd_tree_of_indices, calculate_neighbors, collapse_points to Collapse.h Make more stuff const. Move Boolean wrapper inside correct_normal_orientation. Eliminate multiple dead std::vector instances. Make is_intersecting and more compact. Remove commented out check_face_overlap examples. Give explore is_euclidean as an argument instead of a field in G. Make check_branch_validity more compact. In addition, factor out is_final field in RSGraph, it is only ever called with true. As it turns out, the entirety of check_and_force is dead code. Remove check_and_force. Use the constant vert_count overload of build_manifold inside component_to_manifold. RSR2.h: Changes Remove unnecessary <ranges> Move successor and predecessor functions inside RSGraph. Also, see issue janba#105. RsR2.cpp: Changes Move successor and predecessor to RsR2.h. Also update usages so they use the member function in RSGraph. Use a type definition for ThreadPool so I can easily change the desired implementation for each usage. It might a better idea to have the executors be propagated down rather than re-created but with the current execution times, I don't really see anything besides ImmediatePool being worth using. register_face: make the function more compact. connect_handle: style improvements. triangulate: eliminate dead variables RsR2.h: add crash assertions RsR2.cpp: Add crash assertions Move unnecessary type definitions to RsR2.cpp, remove using namespace declarations, add some type definitions inside detail. Move type definitions inside RsR2.cpp Simplify cal_radians_3d Fix rest of typedefinitions, style improvements Add finer timer to component_to_manifold Edge arrays use a single vector Normal estimation does not create intermediate vectors register_face style improvements Style improvements RsR2: make a few things constexpr RsR2: Minor fixes RsR2: Changes Move neighbor calculation and cross connection filtering out of init_graph. init_graph now returns the graph it initializes directly. Make the inner loop a little more sensible. Rename generic mst builder to make_mst Factor out RSGraph::exp_genus to be a function argument Move neighbor calculation out of split_components. Remove completely pointless inner loop in graph construction. Make usages a little more clear in point_cloud_to_mesh RsR2.h apply changes in previous commit RsR2.h reshuffle things around a bit RsR2.cpp: Changes Move most using declarations to the top. Use a common comment style. Eliminate one unnecessary allocation in nn_search Factor out max_length and pre_max_length in init_graph to a single vector. Eliminate two dead arguments to find_shortest_path weighted_smooth is now alloc free Replace call to neighbors with neighbors_lazy RsR2.cpp: remove triangle_mean_normal remove constexpr from normalize_normals revert vertex constructor change (Clang) RsR2.h Use phmap associative containers Add access assertions to tree_id Update point_cloud_collapse_reexpand signature to have CollapseOpts RsR2.cpp Use lazy neighbor iterators Add more input sanitization Update point_cloud_collapse_reexpand signature RsR2.cpp: I want to split the data and behavior inside the Collapse header but I can't do that right now, so we will insert some temporary changes here and fix up the API of collapse_points and reexpand_points later RsR2.h: Switch to SparseGraph This actually causes a performance regression but I actually want to better encapsulate the two graphs inside RsR RsR2.cpp: Switch to SparseGraph This actually causes a performance regression but I actually want to better encapsulate the two graphs inside RsR RsR2: Many improvements Make SimpGraph no longer derived. (I'm not so sure about this change and might revert it since we have to expose the inner part anyway). Instead of using AttribVec, use std::vector in SimpGraph and RSGraph. This results in a major performance improvement. Revert SimpGraph and RSGraph to use AMGraph. The SparseGraph idea wasn't bad but we have too many edges for it to be anywhere near performant. Move get_neighbor_info and maintain_edge_loop inside RSGraph. That's where they make sense. Convert a few pairs to dedicated structs to avoid potential memcpy fails. Convert a lot of values to references to avoid copying. As it turns out this was one of our biggest performance losses because each Vec3 is 24 bytes and certain functions are copying quite a few of them. In this case, inlining becomes undesirable because the indirect memory access will nearly always beat copying so much data to the stack over and over. Add some experimental multithreading to edge connection. RsR2.cpp: Merge four allocations into one in connect_handle, move a function to RangeTools Revert constructor deletions in RSRTimer so I can more easily drop this into the old RsR.cpp Remove failed reserve attempt, make AMGraph privately inherited in RSGraph Change RsR2.cpp to use the intended collapse API RsR2: Revert several changes to fix some regressions The main things are, the imp_node vector is not populated correctly in connect_handle and cross connection filtering early results in the pre_max_length being incorrect. These do not explain the entirety of the regressions. Remove incorrect constexpr RsR.cpp Update the interface to handle normal and no normal situations correctly RsR2: take a full option struct for reexpansion options RsR2: eliminate unnecessary NodeID from Vertex struct RsR2: simplify geometry check, add debug functions Oops, forgot to clear neighbors Even more fixes to RSR2
This adds some of the collapse-reexpansion functionality as a separate header. Likely not hugely desirable, but RsR2.cpp has over 2000 lines and CLion does not like analyzing it. If I can get rid of some of the unwanted repetition, I will consider merging this back to RsR2.h/RsR2.cpp. Collapse: add numbers import so we can pull pi Collapse.h: Add thresholding for angles and valencies to reexpansion As Andreas pointed out, it is most likely better to use thresholds to limit triangles with more extreme minimum angles and vertices with too few way too many neighbors. When set very conservatively, this seems to result in a slight improvement in reconstruction quality. Collapse.h: Changes Move Collapse types and functions into RSR. They will only be used there. Move collapse_neighbors from RsR2.cpp to here. Technically, this means we need to move a few things that are not collapse related (mainly the neighbor calculations) but this means that all Collapse-related code (except for the API) is in Collapse.h. I will likely further move stuff into its own compilation unit. Collapse.h: Make Clang shut up It seems that Util::Range returns a i64 regardless which is probably not what we want. Since I will most likely remove that in exchange for iota_view (despite the apparent downsides) This will likely not be a problem later. Use Vec3 type definition Collapse: implement fundamentals for QEM Collapse: formatting Collapse.h Use QEMs for collapse and reexpansion Collapse.h: a lot of changes Add CollapseOpts to pass some parameters more easily. Add zip_view_t to make easy pairs. TODO: move this elsewhere. Add zip helper template for zip_view_t Add cartesian_product helper template. Add normal lerp functions. Fix a fairly nasty bug in QuadraticCollapseGraph that resulted in horrible collapses. Experimental shared_neighbors normalization in QEM (Currently seems to make things worse). Some temporary changes to Collapse while I figure out how to properly pass this data around. Add a struct that just contains the collapse information (preferred thing to pass to the second function). Add a helper function to convert the original collapse struct to this. Create another helper function to create artificial collapses. optimize_dihedral might have some issues? I need to check we are actually getting a proper value. Use a split struct to be more explicit. Use a triangle struct instead of trying to manage triangles manually. I don't care that this is 72 bytes, the previous method was a complete nightmare. find_edge_pair tries to find the actual one_ring and two_ring dihedrals now. I also added a bunch of debug printing that I will most likely remove but I want to leave it in the commit history just in case. Add decimate and decimate_reexpand functions to investigate results based on a collapse directly from the Manifold instead of relying on the more noisy results from the RSR. Split Collapse.h to Collapse.cpp Move even more stuff to Collapse.cpp Save the futile alternative edge flip based reconstruction method in Collapse.cpp I will most likely just remove this immediately but I want to have a reference here just in case I want to come back to it later Collapse.cpp: I finally figured out how to get OK performance, so save here so I can make it actually fast Collapse.cpp Perform many triangle optimizations to reduce amount of work during a reexpansion Collapse: remove the fat Collapse class Collapse: early return for 0 max iterations Collapse.h Also return indices for the point cloud Collapse: Replace MutablePriorityQueue with a simpler BTree based design that is much faster Major changes to Collapse Add debug options for easier debugging Bring back the old somewhat inefficient QuadraticCollapseGraph based on the MutablePriorityQueue. The simpler BTree based version should still function well hypothetically but until I figure things out, both can live here. Clamp the minimum error for QEMs to a numerically stable value, so at very low QEM values, distance will dominate. Angle optimization multiplies by the edge length that the angle is facing. This is somewhat problematic but it avoids the problem that the angle scoring being unaffected by the edge length while dihedral angles were. So I no longer have to worry about applying an awkward normalization factor when balancing them out. Add one_ring_max_angle for debugging by figuring out if we are creating acute angles. I used 1 for debug thresholding but it seems like there are cases where such acute angles are not unreasonable, although we will have to discuss this during a meeting. Take into account angles during split selection. Return the actual maximum dihedral angle rather than the edge adjusted maximum dihedral angle. Store two sets of potential splits. One based on just our scoring, and the other that avoids any acute (>90 deg) angles. Pick the non-acute angle if possible. Refactor the edge crossing check to shoot_ray_and_maybe_flip_edge (better name pending). What this now does is that it checks if we are crossing or getting close to cross a plate around a particular edge. I might want to set the "about to cross" threshold as a potential out parameter. Make collapse and reexpansion work. There, I said it. Final behavorial changes to Collapse and Reexpansion
Add RsR tests Move KDTree builder to test file RSR: Fix collapse overflow bug RSR_test.cpp: Remove duplicates testing Since I have figured out that duplicates are checked in multiple places in the original reconstruction, explicitly removing duplicates have become unnecessary. In addition, I didn't like this oracle testing method either. Collapse API is in HMesh::RSR now. Better reconstruct_debug Add RsR tests to CMakeLists.txt RsR_test.cpp: Changes Remove unused create_rectangular_manifold Minor fixes. Pass old reconstruct_single the pre-generated normals. Collapse: implement fundamentals for QEM RsR_test.cpp Comment outdated tests, adapt existing tests for the new signatures RsR_test.cpp Add rectangular tests and a bunch of other nonsense RsR_test: Split include_directories again RsR_test: Remove unnecessary RangeTools and SparseGraph tests Update RsR_test to reflect the latest changes Make RsR_test compile again
Add parallel-hashmap to CMakeLists.txt. See issue janba#102. Add AssociativeContainers.h Adds type definitions for types used in the parallel_hashmap. See janba#102 for context. But due to the slow nature of the built in associative containers, I think that this is a highly beneficial addition. Adds a mutable priority queue as an alternative to generational priority queues. This class generalizes a priority queue by allowing key ordering relations to be stored indirectly This means that keys can be removed arbitrarily and not just from the front. Existing keys can be updated by reinserting them to the queue. Note that the memory footprint of this is substantially larger than an ordinary vector-based min/max heap. In addition, the internal queue is always kept sorted, meaning there is a potentially higher up-front cost to constructing this. So make sure you really need the mutability or really benefit from it when using this over a generational priority queue. Add missing assertion import and size() and empty() Add multimap and multiset to associative containers
See issue janba#107 Add random library
5ad7c4a to
5161624
Compare
Contributor
Author
|
Supplanted by #139 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
My previous working tree got a bit out of hand (there are over 100 commits that I just don't want to rebase). So, I will create this draft for now to later clean up and merge.
Clean up todo: