Skip to content

Conversation

@cakarsubasi
Copy link
Contributor

@cakarsubasi cakarsubasi commented Jul 27, 2025

My previous working tree got a bit out of hand (there are over 100 commits that I just don't want to rebase). So, I will create this draft for now to later clean up and merge.

Clean up todo:

@cakarsubasi cakarsubasi force-pushed the feature/rsr-reexpand branch 2 times, most recently from 5af53f1 to 5909cdf Compare August 1, 2025 11:23
@cakarsubasi cakarsubasi force-pushed the feature/rsr-reexpand branch 2 times, most recently from e54879e to 5f247be Compare August 13, 2025 11:01
requires expressions go right after the template parameters. Putting it after the return type puts the requirement on the return type which makes no sense obviously.
This uses the less error prone std::reference_wrapper, fixes the incorrectly defined copy constructor, and adds [[nodiscard]] attributes to prevent misuse.
The old version of split_vertex that I wrote has some edge cases that ended up being triggered. This newer version covers those more thoroughly although in a somewhat inefficient way by inserting phantom triangles and removing them once the vertex is split. Although for our primary use case, this should be hit relatively infrequently.
QEM: The hopefully common case (three nonzero eigenvalues) should not have a negative assertion. Oops.
Coalesce Vec and Mat definitions

Right now, inheritance is used to create a separate concrete type for each common combination of vector/mat and primitive type. The inherited types add no behavior besides public constructors. This leads to a lot of header bloat. This adds 3 extra headers but 21 headers are made redundant. A later commit can remove those and update other files that rely on those feathers to refer pull just Mat.h or Vec.h.

Note that I think a lot more duplication can be reduced by flattening some unnecessary inheritance relationships.

This will also slightly improve the debug experience
Moved it to the Geometry namespace not to pollute the top level namespace.

Changed names to improve clarity.

Added const accessors to make is_connected const.
Vec.h: Add any and all utility templates

Mat.h: Fix some documentation issues, mark Matrix types as final
This change seems to have a performance impact of up to 20%, which is simply astonishing. It is likely that GCC (and other compilers), simply do not understand what std::transform and similar functions do at a high level and completely fail to optimize these to efficient SIMD ops. The end result is a lot of useless runtime checked tiny loops that run useless integer ops and waste branch prediction slots.
Make InvalidIDs in AMGraph constexpr.

This makes it somewhat easier to reference them from other files as they are no longer extern global statics.

AMGraph: add neighbors_lazy

As explained in janba#102, it would be nice to have an allocation-free variant of neighbors()
Create InplaceVector

C++ has two main contiguous collections which are std::vector and std::array. std::vector provides a heap allocated growable array with a fast move operation while std::array is an in place fixed size wrapper over C-style arrays that handles object lifetimes correctly. While these two cover most use cases pretty adequately, there is the issue of small non-fixed size arrays. There are some cases where we need resizable arrays of a fairly small size. It is possible to emulate this if you have a sentinel element with std::array but I find this to be too error-prone. Falling back to std::vector comes with performance downsides. Although modern allocators are quite fast, having a lot of small allocations tend to waste a lot of heap space that can degrade performance over time. This becomes especially clear when you attempt to multithread existing code where previously alright allocator performance is suddenly degraded due to an increase in contention. People have tried to work around this by using alternative allocator like jemalloc however this is more of a crutch than a solution. The problem becomes more pronounced when we want to have a "vector of vectors". This is almost the worst case scenario for indirection with the CPU having little idea where the memory accesses will be but is also the worst case scenario for memory fragmentation which means worse performance due to cache misses and significantly increased memory usage. There are already use cases within GEL where creating a vector of 300,000+ vectors is not unusual, so there is a clear use case for this class.

Note that this is not a "SmallVector" which is often used to indicate a growable array that can be conditionally stored in place below a minimum size but can also be moved to the heap beyond that size.

If the library is ever updated to C++26, this class can likely be replaced using std::inplace_vector, which provides std::vector semantics with a fixed maximum capacity.

InplaceVector: fix emplace_back

Oops, fix empty

add const qualified begin and end to InPlaceVector
Add SparseGraph inspired by the graph implementations inside less_slow.cpp

I am not too happy about this one. The graph implementation that is most likely to outperform the built-in one would have to use flat_hash_set/flat_hash_map from the phmap library. The problem however is that those containers only return a single element when given equal_range which we want to abuse to store edges with the power of heterogeneous keys. This of course also means that the built in std::unordered_set and std::unordered_map are actually a multi set and a multi map which is horrifying.

I will however consider keeping this implementation since I think it has some ergonomic benefits over the AMGraph and we might perhaps figure out some way to actually exploit its niche advantages.

(It is also possible that the real reason for the performance deficit is merely that this collection only really performs great for sparse graphs and our connectivity is just a tad too high in RSR)
Add RangeTools.h

I needed quite a few functional programming abstractions during the process of working on this project. Partially due to C++20 missing quite a few useful view adapters that are only added in C++23 and C++26.

This is the initial commit that adds those range tools to their own header, although I will likely make some changes before actually merging this upstream.

Move more things to RangeTools
Add Threadpool implementation

These are threadpool implementations for parallel adapters that are used within the RSR algorithm. I decided to make this separate so I can easily swap out implementations which can be useful during profiling as multithreading can sometimes make results more ambiguous. In addition, we cannot use the built-in parallel algorithms. Not only is C++20 missing a good number of algorithms that we are interested in using (or have variants that are undesirable to use in some way), but one of the compilers that we wish to support (Apple Clang) does not support execution policies at all. Even when execution policies are supported, the C++ standard places no requirements on implementors to actually execute relevant algorithms in parallel when parallel execution is requested.

The ThreadPool class seems to still contain a subtle race condition where the pool can deadlock if the given closure finishes really quickly. I initially wanted to use this design because it amortizes thread creation costs but I might end up just removing it as our algorithms take long enough that thread creation is typically completely negligible.

Add Parallel Adapters

Adds concepts and functions for parallel algorithms.

These add parallel algorithms comparable to ones available in the algorithm header. Note that execution policy is still not supported on Apple Clang right now (2025-07-22) which just so happens to be the compiler Andreas uses.

This library is much stricter than the algorithms one in the sense that we require the input and output ranges to be contiguous and have constant time random access. While there are more generic ways of writing parallel algorithms, cache locality can lose us more performance (>10x) than what we gain thanks to multithreading.

Also note that the names follow functional programming concepts rather than the C++ std names (map, filter instead of transform, erase_if) since the choice of some of the std names were picked to either avoid collisions with existing portions of the library or follow some specific C++ idiom which we don't necessarily care about.

It is also worth noting that there is quite a bit of duplication here to express some more complex adapters that would normally be composed in more functional code. This is because we need to block to force computations to complete. Allowing composition without blocking would require a Rust style library where computation is indirectly described in terms of traits and an explicit collect is added to the end to finish the computation. It is possible to write this in C++ but it is overly complex and most of the computation that we are doing can be described in terms of a simple map-fold.

Note that I might also turn IExecutor into a template parameter to allow the compiler to inline one more level although I suspect the impact of thi son performance is negligible.

Add ThreadPool tests

ParallelAdapters:

Use proper cache line size

ParallelAdapters: fix FilterFunctor concept

ParallelAdapters: 64 is the cache line size for me and you ♫
Add RawObj class and parsers

There are several ways of parsing obj files in GEL right now. None of which fit my particular needs since these are meant to be used to feed TriMesh or Manifold. I need a parser that just provides the raw data and the ad hoc ones that are spread all across the demo files won't do.

Note that we will almost certainly want to move this to the test directory eventually. It would be preferable if outside users would not see yet another obj parser and parser tend to be security landmines.

Add Obj parser tests

Not comprehensive tests for the parser. Likely to be removed soon.

RawObj now supports faces

RawObj: Fix a face parsing bug, properly handle no normals case

RawObj: Due to MSVC lacking the relevant overloads wrt from_chars and string_view, we have to use this awkward juggling of data() and length() instead of begin() and end().

RawObj handles an edge case

There are some cases where the obj file has no faces but still has normals which are equal in number to the vertices. We can then map each normal directly to one vertex.
Add working copy of RsR.h/RsR.cpp

Working copies of the RsR algorithms for multithreading and collapse/expansion. Nowhere near finished.

RSR: Fix collapse overflow bug

RsR: refactor to get rid of the convoluted branching for knn_search

RsR2.cpp: Use shortest edge based collapse

This switches from naive nearest neighbor collapse to naive shortest edge based collapse and I regret not making this change earlier. Not only is this method of collapse much simpler since I have an exact order to perform the collapse and don't have to worry about keeping track of which points are allowed to collapse, since we are selecting based on edges instead of points, I end up selecting both points at the same time.

There remains questions however. We select an edge, but there is no obvious way to choose which point is active and which is latent. Should we update the point by averaging it afterwards? If so, then perhaps we should also keep track of point weights so we know how heavy each point is when doing future averaging. If we end up changing point positions, we then have to presumably store the intermediate coordinates. Finally, we need to consider that whatever we do needs to be parallelizable (assuming it is worth parallelizing). If we can do this right and maintain quality after collapses, then the main algorithm will presumably benefit relatively little from parallelization and this part will likely be much more important.

RsR2.cpp, RsR2.h: Extended cleanup

This performs further dead code elimination and small scale optimizations across these two files.

Write separate globals for InvalidNodeID and InvalidEdgeID. (these should be made constexpr later)

Parametrize usages of set, unordered_set, and unordered_map. If we ever change from using std containers to a third party one, this makes the change easier to do. The opportunity for performance gains are quite large just from this (>30%).

Eliminate dead fields from Vertex and Vertex::Neighbor. Many of these are either write only or their write result is never used.

Eliminate the unused API of RSGraph. A copy of it already exists in RsR.h just in case.

Eliminate the branch in RSGraph::add_edge. Previously this branch was sometimes taken but ignored (returning an InvalidEdgeID). This was apparently fixed in 925bbda by Andreas.

Remove API for remove_duplicate_vertices. Same reason as 1015f691f4c93049162afbe14556a73e7edf3f8b.

Remove unnecessary forward declarations in RsR2.cpp.

Change naming to be more consistent.

Merge knn_search to one function.

Simplify nn_search and fix an apparent bug I've created where I did not return a properly sorted version.

Write a generic minimum_spanning_tree implementation. This is used two times and it was trivial to factor it out with a functional API. I would not be surprised if there are other parts in the library that also calculate MSTs and we can just later move this so other people can calculate an MST easily.

Change the map used in connect_handle not to use a string.

remove sets_temp.reserve()

built-in sets don't have reserve.

Add separate timer for RSR

This is just a split of the timer in the original RsR header with some API improvements. There is yet another timer in GEL that seems to be used in a few places (mainly in the demos) that should probably be removed as the built-in chrono in C++ is more than good enough and the other timer has some pretty nasty ifdefs.

RsR2.h: Changes

Remove <random>. It is not even used.

Remove Neighbor related structs. They are moved to Collapse.h.

Remove Boolean. It is better it sits next to the only place it is used.

Change Vertex::normal_rep. This is another narrowing conversion. Annoyingly, it has two special states (-1 and -2), one of which is a valid state for the NodeID which also inhabits it. Although this is unlikely to cause any real issues, I don't know what to do about it right now.

Remove dead RSGraph fields. total_edge_length is never read from inside the code. Maybe it was used for testing at some point. is_euclidean and is_final are better expressed as function arguments, but as it turned out. is_final was completely dead. current_no_edges can be figured out by other means but as it turns out, it is also never read.

Remove dead RSGraph function init. It is supposedly called in one place where it was commented out. It might be better to extend AttributeVector API to have a reserve function later.

Remove collapse_points API. It is moved to Collapse.h.

RsR2.cpp: Changes

Reorder arguments in find_common_neighbor. Most functions take RSGraph as the first argument.

Remove dead helper functions erase_swap, erase_if_unstable, safe_div.

Move knn_search, build_kd_tree_of_indices, calculate_neighbors, collapse_points to Collapse.h

Make more stuff const.

Move Boolean wrapper inside correct_normal_orientation.

Eliminate multiple dead std::vector instances.

Make is_intersecting and more compact.

Remove commented out check_face_overlap examples.

Give explore is_euclidean as an argument instead of a field in G.

Make check_branch_validity more compact. In addition, factor out is_final field in RSGraph, it is only ever called with true.

As it turns out, the entirety of check_and_force is dead code. Remove check_and_force.

Use the constant vert_count overload of build_manifold inside component_to_manifold.

RSR2.h: Changes

Remove unnecessary <ranges>

Move successor and predecessor functions inside RSGraph. Also, see issue janba#105.

RsR2.cpp: Changes

Move successor and predecessor to RsR2.h. Also update usages so they use the member function in RSGraph.

Use a type definition for ThreadPool so I can easily change the desired implementation for each usage. It might a better idea to have the executors be propagated down rather than re-created but with the current execution times, I don't really see anything besides ImmediatePool being worth using.

register_face: make the function more compact.

connect_handle: style improvements.

triangulate: eliminate dead variables

RsR2.h: add crash assertions

RsR2.cpp: Add crash assertions

Move unnecessary type definitions to RsR2.cpp, remove using namespace declarations, add some type definitions inside detail.

Move type definitions inside RsR2.cpp

Simplify cal_radians_3d

Fix rest of typedefinitions, style improvements

Add finer timer to component_to_manifold

Edge arrays use a single vector

Normal estimation does not create intermediate vectors

register_face style improvements

Style improvements

RsR2: make a few things constexpr

RsR2: Minor fixes

RsR2: Changes

Move neighbor calculation and cross connection filtering out of init_graph. init_graph now returns the graph it initializes directly. Make the inner loop a little more sensible.

Rename generic mst builder to make_mst

Factor out RSGraph::exp_genus to be a function argument

Move neighbor calculation out of split_components. Remove completely pointless inner loop in graph construction.

Make usages a little more clear in point_cloud_to_mesh

RsR2.h apply changes in previous commit

RsR2.h reshuffle things around a bit

RsR2.cpp: Changes

Move most using declarations to the top.

Use a common comment style.

Eliminate one unnecessary allocation in nn_search

Factor out max_length and pre_max_length in init_graph to a single vector.

Eliminate two dead arguments to find_shortest_path

weighted_smooth is now alloc free

Replace call to neighbors with neighbors_lazy

RsR2.cpp: remove triangle_mean_normal

remove constexpr from normalize_normals

revert vertex constructor change (Clang)

RsR2.h

Use phmap associative containers

Add access assertions to tree_id

Update point_cloud_collapse_reexpand signature to have CollapseOpts

RsR2.cpp

Use lazy neighbor iterators

Add more input sanitization

Update point_cloud_collapse_reexpand signature

RsR2.cpp:

I want to split the data and behavior inside the Collapse header but I can't do that right now, so we will insert some temporary changes here and fix up the API of collapse_points and reexpand_points later

RsR2.h: Switch to SparseGraph

This actually causes a performance regression but I actually want to better encapsulate the two graphs inside RsR

RsR2.cpp: Switch to SparseGraph

This actually causes a performance regression but I actually want to better encapsulate the two graphs inside RsR

RsR2: Many improvements

Make SimpGraph no longer derived. (I'm not so sure about this change and might revert it since we have to expose the inner part anyway).

Instead of using AttribVec, use std::vector in SimpGraph and RSGraph. This results in a major performance improvement.

Revert SimpGraph and RSGraph to use AMGraph. The SparseGraph idea wasn't bad but we have too many edges for it to be anywhere near performant.

Move get_neighbor_info and maintain_edge_loop inside RSGraph. That's where they make sense.

Convert a few pairs to dedicated structs to avoid potential memcpy fails.

Convert a lot of values to references to avoid copying. As it turns out this was one of our biggest performance losses because each Vec3 is 24 bytes and certain functions are copying quite a few of them. In this case, inlining becomes undesirable because the indirect memory access will nearly always beat copying so much data to the stack over and over.

Add some experimental multithreading to edge connection.

RsR2.cpp:

Merge four allocations into one in connect_handle, move a function to RangeTools

Revert constructor deletions in RSRTimer so I can more easily drop this into the old RsR.cpp

Remove failed reserve attempt, make AMGraph privately inherited in RSGraph

Change RsR2.cpp to use the intended collapse API

RsR2: Revert several changes to fix some regressions

The main things are, the imp_node vector is not populated correctly in connect_handle and cross connection filtering early results in the pre_max_length being incorrect. These do not explain the entirety of the regressions.

Remove incorrect constexpr

RsR.cpp Update the interface to handle normal and no normal situations correctly

RsR2: take a full option struct for reexpansion options

RsR2: eliminate unnecessary NodeID from Vertex struct

RsR2: simplify geometry check, add debug functions

Oops, forgot to clear neighbors

Even more fixes to RSR2
This adds some of the collapse-reexpansion functionality as a separate header. Likely not hugely desirable, but RsR2.cpp has over 2000 lines and CLion does not like analyzing it. If I can get rid of some of the unwanted repetition, I will consider merging this back to RsR2.h/RsR2.cpp.

Collapse: add numbers import so we can pull pi

Collapse.h: Add thresholding for angles and valencies to reexpansion

As Andreas pointed out, it is most likely better to use thresholds to limit triangles with more extreme minimum angles and vertices with too few way too many neighbors. When set very conservatively, this seems to result in a slight improvement in reconstruction quality.

Collapse.h: Changes

Move Collapse types and functions into RSR. They will only be used there.

Move collapse_neighbors from RsR2.cpp to here. Technically, this means we need to move a few things that are not collapse related (mainly the neighbor calculations) but this means that all Collapse-related code (except for the API) is in Collapse.h. I will likely further move stuff into its own compilation unit.

Collapse.h: Make Clang shut up

It seems that Util::Range returns a i64 regardless which is probably not what we want. Since I will most likely remove that in exchange for iota_view (despite the apparent downsides) This will likely not be a problem later.

Use Vec3 type definition

Collapse: implement fundamentals for QEM

Collapse: formatting

Collapse.h

Use QEMs for collapse and reexpansion

Collapse.h: a lot of changes

Add CollapseOpts to pass some parameters more easily.

Add zip_view_t to make easy pairs. TODO: move this elsewhere.

Add zip helper template for zip_view_t

Add cartesian_product helper template.

Add normal lerp functions.

Fix a fairly nasty bug in QuadraticCollapseGraph that resulted in horrible collapses.

Experimental shared_neighbors normalization in QEM (Currently seems to make things worse).

Some temporary changes to Collapse while I figure out how to properly pass this data around. Add a struct that just contains the collapse information (preferred thing to pass to the second function). Add a helper function to convert the original collapse struct to this. Create another helper function to create artificial collapses.

optimize_dihedral might have some issues? I need to check we are actually getting a proper value.

Use a split struct to be more explicit.

Use a triangle struct instead of trying to manage triangles manually. I don't care that this is 72 bytes, the previous method was a complete nightmare.

find_edge_pair tries to find the actual one_ring and two_ring dihedrals now. I also added a bunch of debug printing that I will most likely remove but I want to leave it in the commit history just in case.

Add decimate and decimate_reexpand functions to investigate results based on a collapse directly from the Manifold instead of relying on the more noisy results from the RSR.

Split Collapse.h to Collapse.cpp

Move even more stuff to Collapse.cpp

Save the futile alternative edge flip based reconstruction method in Collapse.cpp

I will most likely just remove this immediately but I want to have a reference here just in case I want to come back to it later

Collapse.cpp:

I finally figured out how to get OK performance, so save here so I can make it actually fast

Collapse.cpp

Perform many triangle optimizations to reduce amount of work during a reexpansion

Collapse: remove the fat Collapse class

Collapse: early return for 0 max iterations

Collapse.h
Also return indices for the point cloud

Collapse: Replace MutablePriorityQueue with a simpler BTree based design that is much faster

Major changes to Collapse

Add debug options for easier debugging

Bring back the old somewhat inefficient QuadraticCollapseGraph based on the MutablePriorityQueue. The simpler BTree based version should still function well hypothetically but until I figure things out, both can live here.

Clamp the minimum error for QEMs to a numerically stable value, so at very low QEM values, distance will dominate.

Angle optimization multiplies by the edge length that the angle is facing. This is somewhat problematic but it avoids the problem that the angle scoring being unaffected by the edge length while dihedral angles were. So I no longer have to worry about applying an awkward normalization factor when balancing them out.

Add one_ring_max_angle for debugging by figuring out if we are creating acute angles. I used 1 for debug thresholding but it seems like there are cases where such acute angles are not unreasonable, although we will have to discuss this during a meeting.

Take into account angles during split selection.

Return the actual maximum dihedral angle rather than the edge adjusted maximum dihedral angle. Store two sets of potential splits. One based on just our scoring, and the other that avoids any acute (>90 deg) angles. Pick the non-acute angle if possible.

Refactor the edge crossing check to shoot_ray_and_maybe_flip_edge (better name pending). What this now does is that it checks if we are crossing or getting close to cross a plate around a particular edge. I might want to set the "about to cross" threshold as a potential out parameter.

Make collapse and reexpansion work.

There, I said it.

Final behavorial changes to Collapse and Reexpansion
Add RsR tests

Move KDTree builder to test file

RSR: Fix collapse overflow bug

RSR_test.cpp: Remove duplicates testing

Since I have figured out that duplicates are checked in multiple places in the original reconstruction, explicitly removing duplicates have become unnecessary. In addition, I didn't like this oracle testing method either.

Collapse API is in HMesh::RSR now.

Better reconstruct_debug

Add RsR tests to CMakeLists.txt

RsR_test.cpp: Changes

Remove unused create_rectangular_manifold

Minor fixes.

Pass old reconstruct_single the pre-generated normals.

Collapse: implement fundamentals for QEM

RsR_test.cpp

Comment outdated tests, adapt existing tests for the new signatures

RsR_test.cpp

Add rectangular tests and a bunch of other nonsense

RsR_test:
Split include_directories again

RsR_test:

Remove unnecessary RangeTools and SparseGraph tests

Update RsR_test to reflect the latest changes

Make RsR_test compile again
Add parallel-hashmap to CMakeLists.txt. See issue janba#102.

Add AssociativeContainers.h

Adds type definitions for types used in the parallel_hashmap.

See janba#102 for context. But due to the slow nature of the built in associative containers, I think that this is a highly beneficial addition.

Adds a mutable priority queue as an alternative to generational priority queues.

This class generalizes a priority queue by allowing key ordering relations to be stored indirectly

This means that keys can be removed arbitrarily and not just from the front. Existing keys can be updated by reinserting them to the queue.

Note that the memory footprint of this is substantially larger than an ordinary vector-based min/max heap. In addition, the internal queue is always kept sorted, meaning there is a potentially higher up-front cost to constructing this. So make sure you really need the mutability or really benefit from it when using this over a generational priority queue.

Add missing assertion import and size() and empty()

Add multimap and multiset to associative containers
See issue janba#107

Add random library
@cakarsubasi
Copy link
Contributor Author

Supplanted by #139

@cakarsubasi cakarsubasi deleted the feature/rsr-reexpand branch November 26, 2025 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant