Implement adaptors on hash tables

Hash sets/maps are common collections, and their current implementation in the standard library naturally fit Paralight's model, as they are represented by a slice of buckets. Conceptually, implementing a parallel source over hash tables could be as simple as `buckets.par_iter().filter(/* is bucket occupied? */)`.

The main challenge is that hash table internals are not exposed in public APIs (unlike `Vec` and its [`into_raw_parts()`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_raw_parts)/[`from_raw_parts()`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.from_raw_parts) APIs), which allows future evolvability of the stdlib implementation but prevents optimizations.

Another possibility would be to work with the [`hashbrown` crate](https://docs.rs/hashbrown/) (the underlying implementation of the standard library) to peek into the internals and provide `ParallelSource` adaptors for [`hashbrown::HashMap`](https://docs.rs/hashbrown/latest/hashbrown/struct.HashMap.html). This wouldn't fully solve it for [`std::collections::HashMap`](https://doc.rust-lang.org/std/collections/struct.HashMap.html) though.

Pushing to the frontier of what's possible, a terrible hack would be to use `unsafe` code to convert between `std::hashmap::HashMap` and `hashbrown::HashMap`, using something like [`rustversion`](https://docs.rs/rustversion/) or the upcoming [`cfg(version(...))`](https://github.com/rust-lang/rust/issues/64796) to allow this only on Rust versions where the standard library implementation is known to be a specific version of `hashbrown`. This is a whole can of worms, might be implemented via separate support crates, etc.

At the end of the day, there is no free lunch: the standard library isn't committing too much to its internals to allow future performance improvements, but this prevents crates from making assumptions on standard library types to implement some optimizations.

---

The other side of the puzzle is parallel collection into a hash table (akin to #4). One approach would be to have per-thread temporary tables to collect into, and a merge phase at the end. Another approach would be to integrate further into the internals, having a single hash table for all threads and locking selectively part of it when reading/writing into a bucket.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement adaptors on hash tables #5

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Implement adaptors on hash tables #5

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions