
Conversation


@zachary-kent zachary-kent commented Dec 2, 2024

This refactors a good amount of code, with changes including:

  • Eliminating most raw pointers where possible. In several instances, raw pointers were used as arrays; I replaced most of these with parlay::sequence.
  • General deduplication
  • Eliminating global variables where possible
  • Fixing what I believe was a data race between the query thread and the update thread. Both read and wrote the history global concurrently: history is a log of the database indices after every update, each query read the end of the log to fetch the most recent indices, and the update thread appended a new version after every transaction. Without any synchronization, this seems unsafe. I guarded these accesses with a lock; additionally, only the most recent version is now shared between readers and writers.
  • Encapsulation of most of the transaction processing logic into the TransactionHistory class.
  • Replacing all floats with doubles

I believe these changes did not break anything: I compared the query results before and after the refactor. There are, however, slight differences resulting from the switch from single- to double-precision floats. I also couldn't find any test infrastructure for checking the correctness of updates.

@syhlalala
Collaborator

Hi there! I'm not sure I fully understand the concurrency issue. In general, the data structure is designed so that each read or write transaction takes a snapshot of the current state, without using locks. The design does not allow multiple threads to update the data structure at the same time. A write transaction takes a snapshot and makes all of its changes locally, without modifying the shared data structure; after the entire transaction finishes, the new version is committed and becomes visible to readers.

In that case, readers always get a snapshot of the data structure that stays unchanged for the entire transaction. The point of the design is to avoid locks for concurrent reads and writes. However, as mentioned, multiple write transactions cannot run concurrently; if two writes conflict, the one that started later must abort and restart.

The code itself actually implements an even simpler setting, in which there is only one reader (which can run internally in parallel) and one writer (which can also run internally in parallel).
