
Conversation

@thorstenhater (Contributor) commented Feb 10, 2025

  • Recognize that sampler handling is a simulation-level method, which is not callable by multiple threads.
  • Simplify probe_map
  • Specialize membership predicates
  • Clean-up a few things.

Result: @jlubo's test case used to take 20s to add 1600 samplers; now it takes 1.2s.

@halfflat
Contributor

Just for context, the locking was put in place so that samplers could be changed dynamically in response to observed samples, and so in fact could be called by multiple threads.

Not sure how best to enforce that sampler callbacks (or some other client-side thread acting independently) can't change sampler–probe bindings. How much do the locks themselves (as opposed to the other optimizations) slow things down? Keeping them is the easy option, both for safety and for this functionality. I guess an alternative would be to have a check in the simulation methods, e.g. throw if run() is underway in another thread.
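The "throw if run() is underway" check could be as small as one atomic flag. A minimal sketch, assuming a hypothetical `simulation_guard` helper (the names here are illustrative, not Arbor's actual API):

```cpp
#include <atomic>
#include <stdexcept>

// Hypothetical guard object; in practice this state would live on the
// simulation object itself.
struct simulation_guard {
    std::atomic<bool> running{false};

    // Called on entry to run(); rejects concurrent runs.
    void begin_run() {
        if (running.exchange(true))
            throw std::runtime_error("run() already in progress");
    }

    void end_run() { running.store(false); }

    // Sampler-manipulation methods call this first and refuse to
    // proceed while an epoch is being integrated.
    void check_not_running() const {
        if (running.load())
            throw std::runtime_error("cannot modify samplers while run() is underway");
    }
};
```

This trades the per-access lock for a single cheap load on the (now single-threaded) sampler-manipulation path.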

An alternative to keep the functionality would be to use that steering interface we mooted way back and put these requests on a queue which is consumed by the simulation object between epochs, but that's of course a big deal.
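For context, the steering-queue idea could look roughly like the following sketch: clients post requests at any time, and the simulation drains the queue between epochs, so bindings only ever change on the simulation thread (all names here are hypothetical, not the mooted interface itself):

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical steering queue: thread-safe to post to, drained only by
// the simulation object between epochs.
struct steering_queue {
    std::mutex mtx;
    std::queue<std::function<void()>> requests;

    // Any thread (e.g. a sampler callback) may enqueue a request.
    void post(std::function<void()> req) {
        std::lock_guard<std::mutex> g(mtx);
        requests.push(std::move(req));
    }

    // Called by the simulation thread between epochs; swaps the queue
    // out under the lock, then runs the requests unlocked.
    void drain() {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> g(mtx);
            std::swap(pending, requests);
        }
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
    }
};
```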

For applying many sampler/probe bindings at once but keeping locks, how about a bulk interface?
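A bulk interface would amortize the lock over the whole batch. A minimal sketch with stand-in types (Arbor's real sampler association map is more involved):

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative stand-ins for the real sampler types.
using sampler_id = std::uint64_t;
using sampler_fn = int;

struct sampler_map {
    std::mutex mtx;
    std::unordered_map<sampler_id, sampler_fn> assoc;

    // One lock acquisition per sampler: O(n) acquisitions for n samplers.
    void add_sampler(sampler_id id, sampler_fn fn) {
        std::lock_guard<std::mutex> guard(mtx);
        assoc.emplace(id, fn);
    }

    // Bulk interface: a single lock acquisition for the whole batch.
    void add_samplers(const std::vector<std::pair<sampler_id, sampler_fn>>& batch) {
        std::lock_guard<std::mutex> guard(mtx);
        assoc.reserve(assoc.size() + batch.size());
        for (const auto& [id, fn]: batch) assoc.emplace(id, fn);
    }
};
```

With millions of samplers, the batched variant also lets the map reserve capacity once instead of rehashing incrementally.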

@thorstenhater
Contributor Author

Just for context, the locking was put in place so that samplers could be changed dynamically in response to observed samples, and so in fact could be called by multiple threads.

Interesting, I didn't have that background. Given the actual problem reported by @jlubo and the payoff I found, I'd be willing to accept the degradation in functionality, i.e. loss of dynamic sampling. To my knowledge there is neither an active use case nor is it tested or documented. For the actual simulation, it adds 4,000,000 samplers (!) in total, so any speed-up is critical.

How much do the locks themselves (as opposed to the other optimizations) slow things down?

Removing the locks is only a small factor in the speed-up for adding samplers, but it will also speed up simulation time, as locking is needed when pulling samplers from the map, too.

As for the suggestions, I like them, but given time/personnel constraints, let's reserve them for an actual use case. Although, I suspect a concurrent hash map (or a straight-up vector) might also be an option.

@halfflat
Contributor

I always hate to see functionality go away, even if no one is using it :D Nonetheless, in this case it isn't a code-maintenance concern but a performance and safety one: if we're going to remove a capability for performance, it would be good to quantify it to see what cost we are paying. I'm hoping, though I haven't got benchmarks, that integration dominates the run-time of the cell-group advance calls; within the per-epoch overhead, I don't have a feel for how sampler set-up time compares to sampled-data wrangling and callbacks.

If we leave the locks out, we should guard access to the sampler-manipulation methods on the simulation object (i.e. make sure they can't be called while the simulation is running) and also document that this is forbidden; it makes things a bit more complicated.

@thorstenhater
Contributor Author

It's more of a hypothetical functionality, especially as it isn't available from Python. I'll check the numbers again, in particular how they compare against the other optimisations (specialising the common filters like one_probe is the big one).
I'll put in a simple atomic for checking whether the simulation is running, if that's worth it.

@thorstenhater
Contributor Author

As for the dominant fraction in profiling: in benchmarks (and thus biased towards shorter run times) I often see a 1:1 split between initialisation and simulation, with label resolution and connection sorting being the major cost centres. For the former: #2447. For the latter I am brewing something.

The use case that inspired this particular overhaul: @jlubo dumps all the state at all the places at the simulation's end. As that includes all synapses, $\mathcal{O}(n^2)$ samplers with a trivial schedule are set.

@thorstenhater thorstenhater marked this pull request as ready for review August 19, 2025 08:10
