Use a distribute_nodes function (#154)
base: master
Conversation
Hmm. Given that this is non-stationary and we have little information at the start (who knows if old estimators are any good?), it feels like trying to split in advance might be the wrong call. Since we only do this among worker processes on a single machine, what if we instead started with round-robin allocation, and then occasionally rebalanced?
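A minimal sketch of that round-robin starting point (the round_robin name and signature are illustrative, not from this PR):

```python
def round_robin(nodes, n_workers):
    """Cyclically assign nodes to workers, using no cost estimates at all."""
    buckets = [[] for _ in range(n_workers)]
    for i, node in enumerate(nodes):
        buckets[i % n_workers].append(node)
    return buckets
```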
I think we want the initial distribution to still be via distribute_nodes, where a global rebalance amounts to re-solving the same problem.
Disagree.
I'm imagining a hub-and-spoke design here, where each worker occasionally checks in with the 'hub' process. It also has the substantial benefit that there's a straightforward path to extend this from balancing processes on a single host to working via the database for an entire fleet 🙂
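To make the hub side of that concrete, a hedged sketch (the names and message format here are assumptions; the thread doesn't pin them down): workers push reports onto a shared queue, and the hub rebalances once it has heard from every worker.

```python
import queue

def hub_loop(reports: queue.Queue, worker_ids: set, rebalance) -> None:
    latest = {}  # worker_id -> most recent report from that worker
    while True:
        try:
            worker_id, report = reports.get(timeout=1.0)
        except queue.Empty:
            continue
        latest[worker_id] = report
        if set(latest) == worker_ids:
            # Recompute the assignment from the freshest per-worker reports.
            rebalance(latest)
```

The same loop extends naturally to the whole-fleet version: swap the in-process queue for reads from the database.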
not necessarily; each worker can check in with the hub (by reading from a shared …)
the runtime mechanism might take a while to balance, and we have the estimators right there! might as well initialize a gradient descent with a good guess
I'm in agreement with the hub-and-spoke design, but I'm not convinced about "pick 3 and redistribute among those 3" when we could do "pick all n and redistribute among those n". (and I now think the hub should just continuously rebalance on an interval, rather than waiting for some condition)
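One way to read "continuously rebalance on an interval", with hypothetical get_estimates/redistribute hooks:

```python
import time

def rebalance_forever(get_estimates, redistribute, interval: float = 30.0) -> None:
    while True:
        time.sleep(interval)
        # Re-solve across *all* n workers, not a sampled subset.
        redistribute(get_estimates())
```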
synthesis: we start each worker with an empty set of tests, and whenever they have zero runnable tests (e.g. startup, or after finding lots of failures) they ask the manager for a new set.
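Worker-side, that synthesis might look roughly like the pull loop below; request_batch and run_one are hypothetical stand-ins for whatever the manager protocol ends up being:

```python
def worker_loop(request_batch, run_one) -> None:
    tests = []
    while True:
        if not tests:
            # e.g. at startup, or after failures shrank the runnable set to zero
            tests = list(request_batch())
            if not tests:
                continue  # nothing available yet; a real worker would back off
        run_one(tests.pop())
```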
Fair enough; I was probably leaning too heavily on "power of two random choices" and thinking about the multi-node future version - but "just works well for a single node" is the better goal for now.
haven't quite filled in all the estimator details yet, but this structure is what I was thinking of.
I basically went all the way to just doing the full Bayes thing here, incorporating startup costs in our estimators, etc. I'm going to split off a PR from this for just the hub-and-worker structure with no fancy estimators, and then keep this as a dependent PR which adds the estimators.
(
    _targets(("a", (1, 1), 0), ("b", (2, 2), 0), ("c", (3, 3), 0)),
    2,
    {("a", "c"), ("b",)},
),
Hmm, the greedy solution seems pretty bad here - don't we want (a, b), (c,)?
(a, b), (c,) would be ideal, yeah. This particular case will be helped by iterating highest -> lowest. In general I'm not too worried about suboptimal greedy solutions, since I would expect the rebalancing to fix things up eventually (though if the rebalancing has failure modes in common configurations then that's bad of course, and I'm glad you're pointing this out).
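For concreteness, a sketch of that highest-to-lowest greedy pass (longest-processing-time-first), with a hypothetical estimate callable rather than the PR's actual signature:

```python
import heapq

def distribute(nodes, estimate, n_buckets):
    """Assign each node, largest estimated cost first, to the currently
    least-loaded bucket."""
    heap = [(0.0, i, []) for i in range(n_buckets)]  # (load, tiebreak, bucket)
    heapq.heapify(heap)
    for node in sorted(nodes, key=estimate, reverse=True):
        load, i, bucket = heapq.heappop(heap)
        bucket.append(node)
        heapq.heappush(heap, (load + estimate(node), i, bucket))
    return [bucket for _, _, bucket in heap]
```

Reading the test case above as per-node costs of 1, 2, and 3 split across two buckets, this yields ("c",) and ("b", "a") - i.e. the ideal split.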
This doesn't actually incorporate estimator state right now; it just hardcodes an estimate of 1.0 for all nodes. But the groundwork and tests are all here for when we do plug in an estimator. (we weren't using estimator state before, so this doesn't change the status quo.)
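As a hypothetical illustration of that placeholder (not the PR's actual code), the hardcoded estimator is presumably equivalent to:

```python
def constant_estimate(node) -> float:
    # Every node is assumed equally expensive, so any cost-aware
    # distribution degenerates to balancing node counts.
    return 1.0
```

Plugging in a real estimator later only has to change this callable.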