
cmg: Use multiple queues for routing #4

@phoevos

Description

Currently the project uses the simplest possible model for routing incoming requests. The Gateway parses them and pushes them into a single queue, while the Scheduler asynchronously picks up tasks from that same queue and forwards them to the appropriate model server. Even though the Scheduler can process more than one task from the queue at a time using multiple threads, this design can be suboptimal when the queue is dominated by requests targeting a single model server: requests for other model servers get stuck behind them, even though they could be routed in parallel to a different (possibly idle) server.
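To make the problem concrete, here is a toy simulation of the single-queue design. It is a minimal sketch under assumptions not taken from the project: all requests are enqueued at once, each model server has a fixed per-request service time, and the Scheduler drains the queue with a fixed pool of worker threads. The function `simulate_single_queue` and the server names `server-a`/`server-b` are hypothetical.

```python
import heapq


def simulate_single_queue(requests, num_workers, service_time):
    """Toy model of one shared FIFO queue drained by a fixed pool of workers.

    requests: list of (request_id, server) in queue order, all enqueued at t=0.
    service_time: hypothetical per-request processing time for each server.
    Returns the time at which each request starts being forwarded.
    """
    # Min-heap of the times at which each worker thread becomes free.
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)
    start_times = {}
    for request_id, server in requests:
        # FIFO: the next request in the queue goes to the earliest free worker;
        # it cannot skip ahead, even if its own target server is idle.
        t = heapq.heappop(free_at)
        start_times[request_id] = t
        heapq.heappush(free_at, t + service_time[server])
    return start_times


if __name__ == "__main__":
    # Four slow requests for server-a are queued ahead of one fast request for server-b.
    queue_order = [("a1", "server-a"), ("a2", "server-a"),
                   ("a3", "server-a"), ("a4", "server-a"),
                   ("b1", "server-b")]
    starts = simulate_single_queue(queue_order, num_workers=2,
                                   service_time={"server-a": 10.0, "server-b": 1.0})
    print(starts["b1"])  # 20.0
```

In this toy run `server-b` is idle the whole time, yet `b1` cannot start until t = 20, because both workers are tied up with the `server-a` requests queued ahead of it.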

One potential way to tackle this would be to use multiple queues for routing. Since the set of model servers is dynamic, using a separate queue for each one is tricky: there's a limit to how many tasks/threads should be scheduled in parallel, and it's harder to reason about avoiding starvation when the set of queues itself keeps changing.

Focusing instead on a constant number of queues, we could use consistent hashing to allocate each request to one of the queues. Even though that doesn't guarantee that each model server will have its own dedicated queue, it decreases the chance of unrelated requests blocking each other.
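A minimal sketch of the consistent hashing idea, assuming requests are keyed by the name of their target model server and that the Scheduler owns a fixed list of `num_queues` queues. The class `QueueRing`, its `replicas` parameter (virtual nodes per queue) and the server names are hypothetical, not part of the project. It uses SHA-256 rather than Python's built-in `hash()`, which is randomized per process for strings and so would not give a stable mapping across restarts.

```python
import bisect
import hashlib
import queue


def _hash(key: str) -> int:
    # Stable 64-bit hash derived from SHA-256 (unlike the per-process salted hash()).
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class QueueRing:
    """Hypothetical consistent-hash ring mapping model server names to queue indices."""

    def __init__(self, num_queues: int, replicas: int = 100):
        if num_queues < 1:
            raise ValueError("num_queues must be >= 1")
        self.num_queues = num_queues
        # Place each queue on the ring `replicas` times (virtual nodes) so that
        # servers spread more evenly across the queues.
        points = sorted(
            (_hash(f"queue-{q}#{r}"), q)
            for q in range(num_queues)
            for r in range(replicas)
        )
        self._hashes = [h for h, _ in points]
        self._queues = [q for _, q in points]

    def queue_for(self, server: str) -> int:
        # Walk clockwise to the first ring point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        i = bisect.bisect_left(self._hashes, _hash(server))
        if i == len(self._hashes):
            i = 0
        return self._queues[i]


if __name__ == "__main__":
    ring = QueueRing(num_queues=4)
    queues = [queue.Queue() for _ in range(ring.num_queues)]
    # The Gateway would enqueue each parsed request on its server's queue;
    # requests for the same server always land on the same queue.
    for server in ["server-a", "server-b", "server-c"]:
        queues[ring.queue_for(server)].put({"server": server})
```

With a fixed number of queues, a plain `hash % num_queues` would also give a stable mapping; the ring mainly pays off if the queue count is ever changed, because going from 4 to 5 queues only moves the servers that now land on the new queue's points, whereas with modulo most servers would be remapped.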

Metadata

Assignees

No one assigned

Labels

enhancement: New feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests