Description
Currently, the Model Gateway relies on the proxy service provided by CMS for supporting model discovery and dynamic routing in airgapped environments.
Why is a proxy needed in the first place?
When running in airgapped environments like the GAEs, a proxy server is usually configured to handle all traffic deemed to be targeting resources outside of the local network. This also affects containerised applications that address each other by Docker service name, because requests to those names are sent to the proxy as well. For these to work, we explicitly add the names of the expected targets to the containers' no_proxy list. However, this only works when we know the names of all potential targets in advance (e.g. a backend server that only talks to a database and an object store). In the case of the Model Gateway, which allows the dynamic deployment of user-defined model servers, the list of targets can change at runtime. For that reason, instead of keeping track of all servers and attempting to maintain an up-to-date no_proxy list, we choose to proxy all requests through a single service, the name of which we can safely add to the CMG containers' no_proxy list.
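To illustrate the problem, here is a minimal sketch of how a no_proxy list is typically evaluated. It is a simplified approximation of the exact-or-suffix matching that many HTTP clients apply, not the behaviour of any specific client, and the host names (`db`, `minio`, `cmg-proxy`, `model-server-abc123`) are hypothetical.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry in a comma-separated no_proxy list.

    Simplified: exact or domain-suffix match, ignoring any port on the host.
    Real clients differ in details (wildcards, CIDR ranges, ports).
    """
    hostname = host.lower().split(":")[0]
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if hostname == entry or hostname.endswith("." + entry):
            return True
    return False


# A static list works for targets known in advance (hypothetical names).
static_no_proxy = "localhost,db,minio"
known_target = bypasses_proxy("db:5432", static_no_proxy)

# A model server deployed at runtime is not in the list, so the request
# would be sent to the environment's outbound proxy instead.
dynamic_target = bypasses_proxy("model-server-abc123:8000", static_no_proxy)

# Routing everything through one internal proxy needs a single static entry.
via_internal_proxy = bypasses_proxy("cmg-proxy:80", static_no_proxy + ",cmg-proxy")
```

With a single internal proxy, the no_proxy list stays fixed no matter how many model servers are deployed.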
What about the CMS-provided proxy?
The CMS project provides an nginx service that acts as a gateway for the CMS stack (including model servers, monitoring, logging, and model tracking) and handles TLS termination. When we first encountered the issue with airgapped deployments, we were looking for a quick fix and so decided to reuse the existing proxy for our purpose. In reality, however, the complexity of the CMS proxy isn't needed here. We don't need it to serve as a gateway, but rather as a glorified internal routing hack. For that reason, we don't need to be constrained by security concerns that might be pertinent to the CMS use case, nor do we need to rely on an external component that we don't directly control.
Introducing a CMG proxy
When thinking about introducing a proxy under the CMG project, our main concerns should be simplicity and ease of configuration. One of the issues we've had with the nginx proxy was how hard and error-prone it was to configure for dynamic routing, which led to wasted development time on the first iteration, bugs, and hesitation to update it. In fact, dynamic routing through nginx, which is most commonly used alongside a list of explicitly configured upstream servers, always felt like a hack (we are abusing URL pattern matching with target reconstruction and relying on the DNS resolver to determine the final targets). What's more, it doesn't help with service discovery, which we perform directly in the Gateway. For that reason, we should look into more modern and flexible alternatives like Traefik, which relies on labels to identify services (exactly like we do internally in CMG): https://doc.traefik.io/traefik/v3.3/providers/docker/
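As a rough sketch of what label-based routing could look like, the helper below builds the Docker labels that Traefik's Docker provider reads to route a path prefix to a container. The function name, the `/models` prefix, the server name, and the port are all hypothetical, and this is not how CMG currently deploys model servers.

```python
def traefik_labels(server_name: str, port: int, prefix: str = "/models") -> dict[str, str]:
    """Build Traefik Docker-provider labels for one model server container.

    Assumes path-based routing: requests to <prefix>/<server_name>/... are
    forwarded to the container on `port`, with the prefix stripped first.
    """
    path = f"{prefix}/{server_name}"
    middleware = f"{server_name}-strip"
    return {
        # Opt the container in (relevant when exposedByDefault is disabled).
        "traefik.enable": "true",
        # Match every request whose path starts with the server's prefix.
        f"traefik.http.routers.{server_name}.rule": f"PathPrefix(`{path}`)",
        # Strip the prefix so the model server sees its own root paths.
        f"traefik.http.routers.{server_name}.middlewares": middleware,
        f"traefik.http.middlewares.{middleware}.stripprefix.prefixes": path,
        # Port inside the container that the model server listens on.
        f"traefik.http.services.{server_name}.loadbalancer.server.port": str(port),
    }


# Hypothetical model server; the labels would be attached when the Gateway
# creates the container.
labels = traefik_labels("sentiment-model", 8000)
for key, value in labels.items():
    print(f"{key}={value}")
```

Because the routing information lives on the container itself, a newly deployed model server would become reachable without editing or reloading a central proxy configuration.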