Discussion around scaling AI processes #693

@julien-nc

Description

Right now, most of the exApps are AI providers: they consume TaskProcessing tasks from Nextcloud. On a big instance, a lot of tasks are likely to be scheduled. A basic way to scale those processes would be to implement each exApp so that it consumes multiple tasks in parallel, but that makes the implementations more complex, there is no common scaling strategy between apps, and we repeat the effort in each app. Also, this type of scaling is limited to the hardware capacity of the host where the exApp runs.

I want to start a discussion about the ideas we had in the Integrations/AI team to get feedback and maybe suggestions for alternative ways to scale the AI processes.

I'll try to keep this short without missing any important piece of context for the discussion. Feel free to skip any paragraph covering things you already know.

Context

Task processing API

Nextcloud implements an internal (PHP) API called the Task Processing API. Its purpose is to provide a standardized, abstract framework in which Nextcloud can schedule and process tasks (AI-related or not, but mostly AI-related 😁). The 3 core concepts of this API are task types, tasks and providers.

A task type defines a feature (like generating images, speech-to-text, AI chat etc...). A task type definition contains the shape of the input/output (what is expected as input and produced as output). For example, "generate images" expects one text input (the user prompt) and one number input (the number of images to generate), and produces one image-list output (the generated images).

There are some hardcoded task types in the server code and apps can dynamically register task types.
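
As a rough illustration, here is how the "generate images" task type could be pictured as data. This is a simplified Python sketch, not the actual server-side shape definitions (the id matches the server's text-to-image task type as far as I know, but treat the details as illustrative):

```python
# Conceptual sketch of a task type definition. Field names are simplified
# for readability; the real definitions live in the server's PHP code.
GENERATE_IMAGES = {
    "id": "core:text2image",
    "name": "Generate images",
    "input_shape": {
        "input": "Text",                # the user prompt
        "numberOfImages": "Number",     # how many images to generate
    },
    "output_shape": {
        "images": "ListOfImages",       # the generated images
    },
}
```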

A task is the combination of:

  • A specific task type
  • A set of inputs
  • A status (scheduled, pending, successful, failed etc...)
  • Some meta information (author, scheduling date etc...)
  • Eventually, a set of outputs (once the task has been processed)
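
Put together, a task instance can be pictured like this (again a simplified Python sketch, not the real schema):

```python
# Simplified picture of a task instance (illustrative only).
task = {
    "id": 42,
    "type": "core:text2image",       # the task type it belongs to
    "input": {"input": "a red bicycle", "numberOfImages": 2},
    "status": "scheduled",           # scheduled, pending, successful, failed...
    "userId": "alice",               # meta information: author,
    "scheduledAt": "2024-06-01T12:00:00Z",  # ...scheduling date, etc.
    "output": None,                  # eventually filled with the results
}
```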

A provider is able to process tasks of a specific task type. Providers can be implemented in PHP Nextcloud apps and in exApps. Those 2 types of providers consume tasks quite differently:

  • PHP providers can only run in Nextcloud background jobs (as there is no persistent Nextcloud process). When OC\TaskProcessing\SynchronousBackgroundJob runs, it consumes one task, which means it calls the provider's "process" method with the task's input as parameters. In short, one job run processes one task. Background jobs run each time cron.php is launched (usually by a system cron job). A more efficient way is to launch the occ background-job:worker command, a sort of persistent process that tries to consume tasks as soon as they are scheduled.
  • ExApps are different: since they are persistent processes, they can "poll" the Nextcloud server to consume tasks. (There are optimizations to this polling system currently being implemented, but that is out of scope here.) So an exApp (more precisely, the provider in the exApp) fetches tasks with the Nextcloud network API and sets the results back with the network API as well, as sketched below.
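
To make the exApp side concrete, here is a minimal polling loop in Python. The endpoint paths and auth are hypothetical placeholders, not the real AppAPI/TaskProcessing routes; the point is only to illustrate the pull model:

```python
import time

import requests

NEXTCLOUD_URL = "https://cloud.example.org"            # hypothetical instance
HEADERS = {"Authorization": "Bearer <exapp-token>"}    # auth simplified

def process(task_input: dict) -> dict:
    """Run the actual AI inference for one task (model-specific)."""
    raise NotImplementedError

def poll_loop(task_type: str) -> None:
    while True:
        # Ask Nextcloud for the next scheduled task of our task type.
        # The route below is made up; the real OCS route differs.
        resp = requests.get(
            f"{NEXTCLOUD_URL}/ocs/v2.php/taskprocessing/next",
            params={"taskType": task_type},
            headers=HEADERS,
        )
        if resp.status_code != 200:   # no task available: back off and retry
            time.sleep(5)
            continue
        task = resp.json()
        output = process(task["input"])
        # Report the result back over the same network API.
        requests.put(
            f"{NEXTCLOUD_URL}/ocs/v2.php/taskprocessing/{task['id']}/result",
            json={"output": output},
            headers=HEADERS,
        )
```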

Important point to remember: most exApp providers are stateless. They could theoretically run multiple times in parallel so that multiple tasks are processed in parallel. AppAPI currently works in such a way that an exApp is only one container, one instance of the app. That is mostly what we want to scale.

App API and exApps

AppAPI is a Nextcloud PHP app that implements an API and an orchestration system for running a new type of Nextcloud apps: the exApps.
ExApps are a sort of microservice. They can be written in any language and should be packaged as a Docker image. A running exApp is a tiny webserver that Nextcloud can send requests to. ExApps can also send OCS requests to Nextcloud. Installing an exApp means that AppAPI asks the underlying configured container system (currently Docker or Podman) to retrieve the app's image, spawn a container, launch the container's main process and initialize the app.

  • Most of the exApp AI providers are stateless. They might contain some models and some cache data but no user-related information. They are a kind of worker that can consume tasks.
  • Some exApp providers, specifically context chat for now, are stateful. Context chat contains the vector database with all the indexed content. Thankfully, it is possible to use an external vector database to turn context chat into a stateless app.

Some work will be done soon to add support for Kubernetes as a container orchestrator (like we have for Docker and Podman).

Scaling PHP provider apps

This part is easy because we dodge the problem here. The PHP provider apps don't actually run the AI processes themselves but connect to an external API (centralized services like OpenAI, or self-hosted solutions like Ollama or LocalAI). We put the responsibility of scaling on the service behind the API we reach, so there is no need to scale the PHP providers themselves.
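
For example, a provider backed by a self-hosted Ollama only has to forward the prompt; concurrency and hardware are the deployment's problem. Sketched in Python for brevity (the actual providers are PHP apps, and the model name is just an example):

```python
import requests

def process_text_generation(prompt: str) -> str:
    # Forward the task input to the external service. Scaling the model
    # runtime happens behind this API, not in the Nextcloud provider.
    resp = requests.post(
        "http://localhost:11434/api/generate",   # Ollama's default endpoint
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```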

What we want to scale

We want to scale 2 things:

  • The consumption of tasks by exApp providers (remember each provider is responsible for one specific task type)
  • Context chat is a special case: not only does it consume tasks (to answer users' questions about their data), it also indexes the content. Indexing is currently driven by requests from Nextcloud to the context chat exApp. We are planning to invert that, so the exApp would make requests to Nextcloud to fetch content to index. (💡 maybe this could be done with the task processing API instead of custom requests)

[Diagram: context chat (stateful), which we want to make almost stateless]

Strategy we have in mind so far

Reminders:

  • We can consider that we only have stateless providers that want to consume tasks (or indexing jobs, but let's treat those as tasks too)
  • ExApps poll for tasks: they are the ones making requests to Nextcloud to consume tasks
  • AppAPI will soon support Kubernetes

There is a plugin/extension/addon/whatever in Kubernetes called KEDA (Kubernetes Event-driven Autoscaling). It allows defining auto-scaling strategies that depend on the size of an external queue.

We could have a KEDA-compliant endpoint in Nextcloud so that there would be one KEDA strategy per task type (per exApp). Kubernetes would access one queue per task type and would manage the number of workers (containers of this specific exApp provider).
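
One way to wire this up would be KEDA's metrics-api scaler, which periodically polls an HTTP endpoint and scales on a numeric value found in the JSON response. Here is a minimal sketch of the queue-size endpoint Nextcloud would need to expose per task type, written in Python for brevity (the real implementation would live in the server's PHP code, and the route and field names are made up):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def scheduled_task_count(task_type: str) -> int:
    """Stub standing in for a database query counting scheduled tasks."""
    return 7

class QueueSizeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /queue-size?taskType=core:text2image (route name assumed)
        query = parse_qs(urlparse(self.path).query)
        task_type = query.get("taskType", [""])[0]
        body = json.dumps({"taskCount": scheduled_task_count(task_type)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), QueueSizeHandler).serve_forever()
```

On the Kubernetes side there would then be one ScaledObject per exApp whose metrics-api trigger points at this endpoint (its url, a valueLocation selecting taskCount, and a targetValue saying how many queued tasks one worker should absorb).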

The special case of context chat indexing could be handled just like the classic task processing case: The queue is the indexing queue (the list of things to index). The created containers would be context chat ones launched in an "indexing-only" mode.

Alternatives

A simple alternative is to create an exApp container each time a task is scheduled in Nextcloud. Once the container is done with processing the task, it could die.

The problem with this is the overhead of creating the container. It might also trigger the creation of way too many containers.
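
For completeness, with the Docker SDK this container-per-task alternative is almost trivial to express, which also makes its cost obvious: every single task pays for a full container start. A Python sketch with a made-up image name:

```python
import docker  # the docker-py SDK

client = docker.from_env()

def run_one_task(task_id: int) -> None:
    # Spawn a short-lived worker for exactly one task. The container
    # processes the task, reports the result, then gets auto-removed.
    client.containers.run(
        image="ghcr.io/example/text2image-exapp:latest",  # hypothetical image
        environment={"TASK_ID": str(task_id)},
        detach=True,
        auto_remove=True,   # die and clean up once the task is done
    )
```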

Request for feedback

Mainly 3 questions:

  • Does the Kubernetes+KEDA strategy make sense?
  • Are there simple alternatives if we stick with Kubernetes?
  • Are there simple alternatives that are not using Kubernetes?

cc @nextcloud/integration
