Entities become unresponsive under load #339

@ghost

Description

We are running Durable Functions in Azure with the Netherite backend on an Elastic Premium plan (EP2).
Our setup uses only entity functions and no orchestrators. Each entity holds a list of work items to process and exposes an operation that processes one work item from the list. If the list is not empty after the operation has finished, the entity signals itself to run the operation again.

Pseudocode:

class Worker {
  private List<WorkItem> workItems

  public AddWork(items) {
    workItems.append(items)
  }

  public Calculate() {
    if (!workItems.Empty) {
      var item = workItems.dequeue()
      doWork(item) // side effect: write to db
      if (!workItems.Empty) {
        ctx.Signal(myself, "Calculate") // run again until the list is empty
      }
    }
  }
}

The workers are created and initially signaled by another function:

Pseudocode:

for (i = 1 to n) {
  client.Signal(worker+i, "AddWork", getWorkForI(i))
  client.Signal(worker+i, "Calculate")
}
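
For readers who want to reason about the message volume this pattern generates, here is a minimal in-memory sketch in Python. It is not the Durable Functions or Netherite API: the `Worker` class, the `run` function, and the FIFO `mailbox` are hypothetical stand-ins for the entity, the starter function, and signal delivery, and the db write in doWork is replaced by appending to a list. It assumes signals to an entity are delivered one at a time and in order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """In-memory stand-in for one entity (hypothetical, not the real API)."""
    work_items: deque = field(default_factory=deque)
    processed: list = field(default_factory=list)

    def add_work(self, items):
        self.work_items.extend(items)

    def calculate(self):
        """Process one item; return True if the entity should signal itself."""
        if not self.work_items:
            return False
        item = self.work_items.popleft()
        self.processed.append(item)  # stands in for doWork (the db write)
        return bool(self.work_items)


def run(n, items_per_worker):
    """Deliver signals in FIFO order and count how many are delivered."""
    workers = {}
    mailbox = deque()  # pending (entity_id, operation, payload) signals

    # Starter function: one AddWork and one Calculate signal per worker.
    for i in range(1, n + 1):
        entity_id = f"worker{i}"
        work = [f"{entity_id}-item{j}" for j in range(items_per_worker)]
        mailbox.append((entity_id, "AddWork", work))
        mailbox.append((entity_id, "Calculate", None))

    delivered = 0
    while mailbox:
        entity_id, operation, payload = mailbox.popleft()
        delivered += 1
        worker = workers.setdefault(entity_id, Worker())
        if operation == "AddWork":
            worker.add_work(payload)
        elif worker.calculate():
            # Self-signal: one extra message per remaining work item.
            mailbox.append((entity_id, "Calculate", None))
    return workers, delivered


if __name__ == "__main__":
    workers, delivered = run(n=3, items_per_worker=4)
    print(delivered)
```

In this model every work item costs one delivered signal, so the number of messages grows with the total number of work items rather than with the number of workers. Whether that volume is related to the stall described here is not something the sketch can show.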

The problem we are facing is that this setup runs fine for some time, and then entities become "stuck": they stop running calculations, and a query to ListEntitiesAsync times out. The only way we have found to recover is to restart the Durable Functions app in Azure. We see some storage exceptions in the logs, but nothing that is meaningful to us. We don't see this problem without Netherite, although it should be noted that we don't have the exact same system deployed with Durable Functions backed by Azure Storage.

Is there a good way to debug this kind of problem when the durable runtime becomes unresponsive, or does anyone see an obvious problem with the setup we are using?
