Understanding action run orchestration
In Tines, fairness of action runs is an important consideration: how do you ensure a single story doesn't monopolize all of the compute available for action runs on a stack, starving other stories? This question applies to both single-tenant and multi-tenant stacks, where all of the stories share action run computing capacity.
To address this, we use a concept called "fair orchestration," which guarantees that each story on a stack receives a fair share of worker time to process its action runs.
How does it work?
The system uses two mechanisms to ensure fair resource distribution: concurrent run limits and token buckets. These are checked in sequence for every action run.
Concurrent run limits
The system allocates a percentage of total Sidekiq workers for concurrent action runs per story:
Multi-tenant stacks: 30% of total workers
Single-tenant stacks: 50% of total workers
This allocation prevents any single story from monopolizing the available workers. The system tracks the number of currently running actions for each story in Redis, and only allows new runs if they won't exceed these limits.
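The check above can be sketched as follows. This is an illustrative sketch, not the Tines source: the `StoryConcurrencyLimiter` name and the in-memory dictionary standing in for Redis are assumptions.

```python
# Illustrative sketch of per-story concurrent run limiting.
# A plain dict stands in for the Redis counters used in production.

TOTAL_WORKERS = 100

class StoryConcurrencyLimiter:
    def __init__(self, total_workers, multi_tenant):
        # 30% of workers on multi-tenant stacks, 50% on single-tenant stacks
        share = 0.30 if multi_tenant else 0.50
        self.limit = int(total_workers * share)
        self.running = {}  # story_id -> current run count

    def try_start(self, story_id):
        count = self.running.get(story_id, 0)
        if count >= self.limit:
            return False  # run stays pending
        self.running[story_id] = count + 1
        return True

    def finish(self, story_id):
        self.running[story_id] = max(0, self.running.get(story_id, 0) - 1)

limiter = StoryConcurrencyLimiter(TOTAL_WORKERS, multi_tenant=True)
started = sum(limiter.try_start("story-1") for _ in range(35))
print(started)  # 30 of the 35 attempted runs start; the rest wait
```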
Token bucket system
If concurrent capacity is available, the system then checks token availability. Each story has a token bucket that refills in proportion to the time elapsed since its last update, up to a maximum capacity of 1.5 million tokens, equivalent to 25 minutes of worker time at 1,000 tokens per second. For high-priority stories in single-tenant environments, this limit is doubled to 3 million tokens (50 minutes of worker time).
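A time-based refill of this kind can be sketched as below. The capacity figures come from the article; the linear `refill_per_second` rate and the class name are assumptions for illustration.

```python
# Sketch of a time-based token bucket refill (illustrative, not production code).

MAX_TOKENS = 1_500_000         # ~25 minutes of worker time at 1,000 tokens/sec
HIGH_PRIORITY_MAX = 3_000_000  # doubled for high-priority single-tenant stories

class TokenBucket:
    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second  # assumed rate parameter
        self.tokens = capacity
        self.updated_at = 0.0

    def refill(self, now):
        # Refill proportionally to the time elapsed since the last update,
        # capped at the bucket's maximum capacity.
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now

bucket = TokenBucket(MAX_TOKENS, refill_per_second=1_000)
bucket.tokens = 0        # story has exhausted its bucket
bucket.refill(now=60.0)  # one minute later
print(bucket.tokens)     # 60,000 tokens have accumulated
```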
Note: As actions in a story execute, tokens are deducted from the story's token bucket. From the moment an action run starts, 1,000 tokens are deducted from the bucket every second until the run completes. This token consumption is tracked through a dedicated background thread to ensure accurate accounting of active runs.
If a story depletes its tokens, new action runs for that story will be pending until sufficient tokens have accumulated through the time-based refill system.
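The deduction loop can be sketched like this. The 1,000 tokens/second rate is from the article; `drain_tick`, `can_start`, and the data structures are hypothetical names for illustration.

```python
# Sketch of token deduction for active runs, charged once per second
# by a background accounting thread (illustrative only).

TOKENS_PER_SECOND = 1_000

def drain_tick(buckets, active_runs):
    """Called once per second: charge each story for its running actions."""
    for story_id, run_count in active_runs.items():
        buckets[story_id] -= TOKENS_PER_SECOND * run_count

def can_start(buckets, story_id, cost=TOKENS_PER_SECOND):
    # A depleted bucket leaves new runs pending until refill catches up.
    return buckets[story_id] >= cost

buckets = {"story-1": 1_500_000}
active_runs = {"story-1": 8}  # 8 actions currently running
for _ in range(10):           # 10 seconds elapse
    drain_tick(buckets, active_runs)
print(buckets["story-1"])     # 1,420,000 tokens remain
```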
For example:
Let's say a system has 100 total Sidekiq workers:
For multi-tenant stacks: 30% allocation = 30 maximum concurrent workers
For single-tenant stacks: 50% allocation = 50 maximum concurrent workers
A story on the multi-tenant stack starts with 1.5 million tokens and attempts to run 35 actions simultaneously:
1. First, the concurrent capacity check:
The system checks whether each new run would exceed the story's worker allocation (30% of total workers, i.e. 30 concurrent runs)
Only the first 30 actions are allowed to start
The remaining 5 actions wait in a ‘pending’ state until there is capacity (a running action completes and sufficient tokens are available)
2. Then, for the running actions:
Each action running for 10 seconds consumes 10,000 tokens (1,000 tokens/second × 10 seconds)
Total consumption is 300,000 tokens (30 actions × 10,000 tokens)
The story still has 1.2 million tokens available
3. When any of the running actions complete:
Concurrent capacity becomes available
The system checks token availability (1.2 million tokens remaining)
If tokens are available, the next pending action can start
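The two checks run in sequence for every action run can be condensed into one admission function. All names and numbers here are illustrative, not the production implementation.

```python
# End-to-end sketch: concurrent capacity is checked first, then tokens.

def admit_run(running_count, concurrent_limit, tokens, min_tokens=1_000):
    # Check 1: per-story concurrent run limit.
    if running_count >= concurrent_limit:
        return "pending (no concurrent capacity)"
    # Check 2: token availability.
    if tokens < min_tokens:
        return "pending (bucket depleted)"
    return "start"

print(admit_run(running_count=29, concurrent_limit=30, tokens=1_450_000))  # start
print(admit_run(running_count=30, concurrent_limit=30, tokens=1_450_000))  # pending (no concurrent capacity)
print(admit_run(running_count=10, concurrent_limit=30, tokens=0))          # pending (bucket depleted)
```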
Autoscaling to keep up with the demand
In the Cloud, we ensure that our workers auto-scale to match story run demand as needed. Scaling is based on the percentage of workers currently available to process action runs on a stack: if that percentage drops below defined thresholds, workers are automatically scaled up. We do this preemptively to ensure action runs are not queued for too long.
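A threshold-based scale-up policy of this shape could look like the sketch below. The article only states that scaling triggers when the available-worker percentage falls below defined thresholds; the specific threshold values, step size, and function name here are all assumptions.

```python
# Hedged sketch of threshold-based worker scale-up (values are invented).

def workers_to_add(total, busy, thresholds=(0.25, 0.10), step=10):
    """Scale up preemptively as the available-worker fraction crosses thresholds."""
    available_fraction = (total - busy) / total
    # Each threshold crossed adds another increment of workers.
    crossed = sum(1 for t in thresholds if available_fraction < t)
    return crossed * step

print(workers_to_add(total=100, busy=70))  # 0: 30% free, above both thresholds
print(workers_to_add(total=100, busy=80))  # 10: 20% free crosses the 25% threshold
print(workers_to_add(total=100, busy=95))  # 20: 5% free crosses both thresholds
```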
Implementing fair orchestration in our action run logic ensures that every story receives an equitable share of compute time and that action runs are processed consistently, with balanced worker allocation across the stack. Our goal is to have your action runs enqueued and started as swiftly as possible.